From jason.stajich at gmail.com Sun Jun 2 13:28:50 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sun, 2 Jun 2013 11:28:50 -0700
Subject: [maker-devel] getting protein sequences from genomes
In-Reply-To:
References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu>
Message-ID:

Seems like in your case you want to do more of a liftover-based annotation. Generate that and feed it as a GFF file to MAKER if your intention is also gene discovery in your population?

On May 23, 2013, at 9:48 AM, Daniel Hughes wrote:

> Would gene annotation by projection using synteny/WGA not be more appropriate? Either way, what's wrong with running one of the standard orthology prediction tools or just a basic best reciprocal BLAST?
>
> dan.
>
> Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
> -------------------------------------------------------------------------------------
> dsth at cantab.net
> dsth at cpan.org
>
>
> 2013/5/23 Barry Moore
> Hi Luciano,
>
> If I understand correctly, you are including translations of SNAP and Augustus predictions as well as the predictions. If so, you don't want to do that. Overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you guarantee that every prediction will become an annotation, and that means you lose the benefit of the evidence-supervised annotation that MAKER provides. Include the proteins from the D. mel reference, and if you want to cast a broader net, include proteins from other dipterans or even UniProt. It just depends on how aggressive you want to be in capturing new annotations.
>
> B
>
> On May 23, 2013, at 8:41 AM, Luciano Abriata wrote:
>
>> Thanks for your reply!
>>
>> One more question: can you think of any tips to get the best possible predictions of protein sequences?
>>
>> I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with methionine... So far I am including transcripts and translations from FlyBase, and SNAP and Augustus with their available trainings for flies. Do you see any possible source of error in that?
>>
>> Thanks again,
>>
>> Luciano
>>
>> From: Barry Moore [barry.moore at genetics.utah.edu]
>> Sent: Friday, May 17, 2013 09:02 p.m.
>> To: Luciano Abriata
>> Cc: maker-devel at yandell-lab.org
>> Subject: Re: [maker-devel] getting protein sequences from genomes
>>
>>
>> On May 17, 2013, at 3:45 AM, Luciano Abriata wrote:
>>
>>> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).
>>>
>>> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:
>>>
>>> 1) How can I match proteins predicted for the same gene in two genomes?
>>
>> blastp, with parameters tuned to favor near-perfect matches
>>
>>>
>>> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)?
>>>
>>> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541
>>>
>>
>> AED = Annotation edit distance. It describes how closely the prediction matches the evidence. This is a distance measure, so 0 is a perfect match and 1 is no overlap.
>>
>> eAED = Exon-adjusted annotation edit distance. This metric is the same as AED with a couple of exceptions. First, for a protein-coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported by splice-site-spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.
>>
>> QI values are as follows, in order:
>>
>> Length of the 5' UTR.
>> Fraction of splice sites confirmed by an EST alignment.
>> Fraction of exons that overlap an EST alignment.
>> Fraction of exons that overlap an EST or protein alignment.
>> Fraction of splice sites confirmed by an ab initio prediction.
>> Fraction of exons that overlap an ab initio prediction.
>> Number of exons in the transcript.
>> Length of the 3' UTR.
>> Length of the encoded protein.
>>
>>
>>> 3) If I include SNAP and Augustus to improve protein predictions, I get several protein FASTA files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta
>>>
>>> Which of these files contains the definitive set of predicted protein sequences?
>>
>> The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.
>>
>>>
>>> Thanks in advance!
>>>
>>> Luciano
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> Barry Moore
>> Research Scientist
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT 84112
>> --------------------------------------------
>> (801) 585-3543

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From carsonhh at gmail.com Mon Jun 3 08:04:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Jun 2013 09:04:08 -0400
Subject: [maker-devel] Advice on params for ciliates
In-Reply-To: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu>
Message-ID:

I don't have any specific advice, but in general I always set the blast_depth parameters in the maker_bopts file to 20 or 30 (faster runtimes). Also, max_dna_len can be set up to 2x higher if you have sufficient memory (3-4 GB per CPU, as opposed to the 1-2 GB assumed by the default).

Other than that, split_hit, pred_flank, and single_exon are the only ones I might change around. You sort of have to run on a few large contigs before deciding what to do with these parameters.

split_hit --> sets the max intron size for alignments
pred_flank --> affects clustering for gene-dense organisms
single_exon --> leave off unless you expect a lot of single-exon genes

--Carson

From: "Freeman, Robert M."
Date: Thursday, 23 May, 2013 4:17 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Advice on params for ciliates

Dear MAKER community,

Am embarking on updating models for a ciliate (taxon Ciliophora) and was wondering if folks had recommendations for MAKER parameters.

Thanks,
Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA 02115
617/432.2294, vox

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace

From Bob_Freeman at hms.harvard.edu Wed Jun 5 08:28:36 2013
From: Bob_Freeman at hms.harvard.edu (Bob Freeman)
Date: Wed, 5 Jun 2013 09:28:36 -0400
Subject: [maker-devel] Advice on params for ciliates
In-Reply-To:
References:
Message-ID:

Thanks, Carson, for these helpful hints.

(Separately, the other code did not work again on our cluster. Have been so swamped -- I'll get to the write-up next week. Have been using the 2.25beta binary and that works OK.)

Best,
Bob

From onson001 at umn.edu Wed Jun 5 11:28:46 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Wed, 5 Jun 2013 11:28:46 -0500
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
References:
Message-ID:

I upgraded to 2.28 and Maker is not running. Thanks!

On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote:

> Are you using MAKER version 2.10? I ask because there was an issue with other_gff in that version that has since been fixed. So if you don't get other_gff to pass through, you will need to upgrade to 2.28 (release date is later today, coincidentally).
>
> For the Augustus GFF3 file, the format is a little weird, which is causing the problem. They are mRNA features not attached to genes. Rather than build the expected 3-level gene/mRNA/exon structure for these, it is simpler just to convert it to the 2-level match/match_part structure. Just convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. Rename the GFF3 when you're done so that it will force a rebuild of the GFF3 database when you run again.
>
> Thanks,
> Carson
>
>
> From: Innocent Onsongo
> Date: Wednesday, 22 May, 2013 8:47 AM
> To: Barry Moore
> Cc:
> Subject: Re: [maker-devel] Maker: Re-annotation
>
> No. The MAKER-produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it would pass annotations from the Augustus-produced GFF file into the final annotation, but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff).
>
>
> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote:
>
>> Hi Getiria,
>>
>> Does the MAKER-produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER-produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl?
>>
>> B
>>
>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote:
>>
>> Maker Development Team,
>>
>> I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields in maker_opts.ctl as below:
>>
>> genome=CGS01058.fasta #genome sequence file in fasta format
>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT
>> pred_gff=Augustus.gff3 #ab-initio predictions from
>> other_gff=Promoters.gff3 #promoter annotations
>> other_gff=CpG_Islands.gff3 # CpG island annotations
>>
>> Maker runs to completion, and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file, but when I load the GFF3 file into IGV and look, it does not contain any of the exon definitions in Augustus.gff3. Am I missing something?
>>
>> Regards,
>> Getiria
>>
>> --
>> Getiria Onsongo, Ph.D.
>> Informatics Analyst, Research Informatics Support System
>> Minnesota Supercomputing Institute for Advanced Computational Research
>> University of Minnesota
>> Minneapolis, MN 55455
>> Phone: 612-624-0532

From carsonhh at gmail.com Wed Jun 5 09:30:20 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Jun 2013 10:30:20 -0400
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
Message-ID:

What does it do?

--Carson

From onson001 at umn.edu Wed Jun 5 11:35:43 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Wed, 5 Jun 2013 11:35:43 -0500
Subject: [maker-devel] Maker: accessory scripts
Message-ID:

I was able to successfully run Maker and now want to convert the gene prediction match/match_part format to the annotation gene/mRNA/exon/CDS format. I looked at the tutorial, and the script gff3_preds2models is supposed to do this conversion. How do I access this script? It is not in /maker/2.28-beta/bin/

Also, when running gff3_preds2models, is the input the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron?

Thanks,
Getiria

From onson001 at umn.edu Wed Jun 5 11:37:01 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Wed, 5 Jun 2013 11:37:01 -0500
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
References:
Message-ID:

Oops! I meant to type Maker is NOW running.

On Wed, Jun 5, 2013 at 9:30 AM, Carson Holt wrote:

> What does it do?
>
> --Carson

From dence at genetics.utah.edu Wed Jun 5 11:38:59 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Jun 2013 16:38:59 +0000
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
References:
Message-ID:

Hi Innocent,

I'm just jumping into this conversation kind of late in the game, but if you look in the GFF3 file that MAKER gave you, do you see any gene, exon, or CDS features in the output? When you give evidence (protein or EST) and ab initio predictors to MAKER, the default behavior is to create gene models.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Wed Jun 5 09:44:36 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Jun 2013 10:44:36 -0400
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

All MAKER gene annotations will be in the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or a rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to full annotations, create a list of IDs (one per line), then give them to the attached script. gff3_preds2models was previously deprecated and is no longer part of the MAKER distribution, but the attached script is an updated version with the same functionality.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_preds2models
Type: application/octet-stream
Size: 4777 bytes
Desc: not available
URL:

From carsonhh at gmail.com Wed Jun 5 09:45:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Jun 2013 10:45:10 -0400
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
Message-ID:

Gotcha :-)

--Carson
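A note on the conversion Carson suggests in the Re-annotation thread (turning the Augustus 'mRNA' features into 'match' and the 'exon' features into 'match_part'): the following is a minimal Python sketch of that rename, not Carson's script and not part of MAKER. The function names `convert_line` and `convert` and the output file name in the usage note are hypothetical. Passing every other feature type through unchanged is an assumption; the thread does not say what to do with them.

```python
# Feature-type renames suggested in the thread. Anything else is left
# untouched, which is an assumption of this sketch.
RENAMES = {"mRNA": "match", "exon": "match_part"}


def convert_line(line):
    """Rename column 3 (feature type) of one GFF3 line.

    Comment lines, blank lines, and lines with fewer than nine
    tab-separated columns are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return line
    cols[2] = RENAMES.get(cols[2], cols[2])
    return "\t".join(cols) + "\n"


def convert(in_path, out_path):
    """Write a converted copy of in_path to out_path (a new file name)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(convert_line(line))
```

Usage would look like `convert("Augustus.gff3", "Augustus.match.gff3")`, followed by pointing pred_gff at the new file. Writing to a new file name follows the advice in the thread to rename the GFF3 so that the GFF3 database is rebuilt on the next run.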
URL: From carsonhh at gmail.com Wed Jun 5 09:47:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:47:51 -0400 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: Message-ID: Also, just a note, models are rejected if they have no protein or EST support. This is because ab initio predictors overpredict (in some genomes, for example, you may have 10 false positives for every true positive). --Carson From: Carson Holt Date: Wednesday, 5 June, 2013 10:44 AM To: Innocent Onsongo , Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: accessory scripts All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to full annotations, create a list of IDs (one per line) then give them to the attached script. gff3_preds2models was previously deprecated and is no longer part of the maker distribution, but the attached script is an updated version with the same functionality. --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:35 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: [maker-devel] Maker: accessory scripts I was able to successfully run Maker and now want to convert the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script? It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron?
Thanks, Getiria _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/m aker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From amelia.ireland at gmod.org Wed Jun 5 12:14:05 2013 From: amelia.ireland at gmod.org (Amelia Ireland) Date: Wed, 5 Jun 2013 10:14:05 -0700 Subject: [maker-devel] Apply now for the GMOD Summer School! Message-ID: Closing date for applications: 10 June July 19-23, 2013; NESCent, Durham, North Carolina http://gmod.org/wiki/2013_GMOD_Summer_School The 2013 GMOD Summer School is the best way to get to grips with GMOD in the Cloud, GMOD's suite of genomic and genetic software. Over five days, attendees will learn how to install, configure, and run popular GMOD software for visualization, storage, and dissemination of genetic and genomic data. The following software is covered: - Chado, a species-independent database schema covering many areas of genetic and genomic data; - GBrowse, the ubiquitous genome browser; - GBrowse syn, a synteny browser built on GBrowse; - Galaxy, analysis and computation pipeline; - JBrowse, genome browsing evolved; - MAKER, automated annotation pipeline; - Tripal, a slick web interface for displaying and editing data from Chado; and - WebApollo, distributed community genome annotation tool (built on JBrowse). There are additional sessions on setting up a GMOD in the Cloud virtual machine in the Amazon cloud, and common file formats. Courses are taught by members of the software development teams, and there are work sessions in the evenings for participants to talk to the developers or apply what they have been taught to their own data. For more information and to apply, visit http://gmod.org/wiki/2013_GMOD_Summer_School. There are some scholarship funds available for those from underrepresented minorities. All applications should be in by June 10th. 
If you have any questions, please contact the GMOD help desk at help at gmod.org. Hope to see you there! Thanks, Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject From mnuhn at ebi.ac.uk Thu Jun 6 03:44:10 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Thu, 06 Jun 2013 09:44:10 +0100 Subject: [maker-devel] Effect of the unmask option Message-ID: <51B04BDA.7050307@ebi.ac.uk> Hello Carson! When running maker with the unmask option, how does maker use the predictions generated from running the gene predictors on the unmasked sequence? The tutorial says: "You do have the option to run ab initio gene predictors on both the masked and unmasked sequence if repeat masking worries you though. You do this by setting unmask:1 in the maker_opt.ctl configuration file." http://gmod.org/wiki/MAKER_Tutorial_2012 But in the sub get_non_overlaping_abinits in maker::auto_annotator (maker version 2.27) they are skipped:

    #only accept masked predictions unless I'm not masking or the predictor is genemark
    my $src = $g->{algorithm};
    unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} || $CTL_OPT->{predictor} eq 'genemark') {
        next;
    }

Cheers, Michael. From carsonhh at gmail.com Thu Jun 6 08:55:08 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Jun 2013 09:55:08 -0400 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: Message-ID: One thing to keep in mind is the strandedness of the evidence and the model (they must be on the same strand). Further, protein evidence is only valid support if it is in the same reading frame as the model. Could you send the full GFF3 for the contig (I need features and GFF3 internal fasta) and the coordinates of the region in question, and I can take a look?
Also if you can, it would be good to let maker run Augustus as well with the species file rather than just passing in the GFF3. This is because MAKER can only talk to Augustus to generate competing hint based models if you provide the species. Thanks, Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 1:10 PM To: Carson Holt Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: accessory scripts I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction? I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful. Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST support. > This is because ab inito predictors over predict (you may have 10 false > positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt > > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or rejected > model and is there for reference purposes. If you want to upgrade all of the > rejected loci to gene annotations, set keep_preds=1 in the control files. 
If > you want to upgrade a subset of rejected models to a full annotation, create a > list of IDs (one per line) then give them to the attached script. > gff3_preds2models was previously deprecated and no longer part of the maker > distribution, but the attached script is an updated version with the same > functionality. > > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I > looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not in > /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is the > file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 12:10:01 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 12:10:01 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. 
Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction? I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful. Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST > support. This is because ab inito predictors over predict (you may have 10 > false positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt < > carson.holt at oicr.on.ca> > > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or > rejected model and is there for reference purposes. If you want to upgrade > all of the rejected loci to gene annotations, set keep_preds=1 in the > control files. If you want to upgrade a subset of rejected models to a > full annotation, create a list of IDs (one per line) then give them to the > attached script. gff3_preds2models was previously deprecated and no longer > part of the maker distribution, but the attached script is an updated > version with the same functionality. > > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. 
> I looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not > in /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is list> the file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4525 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MakerFirst400.gff Type: application/octet-stream Size: 74870 bytes Desc: not available URL: From onson001 at umn.edu Thu Jun 6 13:58:21 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Thu, 6 Jun 2013 13:58:21 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: Thanks for the timely feedback, Carson. I made a change to my pred_gff and est_gff GFF3 files and now I am getting results, but I am not sure the changes I made are valid. I want to make sure they did not lead Maker to behave in an unexpected way and produce incorrect results. In my pred_gff file, I replaced "mRNA" with "protein_match" and "exon" with "match_part". Below are the first three lines of the old and new pred_gff files, respectively:

---------------old pred_gff
##gff-version 3
CGS00003 AUGUSTUS mRNA 1 10865 1 + .
CGS00003 AUGUSTUS exon 2013 2050 . + 1

---------------new pred_gff
##gff-version 3
CGS00003 AUGUSTUS protein_match 1 10865 1 + .
CGS00003 AUGUSTUS match_part 2013 2050 . + 1

In my est_gff file, I replaced "mRNA" with "expressed_sequence_match" and "exon" with "match_part". Below are the first lines of the old and new est_gff files, respectively:

----------------old est_gff
##gff-version 3
CGS00003 EST_BLAT mRNA 4641336 4758501 6072 - .
CGS00003 EST_BLAT exon 4641336 4641979 644 - .

----------------new est_gff
CGS00003 EST_BLAT expressed_sequence_match 4641336 4758501 6072 - .
CGS00003 EST_BLAT match_part 4641336 4641979 644 - .

Are the changes I made valid? Thanks, Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST > support. This is because ab initio predictors over predict (you may have 10 > false positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt < > carson.holt at oicr.on.ca> > > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or > rejected model and is there for reference purposes. If you want to upgrade > all of the rejected loci to gene annotations, set keep_preds=1 in the > control files. If you want to upgrade a subset of rejected models to a > full annotation, create a list of IDs (one per line) then give them to the > attached script. gff3_preds2models was previously deprecated and no longer > part of the maker distribution, but the attached script is an updated > version with the same functionality.
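The distinction drawn above (gene/mRNA/exon/CDS for annotations, everything else for evidence alignments or rejected models) can be checked quickly by tallying column 3 of a GFF3 file. The following is only a rough sketch in plain Python and is not part of MAKER; the function name `split_feature_types` is made up for illustration, and it assumes a standard tab-separated GFF3 file (the examples quoted in this thread were flattened to spaces by the mail archive).

```python
from collections import Counter

# Feature types used for final MAKER annotations, per the message above.
ANNOTATION_TYPES = {"gene", "mRNA", "exon", "CDS"}


def split_feature_types(gff_lines):
    """Tally column 3 of GFF3 lines into annotation types and all other types."""
    annotations = Counter()
    others = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a feature line
        feature_type = cols[2]
        if feature_type in ANNOTATION_TYPES:
            annotations[feature_type] += 1
        else:
            others[feature_type] += 1
    return annotations, others
```

If the first counter comes back empty while the second is full of match and match_part entries, the file holds evidence and rejected models only, which is the situation described in this thread.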
> > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. > I looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not > in /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is list> the file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From benayoun at stanford.edu Fri Jun 7 12:17:47 2013 From: benayoun at stanford.edu (=?ISO-8859-1?Q?B=E9r=E9nice_Benayoun?=) Date: Fri, 7 Jun 2013 10:17:47 -0700 Subject: [maker-devel] Maker and mono-exonic genes ? Message-ID: Dear maker developers, I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. 
I am attaching a screenshot for the contig that I know should contain the *Foxl2* gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. Is there any setting I can give to maker to force it to output monoexonic genes? I already set "single_exon=1" with no success. I attached my config file FYI. Thank you so much in advance for your answer !!! Best, Berenice. -- Bérénice A. BENAYOUN, Ph.D. Stanford University/Genetics Department *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Appolo_screenshot_missing_monoexonic_pred.pdf Type: application/pdf Size: 709436 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: application/octet-stream Size: 5154 bytes Desc: not available URL: From onson001 at umn.edu Fri Jun 7 15:08:43 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Fri, 7 Jun 2013 15:08:43 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: Carson, I have attached the full gff3 for the contig together with a screenshot from IGV showing regions where I was expecting Maker to make a consensus call. The region in question is CGS00003:5264784-5273457. I would greatly appreciate any insights. Thanks, Getiria On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote: > One thing to keep in mind is the strandedness of the evidence and the > model (they must be on the same strand). Further, protein evidence is only > valid support if it is in the same reading frame as the model.
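The strand requirement quoted above can be checked by hand for a region of interest before rerunning anything. This is a minimal sketch, not MAKER code: `same_strand_overlap` is a made-up helper, features are given as (start, end, strand) tuples with 1-based inclusive coordinates as in GFF3, and the reading-frame condition for protein evidence is not checked here.

```python
def same_strand_overlap(model, evidence):
    """Return True if two (start, end, strand) features overlap on the same strand."""
    m_start, m_end, m_strand = model
    e_start, e_end, e_strand = evidence
    if m_strand != e_strand:
        return False  # evidence on the opposite strand cannot support the model
    # Inclusive intervals overlap when each one starts before the other ends.
    return m_start <= e_end and e_start <= m_end
```

Evidence that overlaps a prediction in a genome browser but fails this check would not count as support for that prediction.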
> > Could you send the full GFF3 for the contig (I need features and GFF3 > internal fasta) and the coordinates of the region in question, and I can > take a look? Also if you can, it would be good to let maker run Augustus > as well with the species file rather than just passing in the GFF3. This > is because MAKER can only talk to Augustus to generate competing hint based > models if you provide the species. > > Thanks, > Carson > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 1:10 PM > To: Carson Holt > Cc: Carson Holt , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org>, Barry Moore > > Subject: Re: [maker-devel] Maker: accessory scripts > > I checked visually in IGV and there are some exons in the predicted model > with protein and EST support but the maker output GFF only has match_part > and protein_match in column 3. Does that mean Maker doesn't deem any of the > evidence sufficient to make a gene model prediction? > > I guess I am somewhat surprised I am not getting any exons predicted by > Maker. Is there a parameter I can alter to reduce the threshold at which > Maker makes this call? I have attached the first 400 lines of one of my GFF > files together with the control file (maker_opts.ctl) just in case they > might be useful. > > Getiria > > > On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > >> Also, just a note, models are rejected if they have no protein or EST >> support. This is because ab inito predictors over predict (you may have 10 >> false positives for every true positive in some genomes for example). >> >> --Carson >> >> >> >> From: Carson Holt >> Date: Wednesday, 5 June, 2013 10:44 AM >> To: Innocent Onsongo , Carson Holt < >> carson.holt at oicr.on.ca> >> >> Cc: "maker-devel at yandell-lab.org" , Barry >> Moore >> Subject: Re: [maker-devel] Maker: accessory scripts >> >> All maker gene annotations will be of the format gene/mRNA/exon/CDS. 
>> Anything in the format match/match_part is an evidence alignment or >> rejected model and is there for reference purposes. If you want to upgrade >> all of the rejected loci to gene annotations, set keep_preds=1 in the >> control files. If you want to upgrade a subset of rejected models to a >> full annotation, create a list of IDs (one per line) then give them to the >> attached script. gff3_preds2models was previously deprecated and no longer >> part of the maker distribution, but the attached script is an updated >> version with the same functionality. >> >> --Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 5 June, 2013 12:35 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" , Barry >> Moore >> Subject: [maker-devel] Maker: accessory scripts >> >> I was able to successfully ran Maker and now want to converts the gene >> prediction match/match_part format to annotation gene/mRNA/exon/CDS format. >> I looked at the tutorial and the script gff3_preds2models >> is supposed to do this conversion. How do I access this script. It is not >> in /maker/2.28-beta/bin/ >> >> Also, in running gff3_preds2models is > list> the file I used for pred_gff=? >> >> Long story short, how do I transform the GFF output from Maker to the >> more traditional annotation of exon/intron? >> >> Thanks, >> Getiria >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. 
Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS00003.gff Type: application/octet-stream Size: 11835535 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS00003_5264784-5273457.pdf Type: application/pdf Size: 124264 bytes Desc: not available URL: From dence at genetics.utah.edu Fri Jun 7 16:32:57 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 7 Jun 2013 21:32:57 +0000 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: References: Message-ID: Hi Berenice, Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how MAKER should be expected to perform. Looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. MAKER uses the predictions from ab initio tools as the basis for its models and keeps the models that are supported by evidence. By default it won't create a model when there isn't a prediction in the region. Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice Benayoun [benayoun at stanford.edu] Sent: Friday, June 07, 2013 11:17 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] Maker and mono-exonic genes ?
Dear maker developers, I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI. Thank you so much in advance for your answer !!! Best, Berenice. -- Bérénice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Fri Jun 7 16:58:16 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 7 Jun 2013 21:58:16 +0000 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: References: , Message-ID: Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: berenice.benayoun at gmail.com [berenice.benayoun at gmail.com] on behalf of Bérénice Benayoun [benayoun at stanford.edu] Sent: Friday, June 07, 2013 3:50 PM To: Daniel Ence Subject: Re: [maker-devel] Maker and mono-exonic genes ? Hi Daniel, Thanks for the quick answer !
I used SNAP, trained from an HMM built from the CEGMA output on my genome (240 gene models) plus a first maker run on 1/3 of the genome. I tried GenemarkES and Augustus, but for some reason they don't run, so I stopped pointing maker at them. Should I do something in particular to train it "better"? Is there any other predictor that would be worth running? Thanks so much for your help ! Berenice 2013/6/7 Daniel Ence > Hi Berenice, Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform. In looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. Uses the predictions from ab-initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region. Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice Benayoun [benayoun at stanford.edu] Sent: Friday, June 07, 2013 11:17 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] Maker and mono-exonic genes ? Dear maker developers, I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker.
I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI. Thank you so much in advance for your answer !!! Best, Berenice. -- Bérénice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -- Bérénice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Fri Jun 7 17:30:35 2013 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 7 Jun 2013 16:30:35 -0600 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: References: , Message-ID: <11A6EF4C-B82E-4851-80FC-B8668531E2EC@genetics.utah.edu> Hi Berenice, SNAP is a good gene predictor, but for most genomes Augustus can be more accurate - of course it is also harder to train. A good way to start is to run a first round of MAKER annotation with SNAP as the predictor, train SNAP on the output from that run, and then run MAKER a second time (the second run is pretty fast because all the blast jobs are reused). Ultimately, running Augustus as well (along with custom training) is probably worth it for a final annotation effort.
The good thing is you can run these iterative cycles of annotation with minimal effort because MAKER will reuse any computations that have already run. B On Jun 7, 2013, at 3:58 PM, Daniel Ence wrote: > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > From: berenice.benayoun at gmail.com [berenice.benayoun at gmail.com] on behalf of Bérénice Benayoun [benayoun at stanford.edu] > Sent: Friday, June 07, 2013 3:50 PM > To: Daniel Ence > Subject: Re: [maker-devel] Maker and mono-exonic genes ? > > Hi Daniel, > > Thanks for the quick answer ! > > I used SNAP, and trained from a hmm model made with the CEGMA output on my genome (240 gene models) plus a first run of maker of 1/3 of the genome. I tried GenemarkES and Augustus, but for some reason they don't run, so I stopped indicating their existence to maker. > > Should I do something in particular to train it "better" ? Is there any other predictor that would be worth running ? > > Thanks so much for your help ! > > Berenice > > 2013/6/7 Daniel Ence > Hi Berenice, Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform. > > In looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. Uses the predictions from ab-initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region. > > Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice Benayoun [benayoun at stanford.edu]
> Sent: Friday, June 07, 2013 11:17 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] Maker and mono-exonic genes ?
>
> Dear maker developers,
>
> I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets.
>
> While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker.
>
> I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it.
>
> Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI.
>
> Thank you so much in advance for your answer !!!
>
> Best,
>
> Berenice.
> --
> Bérénice A. BENAYOUN, Ph.D.
> Stanford University/Genetics Department
> BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
> M312 Alway Building
> 300, Pasteur Drive
> MC 5120
> Stanford, CA 94305-5120
> USA
> Email: benayoun at stanford.edu
> Web: www.stanford.edu/group/brunet/
>
>
>
> --
> Bérénice A. BENAYOUN, Ph.D.
> Stanford University/Genetics Department
> BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
> M312 Alway Building
> 300, Pasteur Drive
> MC 5120
> Stanford, CA 94305-5120
> USA
> Email: benayoun at stanford.edu
> Web: www.stanford.edu/group/brunet/
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Fri Jun 7 16:51:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 16:51:53 -0500
Subject: [maker-devel] Effect of the unmask option
In-Reply-To: <51B04BDA.7050307@ebi.ac.uk>
Message-ID:

The unmask option allows the ab initio predictions run on unmasked sequence to compete against other models, and they are selected only if they have a better AED score. They are not used for the non-overlapping rejected models at the end of the run, because that set is non-redundant and they tend to have a very high likelihood of being transposons themselves. So I don't let a repeat-containing model override a non-repeat-containing model unless there is evidence supporting it (there is never evidence supporting the non-overlapping models).

--Carson

On 13-06-06 4:44 AM, "Michael Nuhn" wrote:

>Hello Carson!
>
>When running maker with the unmask option, how does maker use the
>predictions generated from running the gene predictors on the unmasked
>sequence?
>
>The tutorial says:
>
>"You do have the option to run ab initio gene predictors on both the
>masked and unmasked sequence if repeat masking worries you though. You
>do this by setting unmask:1 in the maker_opt.ctl configuration file."
>
>http://gmod.org/wiki/MAKER_Tutorial_2012
>
>But in the sub get_non_overlaping_abinits in maker::auto_annotator
>(maker version 2.27) they are skipped:
>
>#only accept masked predictions unless I'm not masking or the predictor
>is genemark
>my $src = $g->{algorithm};
>unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} ||
>$CTL_OPT->{predictor} eq 'genemark') {
>    next;
>}
>
>Cheers,
>Michael.
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Jun 7 17:10:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 17:10:09 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

You seem to be running this in a very odd way. First, the GFF3 is not correctly formatted. There are lines containing score=1 (all by itself)? I believe this may be coming through because you are trying to pass in Augustus predictions as GFF3 and that input is malformed. All of your Augustus models are also single-exon genes, but they are very long and do not even correspond to proper ORFs. The EST evidence is spliced and thus contradicts the Augustus model (they don't support each other). If you want MAKER to be able to use the evidence as feedback for the model, you need to let MAKER run Augustus. Otherwise it is only able to accept or reject the model from the GFF3 (nothing more - no attempt at consensus).

Perhaps if you supply your input dataset and control files we can help you get the best settings. You would need to provide the Augustus species set you are using as well (contained in a directory in .../augustus/config/species).
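
Malformed lines like the bare score=1 mentioned above can be spotted before a GFF3 file is handed to MAKER by checking that every feature line has the nine tab-separated columns the GFF3 format requires. The following is only a minimal sketch in Python and is not part of MAKER; the function name malformed_gff3_lines and the sample records are made up for illustration.

```python
def malformed_gff3_lines(lines):
    """Yield (line_number, text) for feature lines lacking 9 tab-separated columns."""
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if text.startswith("##FASTA") or text.startswith(">"):
            # Embedded sequence follows; there are no more feature lines.
            break
        if not text.strip() or text.startswith("#"):
            # Blank lines, comments and directives are allowed.
            continue
        if len(text.split("\t")) != 9:
            yield number, text


# Hypothetical sample: one valid feature line and one stray "score=1" line.
sample = [
    "##gff-version 3\n",
    "CGS00003\taugustus\tmatch\t100\t200\t0.9\t+\t.\tID=m1\n",
    "score=1\n",
    "##FASTA\n",
    ">CGS00003\n",
    "ACGT\n",
]
for number, text in malformed_gff3_lines(sample):
    print(f"line {number}: {text!r}")  # prints: line 3: 'score=1'
```

Passing an open file handle instead of the sample list checks a real file the same way. The check only covers the column count, so a file that passes can still be invalid in other respects.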
--Carson

From: Innocent Onsongo
Date: Friday, 7 June, 2013 2:08 PM
To: Carson Holt
Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore
Subject: Re: [maker-devel] Maker: accessory scripts

Carson,

I have attached the full gff3 for the contig together with a screenshot from IGV showing the regions where I was expecting Maker to make a consensus call. The region in question is CGS00003:5264784-5273457. I will greatly appreciate any insights.

Thanks,

Getiria

On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote:
> One thing to keep in mind is the strandedness of the evidence and the model
> (they must be on the same strand). Further protein evidence is only valid
> support if it is in the same reading frame as the model.
>
> Could you send the full GFF3 for the contig (I need features and GFF3 internal
> fasta) and the coordinates of the region in question, and I can take a look?
> Also if you can, it would be good to let maker run Augustus as well with the
> species file rather than just passing in the GFF3. This is because MAKER can
> only talk to Augustus to generate competing hint based models if you provide
> the species.
>
> Thanks,
> Carson
>
>
> From: Innocent Onsongo
> Date: Wednesday, 5 June, 2013 1:10 PM
> To: Carson Holt
> Cc: Carson Holt , "maker-devel at yandell-lab.org"
> , Barry Moore
>
> Subject: Re: [maker-devel] Maker: accessory scripts
>
> I checked visually in IGV and there are some exons in the predicted model with
> protein and EST support but the maker output GFF only has match_part and
> protein_match in column 3. Does that mean Maker doesn't deem any of the
> evidence sufficient to make a gene model prediction?
>
> I guess I am somewhat surprised I am not getting any exons predicted by Maker.
> Is there a parameter I can alter to reduce the threshold at which Maker makes
> this call? I have attached the first 400 lines of one of my GFF files together
> with the control file (maker_opts.ctl) just in case they might be useful.
> > Getiria > > > On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: >> Also, just a note, models are rejected if they have no protein or EST >> support. This is because ab inito predictors over predict (you may have 10 >> false positives for every true positive in some genomes for example). >> >> --Carson >> >> >> >> From: Carson Holt >> Date: Wednesday, 5 June, 2013 10:44 AM >> To: Innocent Onsongo , Carson Holt >> >> >> Cc: "maker-devel at yandell-lab.org" , Barry Moore >> >> Subject: Re: [maker-devel] Maker: accessory scripts >> >> All maker gene annotations will be of the format gene/mRNA/exon/CDS. >> Anything in the format match/match_part is an evidence alignment or rejected >> model and is there for reference purposes. If you want to upgrade all of the >> rejected loci to gene annotations, set keep_preds=1 in the control files. If >> you want to upgrade a subset of rejected models to a full annotation, create >> a list of IDs (one per line) then give them to the attached script. >> gff3_preds2models was previously deprecated and no longer part of the maker >> distribution, but the attached script is an updated version with the same >> functionality. >> >> --Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 5 June, 2013 12:35 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" , Barry Moore >> >> Subject: [maker-devel] Maker: accessory scripts >> >> I was able to successfully ran Maker and now want to converts the gene >> prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I >> looked at the tutorial and the script gff3_preds2models >> is supposed to do this conversion. How do I access this script. It is not in >> /maker/2.28-beta/bin/ >> >> Also, in running gff3_preds2models is >> the file I used for pred_gff=? >> >> Long story short, how do I transform the GFF output from Maker to the more >> traditional annotation of exon/intron? 
>> >> Thanks, >> Getiria >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Fri Jun 7 21:29:50 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Fri, 7 Jun 2013 21:29:50 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: I appreciate the feedback. I will try letting MAKER run augustus instead of passing the Augustus predictions as GFF3. Thanks for all you help! Getiria On Fri, Jun 7, 2013 at 5:10 PM, Carson Holt wrote: > You seem to be running this in a very odd way. First the GFF3 is not > correctly formatted. There are lines containing score=1 (all by itself)? I > believe this may be coming through because you are trying to pass in > augustus predictions as GFF3 and that input is malformed. All of your > Augustus models are also single exon genes, but they are very long and do > not even correspond to proper ORFs. The EST evidence is spliced and is > thus contradicting the augustus model (they don't support each other). If > you want MAKER to be able to use the evidence as feedback for the model, > you need to let MAKER run augustus. Otherwise it is only able to accept or > reject the model from the GFF3 (nothing more ? no attempt at consensus). 
> > Perhaps if you supply you input dataset and control files we can help you > get the best settings. You would need to provide the Augustus species set > you are using as well (contained in a directory in > ?/augustus/config/species). > > --Carson > > > From: Innocent Onsongo > Date: Friday, 7 June, 2013 2:08 PM > > To: Carson Holt > Cc: Carson Holt , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org>, Barry Moore > Subject: Re: [maker-devel] Maker: accessory scripts > > Carson, > > I have attached the full gff3 for the contig together with a screen shot > from IGV with regions I was expecting Maker to make a consensus call. The > region on question is CGS00003:5264784-5273457. I will greatly appreciate > any insights. > > > Thanks, > > Getiria > > > > > On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote: > >> One thing to keep in mind is the strandedness of the evidence and the >> model (they must be on the same strand). Further protein evidence is only >> valid support if it is in the same reading frame as the model. >> >> Could you send the full GFF3 for the contig (I need features and GFF3 >> internal fasta) and the coordinates of the region in question, and I can >> take a look? Also if you can, it would be good to let maker run Augustus >> as well with the species file rather than just passing in the GFF3. This >> is because MAKER can only talk to Augustus to generate competing hint based >> models if you provide the species. >> >> Thanks, >> Carson >> >> >> From: Innocent Onsongo >> Date: Wednesday, 5 June, 2013 1:10 PM >> To: Carson Holt >> Cc: Carson Holt , "maker-devel at yandell-lab.org" < >> maker-devel at yandell-lab.org>, Barry Moore >> >> Subject: Re: [maker-devel] Maker: accessory scripts >> >> I checked visually in IGV and there are some exons in the predicted model >> with protein and EST support but the maker output GFF only has match_part >> and protein_match in column 3. 
Does that mean Maker doesn't deem any of the >> evidence sufficient to make a gene model prediction? >> >> I guess I am somewhat surprised I am not getting any exons predicted by >> Maker. Is there a parameter I can alter to reduce the threshold at which >> Maker makes this call? I have attached the first 400 lines of one of my GFF >> files together with the control file (maker_opts.ctl) just in case they >> might be useful. >> >> Getiria >> >> >> On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: >> >>> Also, just a note, models are rejected if they have no protein or EST >>> support. This is because ab inito predictors over predict (you may have 10 >>> false positives for every true positive in some genomes for example). >>> >>> --Carson >>> >>> >>> >>> From: Carson Holt >>> Date: Wednesday, 5 June, 2013 10:44 AM >>> To: Innocent Onsongo , Carson Holt < >>> carson.holt at oicr.on.ca> >>> >>> Cc: "maker-devel at yandell-lab.org" , Barry >>> Moore >>> Subject: Re: [maker-devel] Maker: accessory scripts >>> >>> All maker gene annotations will be of the format gene/mRNA/exon/CDS. >>> Anything in the format match/match_part is an evidence alignment or >>> rejected model and is there for reference purposes. If you want to upgrade >>> all of the rejected loci to gene annotations, set keep_preds=1 in the >>> control files. If you want to upgrade a subset of rejected models to a >>> full annotation, create a list of IDs (one per line) then give them to the >>> attached script. gff3_preds2models was previously deprecated and no longer >>> part of the maker distribution, but the attached script is an updated >>> version with the same functionality. 
>>> >>> --Carson >>> >>> >>> >>> From: Innocent Onsongo >>> Date: Wednesday, 5 June, 2013 12:35 PM >>> To: Carson Holt >>> Cc: "maker-devel at yandell-lab.org" , Barry >>> Moore >>> Subject: [maker-devel] Maker: accessory scripts >>> >>> I was able to successfully ran Maker and now want to converts the gene >>> prediction match/match_part format to annotation gene/mRNA/exon/CDS format. >>> I looked at the tutorial and the script gff3_preds2models >>> is supposed to do this conversion. How do I access this script. It is >>> not in /maker/2.28-beta/bin/ >>> >>> Also, in running gff3_preds2models is >> list> the file I used for pred_gff=? >>> >>> Long story short, how do I transform the GFF output from Maker to the >>> more traditional annotation of exon/intron? >>> >>> Thanks, >>> Getiria >>> _______________________________________________ maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> >> >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Jun 10 07:40:35 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Jun 2013 08:40:35 -0400 Subject: [maker-devel] Maker and mono-exonic genes ? 
In-Reply-To:
Message-ID:

One more note. The ESTs appear to be from multiple overlapping HSPs (based on the red line pattern in the image). I'd have to see the actual GFF3 to be sure, but if that is the case, then there probably isn't an ORF to work with at that location on that strand (so SNAP can't call it). This is possibly the result of an assembly error or a pseudogene.

--Carson

From: Daniel Ence
Date: Friday, 7 June, 2013 5:32 PM
To: Bérénice Benayoun , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Maker and mono-exonic genes ?

Hi Berenice, Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform.

In looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. MAKER uses the predictions from ab-initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region.

Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice Benayoun [benayoun at stanford.edu]
Sent: Friday, June 07, 2013 11:17 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] Maker and mono-exonic genes ?

Dear maker developers,

I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets.
While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker.

I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it.

Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI.

Thank you so much in advance for your answer !!!

Best,

Berenice.

--
Bérénice A. BENAYOUN, Ph.D.
Stanford University/Genetics Department
BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
M312 Alway Building
300, Pasteur Drive
MC 5120
Stanford, CA 94305-5120
USA
Email: benayoun at stanford.edu
Web: www.stanford.edu/group/brunet/
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From michel.moser at ips.unibe.ch Mon Jun 10 08:03:06 2013
From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch)
Date: Mon, 10 Jun 2013 13:03:06 +0000
Subject: [maker-devel] maker 2.28 blastx error
Message-ID:

Hello Maker developers and users,

I am using maker for the first time to annotate some BAC sequences.
I successfully ran both of the test data sets provided in the maker tarball, but when I run maker on my sequences and provide some EST evidence from cufflinks, I get errors at the repeat database blast step (see error below).
As the te_protein data set I just use the file provided in maker/data/.

I sent the data to a colleague, who could run it without problems using maker2.10.
Or is the problem that I don't have wublast and RepBase installed?

Any hint is highly appreciated!
Thanks, Michel std.error STATUS: Parsing control files... WARNING: blast_type is set to 'wublast' but executables cannot be located The blast_type 'ncbi+' will be used instead. STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore To access files for individual sequences use the datastore index: /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: bac2:383-131865 Length: 131482 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking doing blastx repeats formating database... #--------- command -------------# Widget::formater: /usr/bin/makeblastdb -dbtype prot -in /tmp/maker_rcBcxr/0/blastprep/te_proteins%2Efasta.mpi.10.0 #-------------------------------# running blast search. 
#--------- command -------------# Widget::blastx: /usr/bin/blastx -db /tmp/maker_rcBcxr/te_proteins%2Efasta.mpi.10.0 -query /tmp/maker_rcBcxr/0/bac2%3A383-131865.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore/1D/F1/bac2%3A383-131865//theVoid.bac2%3A383-131865/0/bac2%3A383-131865.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner #-------------------------------# BLAST engine error: Warning: Sequence contains no data BLAST engine error: Warning: Sequence contains no data ERROR: BLASTX failed --> rank=NA, hostname=ipsktube ERROR: Failed while doing blastx repeats ERROR: Chunk failed at level:1, tier_type:1 FAILED CONTIG:bac2:383-131865 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:bac2:383-131865 examining contents of the fasta file and run log -------------- next part -------------- A non-text attachment was scrubbed... Name: test1.fasta Type: application/octet-stream Size: 14791 bytes Desc: test1.fasta URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_bopts.ctl Type: application/octet-stream Size: 1413 bytes Desc: maker_bopts.ctl URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_exe.ctl Type: application/octet-stream Size: 1201 bytes Desc: maker_exe.ctl URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4457 bytes Desc: maker_opts.ctl URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: protein.fasta Type: application/octet-stream Size: 452 bytes Desc: protein.fasta URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: insert-bac2.fasta
Type: application/octet-stream
Size: 131500 bytes
Desc: insert-bac2.fasta
URL:

From anthony.bretaudeau at rennes.inra.fr Mon Jun 10 10:48:13 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Mon, 10 Jun 2013 17:48:13 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
Message-ID: <51B5F53D.90505@rennes.inra.fr>

Hello,
I am running Maker 2.27b on an insect genome, and I use a gff file containing some repeat positions (rm_gff option in maker_opts.ctl).

I encountered an error on 10 scaffolds (the genome contains ~40000 scaffolds): "substr outside of string" (similar to this post: http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html).

After a lot of debugging, it turns out the problem came from the code of the "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a SQL query that fetches features that overlap with the border of the sequence chunk.
The problem is that it also fetches features that are completely outside of the chunk in the same region. This produces an error when maker tries to mask the sequence, as it does a substr outside the string.

I fixed it by patching lib/repeat_mask_seq.pm, near line 138:
I replaced:
    substr($$seq, $b -1 , $l, "$replace"x$l);
By:
    if ($b < length($$seq)) {
        substr($$seq, $b -1 , $l, "$replace"x$l);
    }

I don't know if there is a more elegant solution, but this seems to solve the problem.

Cheers
Anthony

From carsonhh at gmail.com Mon Jun 10 11:13:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 12:13:50 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B5F53D.90505@rennes.inra.fr>
Message-ID:

Could you use MAKER version 2.28 instead (launch with maker -a if it still fails)?

Thanks,
Carson

On 13-06-10 11:48 AM, "Anthony Bretaudeau" wrote:

>Hello,
>I am running Maker 2.27b on an insect genome, and I use a gff file
>containing some repeat positions (rm_gff option in maker_opts.ctl).
>
>I encountered an error on 10 scaffolds (the genome contains ~40000
>scaffolds) : "substr outside of string" (similar to this post:
>http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html).
>
>After a lot of debugging, it turns out the problem came from the code of
>"phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a
>SQL query that fetches features that overlap with the border of the
>sequence chunk.
>The problem is that it also fetches features that are completely outside
>of the chunk in the same region. This produces an error when maker tries
>to mask the sequence as it does a substr outside the string.
>
>I fixed it by patching lib/repeat_mask_seq.pm, near line 138:
>I replaced:
>    substr($$seq, $b -1 , $l, "$replace"x$l);
>By:
>    if ($b < length($$seq)) {
>        substr($$seq, $b -1 , $l, "$replace"x$l);
>    }
>
>I don't know if there is a more elegant solution, but this seems to
>solve the problem.
>
>Cheers
>Anthony
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From barry.moore at genetics.utah.edu Mon Jun 10 12:13:49 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 10 Jun 2013 11:13:49 -0600
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To:
References:
Message-ID: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>

Hi Michel,

Yes, wublast is the problem. In current versions of maker the opts file defaults to ncbi+, but in older versions it defaults to wublast. Just edit your maker_bopts.ctl file to have the line:

blast_type=ncbi+

It seems like this option may have been in maker_opts.ctl in older files, so if you don't find it in bopts then look in opts.

B

On Jun 10, 2013, at 7:03 AM, wrote:

> Hello Maker-developper and user
>
> I am using maker for the first time to annotate some BAC-sequences.
> I successfully run both of the test-data sets provided in the maker tarball but when i run maker on my > sequences and provide some EST-evidence from cufflinks, i get errors at repeat database blasting (see error below). > As te_protein data set i just use the provided file in maker/data/. > > I sent the data to a colleague which could run it without problem using maker2.10. > Or is the problem that i dont have wublast and RepBase installed? > > Any hint is highly appreciated! > > Thanks, > Michel > > > std.error > > STATUS: Parsing control files... > WARNING: blast_type is set to 'wublast' but executables cannot be located > The blast_type 'ncbi+' will be used instead. > > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore > > To access files for individual sequences use the datastore index: > /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > #--------------------------------------------------------------------- > Now starting the contig!! > SeqID: bac2:383-131865 > Length: 131482 > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > doing repeat masking > doing blastx repeats > formating database... > #--------- command -------------# > Widget::formater: > /usr/bin/makeblastdb -dbtype prot -in /tmp/maker_rcBcxr/0/blastprep/te_proteins%2Efasta.mpi.10.0 > #-------------------------------# > running blast search. 
> #--------- command -------------# > Widget::blastx: > /usr/bin/blastx -db /tmp/maker_rcBcxr/te_proteins%2Efasta.mpi.10.0 -query /tmp/maker_rcBcxr/0/bac2%3A383-131865.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore/1D/F1/bac2%3A383-131865//theVoid.bac2%3A383-131865/0/bac2%3A383-131865.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner > #-------------------------------# > BLAST engine error: Warning: Sequence contains no data > BLAST engine error: Warning: Sequence contains no data > ERROR: BLASTX failed > --> rank=NA, hostname=ipsktube > ERROR: Failed while doing blastx repeats > ERROR: Chunk failed at level:1, tier_type:1 > FAILED CONTIG:bac2:383-131865 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:bac2:383-131865 > > examining contents of the fasta file and run log > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Jun 10 12:32:55 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Jun 2013 13:32:55 -0400 Subject: [maker-devel] maker 2.28 blastx error In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu> Message-ID: It's actually a little more complicated than that. You are already using BLAST+. The sequence you are running on is apparently entirely masked, so there is nothing there to align. 
The error thrown by NCBI BLAST+ when this happens (currently "Sequence contains no data") has changed slightly over time. As a result, MAKER fails with BLAST+ where it does not with wublast, because the error wublast throws is still recognized, captured by MAKER, and ignored. You can probably ignore that contig, run with a different version of BLAST, or put the attached files in the .../maker/lib/Widget/ directory. I fixed the check for the current message, so it will ignore the error (as long as the error is still going to STDERR and not STDOUT).

--Carson

From: Barry Moore
Date: Monday, 10 June, 2013 1:13 PM
To:
Cc:
Subject: Re: [maker-devel] maker 2.28 blastx error

Hi Michel,

Yes, wublast is the problem. In current versions of maker the opts file defaults to ncbi+, but in older versions it defaults to wublast. Just edit your maker_bopts.ctl file to have the line:

blast_type=ncbi+

It seems like this option may have been in maker_opts.ctl in older files, so if you don't find it in bopts then look in opts.

B

On Jun 10, 2013, at 7:03 AM, wrote:

> Hello Maker-developper and user
>
> I am using maker for the first time to annotate some BAC-sequences.
> I successfully run both of the test-data sets provided in the maker tarball
> but when i run maker on my
> sequences and provide some EST-evidence from cufflinks, i get errors at repeat
> database blasting (see error below).
> As te_protein data set i just use the provided file in maker/data/.
>
> I sent the data to a colleague which could run it without problem using
> maker2.10.
> Or is the problem that i dont have wublast and RepBase installed?
>
> Any hint is highly appreciated!
>
> Thanks,
> Michel
>
>
> std.error
>
> STATUS: Parsing control files...
> WARNING: blast_type is set to 'wublast' but executables cannot be located
> The blast_type 'ncbi+' will be used instead.
>
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at: > /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/inser > t-bac2_datastore > > To access files for individual sequences use the datastore index: > /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/inser > t-bac2_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > #--------------------------------------------------------------------- > Now starting the contig!! > SeqID: bac2:383-131865 > Length: 131482 > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > doing repeat masking > doing blastx repeats > formating database... > #--------- command -------------# > Widget::formater: > /usr/bin/makeblastdb -dbtype prot -in > /tmp/maker_rcBcxr/0/blastprep/te_proteins%2Efasta.mpi.10.0 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /usr/bin/blastx -db /tmp/maker_rcBcxr/te_proteins%2Efasta.mpi.10.0 -query > /tmp/maker_rcBcxr/0/bac2%3A383-131865.0 -num_alignments 10000 > -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 > -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out > /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/inser > t-bac2_datastore/1D/F1/bac2%3A383-131865//theVoid.bac2%3A383-131865/0/bac2%3A3 > 83-131865.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi. 
> 10.0.repeatrunner > #-------------------------------# > BLAST engine error: Warning: Sequence contains no data > BLAST engine error: Warning: Sequence contains no data > ERROR: BLASTX failed > --> rank=NA, hostname=ipsktube > ERROR: Failed while doing blastx repeats > ERROR: Chunk failed at level:1, tier_type:1 > FAILED CONTIG:bac2:383-131865 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:bac2:383-131865 > > examining contents of the fasta file and run log > > > > nsert-bac2.fasta>_______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastn.pm Type: text/x-perl-script Size: 7441 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: blastx.pm Type: text/x-perl-script Size: 7501 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tblastx.pm Type: text/x-perl-script Size: 8363 bytes Desc: not available URL: From carsonhh at gmail.com Mon Jun 10 12:53:53 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Jun 2013 13:53:53 -0400 Subject: [maker-devel] maker 2.28 blastx error In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu> Message-ID: Never mind. It's even a little weirder than what I just explained. 
The contig name (bac2:383-131865) is triggering a behavior in the BioPerl indexer where it is recognized as a region and not a contig. As a result it can't find the sequence, but it also doesn't throw an error (which results in an empty fasta).

Solution: Just change the name of the contig. Try using 'bac2_383-131865' instead.

--Carson

From anthony.bretaudeau at rennes.inra.fr Tue Jun 11 10:03:42 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Tue, 11 Jun 2013 17:03:42 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B73C4E.6030204@rennes.inra.fr>

Hello,
I have just tested with 2.28b: the problem is still there, and my fix works on this version too.
Cheers
Anthony

On 10/06/2013 18:13, Carson Holt wrote:
> Could you use MAKER version 2.28 instead (launch with maker -a if it still
> fails).
>
> Thanks,
> Carson
>
>
>
> On 13-06-10 11:48 AM, "Anthony Bretaudeau"
> wrote:
>
>> Hello,
>> I am running Maker 2.27b on an insect genome, and I use a gff file
>> containing some repeat positions (rm_gff option in maker_opts.ctl).
>>
>> I encountered an error on 10 scaffolds (the genome contains ~40000
>> scaffolds): "substr outside of string" (similar to this post:
>> http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html).
>>
>> After a lot of debugging, it turns out the problem came from the code of the
>> "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a
>> SQL query that fetches features that overlap with the border of the
>> sequence chunk.
>> The problem is that it also fetches features that are completely outside
>> of the chunk in the same region. This produces an error when maker tries
>> to mask the sequence, as it does a substr outside the string.
>>
>> I fixed it by patching lib/repeat_mask_seq.pm, near line 138:
>> I replaced:
>> substr($$seq, $b -1 , $l, "$replace"x$l);
>> By:
>> if ($b < length($$seq)) {
>> substr($$seq, $b -1 , $l, "$replace"x$l);
>> }
>>
>> I don't know if there is a more elegant solution, but this seems to
>> solve the problem.
>>
>> Cheers
>> Anthony

From carsonhh at gmail.com Tue Jun 11 10:06:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Jun 2013 11:06:10 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B73C4E.6030204@rennes.inra.fr>
Message-ID:

Could you send me your repeat_gff and genome fasta, so I can take a look?

Thanks,
Carson

From anthony.bretaudeau at rennes.inra.fr Wed Jun 12 08:29:14 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Wed, 12 Jun 2013 15:29:14 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B877AA.8060803@rennes.inra.fr>

Hi,
Here is a minimal gff file that reproduces the bug. It should work with any fasta (my real data is not yet published, so I can't share it publicly yet).
Tell me if you need more info
Anthony

-------------- next part --------------
scaffold_20 TEs match 199889 203598 0.0 + . ID=some_id_1
scaffold_20 TEs match_part 199889 200163 2.6e-12 + . ID=part_1;Parent=some_id_1
scaffold_20 TEs match_part 203256 203598 2.6e-12 + . ID=part_2;Parent=some_id_1

From sickler.alex at gmail.com Wed Jun 12 13:22:17 2013
From: sickler.alex at gmail.com (Alex Sickler)
Date: Wed, 12 Jun 2013 14:22:17 -0400
Subject: [maker-devel] Problem Installing with opencc
Message-ID:

Hi all,

I am trying to install Maker 2.28. When I go to install Maker, it gives the following error message:

/usr/bin/perl /usr/local/share/perl5/ExtUtils/xsubpp -typemap "/usr/share/perl5/ExtUtils/typemap" MPI.xs $
/share/apps/openmpi/OpenMPI-1.6.3/bin/mpicc -c -I"/share/apps/maker/src" -I/share/apps/openmpi/OpenMPI-1.6.3/include -D_REENTRANT -D_GNU_SOUR$
opencc WARNING: unknown flag: -fstack-protector
opencc WARNING: unknown flag: -fstack-protector
opencc ERROR: -- not allowed in non XPG4 environment
opencc ERROR parsing --param=ssp-buffer-size=4: unknown flag
make: *** [MPI.o] Error 2

The to everything is correct. I tried looking in the Makefile.PL but could not find the "param=" option.

Any help is greatly appreciated,
Alex

From carsonhh at gmail.com Thu Jun 13 14:38:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Jun 2013 15:38:52 -0400
Subject: [maker-devel] Problem Installing with opencc
In-Reply-To:
Message-ID:

MAKER installation doesn't have a Makefile.PL. The parameters for compilation of the MPI bindings are being set by mpicc itself, Perl, or environmental variables on your system.
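One way to see which of those three sources is supplying the flags that opencc rejects is to print each of them. The sketch below is only an illustration and not part of MAKER. It assumes that `perl` and an OpenMPI `mpicc` wrapper are on the PATH (OpenMPI's wrapper accepts `--showme:compile`; other MPI wrappers may use a different option), and the helper names are made up for this example. The default environment variable names are the ones listed later in this message.

```python
import os
import re
import shutil
import subprocess


def parse_perl_config(output):
    """Parse `name='value';` lines, the format printed by `perl -V:name`."""
    return dict(re.findall(r"^(\w+)='(.*)';$", output, flags=re.MULTILINE))


def run(cmd):
    """Return stdout of cmd, or None if the program is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def report_flag_sources(
    env_names=("LDFLAGS", "LDDLFLAGS", "CCCDLFLAGS", "CCDLFLAGS"),
):
    """Print compiler settings from Perl, the mpicc wrapper, and the environment."""
    # Compiler and flags recorded when Perl itself was built.
    for name in ("cc", "ccflags"):
        out = run(["perl", "-V:" + name])
        value = parse_perl_config(out).get(name) if out else None
        print("perl %s: %s" % (name, value))
    # Flags the OpenMPI wrapper adds on its own (assumes an OpenMPI mpicc).
    print("mpicc --showme:compile:", run(["mpicc", "--showme:compile"]))
    # Anything set in the current shell environment.
    for name in env_names:
        print("env %s: %s" % (name, os.environ.get(name)))
```

Calling `report_flag_sources()` on the build host should show whether `-fstack-protector` and `--param=ssp-buffer-size=4` appear in Perl's recorded `ccflags`. That is a plausible source, since Perl reuses its own build flags when compiling XS modules, but it is an assumption to verify rather than a diagnosis.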
In general you want both Perl and OpenMPI to be compiled by the same compiler, or you can get cross-library problems (Perl is using the shared libraries in OpenMPI, so all communication is really at the C level). Mixing compilers does not always cause problems, but it can (I have been fine for the most part mixing pgi, intel, and gcc compiled OpenMPI, but have never tried open64 compilers).

Alternatively you can try manually setting the values of the following environmental variables before installing MAKER, which should affect the parameter settings (this means before even running the 'perl Build.PL' step):

LDFLAGS=
LDDLFLAGS=
CCCDLFLAGS=
CCDLFLAGS=

Also you need to export the following variable for OpenMPI to work with shared libraries before trying to install MAKER or run MAKER (this means before even running the 'perl Build.PL' step). It's best just to add it to your ~/.bashrc or ~/.bash_profile.

export LD_PRELOAD=/share/apps/openmpi/OpenMPI-1.6.3/lib/libmpi.so

You will need to run 'source ~/.bashrc' or 'source ~/.bash_profile' after adding it to apply the change to the current terminal session.

Thanks,
Carson

From carsonhh at gmail.com Sun Jun 16 14:46:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 16 Jun 2013 15:46:51 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B877AA.8060803@rennes.inra.fr>
Message-ID:

Thanks for the detailed report and test files. The problem originates with your GFF3 giving a repeat structure that is a spliced repeat. I don't know if such a thing can really occur, but regardless maker doesn't expect them to occur, and as a result when assembled some of the spliced exons run off the edge of the sequence. The script currently checks for repeats where the end of a repeat runs off the edge and adjusts accordingly, but does not check for a start that runs off the edge (because it's not expecting spliced repeats). The result is the "substr outside of string" error.

I added 'next if($l <=0)' to both the _soft_mask_seq and _hard_mask_seq functions, and hopefully having spliced repeats won't cause other hidden errors elsewhere downstream, but you may need to be aware of the possibility.

Thanks,
Carson

From jmdoyle at purdue.edu Mon Jun 17 12:20:42 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Mon, 17 Jun 2013 13:20:42 -0400 (EDT)
Subject: [maker-devel] altest without MPI?
Message-ID: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I am beginning my first MAKER annotation and had a quick question. I am currently planning on following the "Training ab initio Gene Predictors" section of the MAKER 2012 tutorial. For my species of interest, I have 784290 scaffolds in which 80% are greater than 100 kb. I have EST data from a closely related species and was also going to use the core cegma protein sequences. With this in mind, I made the following changes to my maker_opts file:

genome=scaffolds.fasta
altest=Trinity.fasta
protein=cegma.fa
est2genome=1
cpus=48

My primary concern is that this is going to take a long time to run with altest, even with the extra cpus for BLAST. The software was not originally installed on our computer cluster with MPICH2, but I may be able to talk our computer guys into reinstalling if the situation is going to be completely untenable without MPI. I guess my question is, is there any point in trying to run the above without MPI? Is there a good way to monitor the progress of such a run if I were to give it a shot?

Thanks for your help with this!

Jackie

From carsonhh at gmail.com Mon Jun 17 15:12:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Jun 2013 16:12:58 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

It's best to use the cegma results with the cegma2zff script to generate a training set for SNAP. Then don't use the cegma proteins. If you can get proteins from a related species with an annotated genome, that will be better than the altest option for a different species. This is because altest sequences are aligned via tblastx, which is 3-4 times slower than protein alignments. Also, they will rarely be good enough to produce many est2genome models (best to only use them if you have nothing else).

The cpus= option is a blast parameter for specifying how many cpus to give to each blast job. It is not an MPI parameter. The number of cpus for MPI is specified using the -n option of mpiexec and not in the maker control files.

You don't have to use MPI. You can also split your contigs up into separate jobs and run MAKER multiple times. Use the fasta_tool script that comes with MAKER to split your input file up.

Let us know if you come across anything you have more questions on.

Thanks,
Carson

From carsonhh at gmail.com Wed Jun 19 20:05:49 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Jun 2013 21:05:49 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1997335285.43753.1371676376399.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

The throughput is based on contig length, so long contigs will take longer than short contigs. Any contig less than 10kb is mostly useless for annotation purposes (so you can filter those from your 800,000 right away). Take your contigs that finish, and sum up their length to get a better estimate of how long it will take to complete running. Most genomes can complete in a few days on a multi-core machine. Bigger genomes or bigger datasets take longer (note that altest evidence takes 3-4x longer to align than proteins).

The advantage of proteins is that the species do not have to be closely related. Nucleotide sequence diverges quickly and proteins slowly (that's why proteins are used for phylogenetic trees). A good strategy would be to get ~10Mb of sequence (use your longest contigs). Run with chicken, turkey, and pigeon proteins. Use the protein2genome option to generate annotations. Those annotations should now be sufficient to train SNAP and Augustus. Then you can finish by running all your contigs with the same dataset (protein2genome now turned off), using the newly trained snap and augustus files along with any altest files you want to use. Note that the size of the dataset will determine the total run time.

To get things to run faster, you can also run on your university's computer cluster (then you will have hundreds of cpus available to you). The Purdue cluster supports MPI and with 30-50 cpus you could annotate even large genomes in a reasonable time.
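The filtering and run-time estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: run time is assumed to scale linearly with sequence length, the 10 kb cutoff and ~10 Mb target are the figures from this message, and all function names are hypothetical (none of this is part of MAKER).

```python
def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first word of the header as the sequence ID.
            name, length = (line[1:].split() or [""])[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def keep_long(lengths, min_len=10_000):
    """Drop contigs shorter than min_len (10 kb, the cutoff suggested above)."""
    return {name: n for name, n in lengths.items() if n >= min_len}


def pick_longest(lengths, target_bp=10_000_000):
    """Take the longest contigs until their total length reaches target_bp."""
    chosen, total = [], 0
    ranked = sorted(lengths.items(), key=lambda item: item[1], reverse=True)
    for name, n in ranked:
        if total >= target_bp:
            break
        chosen.append(name)
        total += n
    return chosen, total


def estimate_remaining_hours(finished_bp, elapsed_hours, remaining_bp):
    """Extrapolate run time, assuming it grows linearly with sequence length."""
    if finished_bp <= 0:
        raise ValueError("need at least one finished contig")
    return elapsed_hours * remaining_bp / finished_bp
```

For example, `lengths = dict(fasta_lengths(open("scaffolds.fasta")))` would collect the lengths from the genome file named earlier in the thread, `keep_long(lengths)` would drop the short contigs, and `pick_longest(...)` would suggest a training subset. The linear extrapolation is a rough guide only, since evidence density also affects run time.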
Alternatively you can request a startup account at XSEDE, an NSF-funded computing resource open to all US institutions. A startup allocation with 50,000 cpu hours only takes 2 weeks to approve. You should request an allocation on the Lonestar cluster if you go that route; it has 64,000 cpus. I was able to annotate the maize genome (which is a very large genome at over 2 gigabases). I used abnormally large EST and protein datasets (~4 gigabases of evidence, which is much more than a normal annotation job), and it completed in under 3 hours on 2,100 cpus.

--Carson

On 13-06-19 5:12 PM, "Jacqueline R M Doyle" wrote:

>Hi Carson (and whoever else might be reading this!)
>
>Thanks so much, I think splitting the files up using fasta_tool will
>definitely move things along. I did a trial version with altest this
>weekend, and seemed to be averaging about an hour a scaffold (with 1
>cpu). I'm a little concerned, as we have ~800,000 scaffolds. Does this
>seem like a reasonable estimate of the time it should take to annotate
>one sequence? Could I be missing something in my maker_opts file?
>
>Let me back up for just a minute and describe the project a little more
>generally. As I mentioned before, we have no protein sequences or ESTs
>for our species of interest, which is an avian species. I could
>potentially use proteins from chicken or turkey, but neither is closely
>related to our species. Time is a bit of an issue... do you have any
>thoughts on how much time per scaffold it should take to annotate using
>protein2genome? If chicken and turkey are not closely related, is it
>worth the time investment?
>
>Let me finish by saying I think MAKER is wonderful, and I really
>appreciate the discussions on this group.
>
>Best wishes, Jackie

From jjin01 at mail.rockefeller.edu Thu Jun 20 15:22:22 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Thu, 20 Jun 2013 20:22:22 +0000
Subject: [maker-devel] maker exon result
Message-ID:

Dear all,

I have used maker to predict the gene models in my draft genome. However, when I check the sequence for each exon, I find that some of them have only a start codon, without a stop codon. Is this reasonable? For example:

processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9
processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362
processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 .
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG 
AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT 
TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA
CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA
CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT
ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA
TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG
AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG

This is the sequence for this gene; the first exon is shown in red.

However, I cannot find a stop codon for this exon.

I also find that some exons contain several stop codons.

Has anyone else had the same problem? Or is there something wrong with how I configured the MAKER file?

Thanks!

Jingjing

From dence at genetics.utah.edu Thu Jun 20 18:06:29 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Jun 2013 23:06:29 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References:
Message-ID:

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like GBrowse and IGV.

When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow.

I hope that I understood your question. Feel free to reply on the mailing list if I didn't.
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From barry.moore at genetics.utah.edu Thu Jun 20 18:11:56 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 20 Jun 2013 17:11:56 -0600
Subject: [maker-devel] maker exon result
In-Reply-To:
References:
Message-ID: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>

To add to what Daniel suggested if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
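The check suggested above (look at the last three nucleotides of the last CDS) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of MAKER: the function names are hypothetical, the GFF rows are assumed to be whitespace-separated as pasted in this thread (real GFF3 is tab-separated, which the same split also handles), the sequence is assumed to be the genomic span of the gene including introns, and only plus-strand models are handled.

```python
# Hypothetical helper names; plus-strand features only (a minus-strand model
# would need reverse-complementing, which this sketch omits).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_intervals(gff_text, mrna_id):
    """Return sorted (start, end) pairs for CDS rows whose Parent is mrna_id."""
    intervals = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split(None, 8)  # nine GFF columns; attributes come last
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if attrs.get("Parent") == mrna_id:
            intervals.append((int(cols[3]), int(cols[4])))
    return sorted(intervals)


def splice(region_seq, region_start, intervals):
    """Join the CDS segments; region_seq[0] sits at 1-based position region_start."""
    return "".join(
        region_seq[start - region_start : end - region_start + 1]
        for start, end in intervals
    )


def terminal_codons(cds):
    """Return the first and last three nucleotides of the spliced CDS."""
    return cds[:3].upper(), cds[-3:].upper()


# Toy data (made up for illustration): two CDS segments around a 6 nt intron.
toy_gff = """\
ctg1 maker mRNA 101 118 . + . ID=mRNA-1;Parent=gene-1
ctg1 maker CDS 101 106 . + 0 ID=mRNA-1:cds;Parent=mRNA-1
ctg1 maker CDS 113 118 . + 0 ID=mRNA-1:cds;Parent=mRNA-1
"""
toy_region = "ATGAAA" + "GTAAGT" + "CCCTAA"  # CDS, intron, CDS

intervals = cds_intervals(toy_gff, "mRNA-1")
cds = splice(toy_region, 101, intervals)
first, last = terminal_codons(cds)
print(first, last, last in STOP_CODONS)  # prints: ATG TAA True
```

For the gene in this thread, `region_start` would be 8916 and `mrna_id` would be the mRNA ID given in the Parent attribute of the CDS rows. Splicing first matters because the start and stop codons belong to the joined CDS as a whole, so an individual exon is not expected to carry both.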
URL: From jjin01 at mail.rockefeller.edu Thu Jun 20 19:18:18 2013 From: jjin01 at mail.rockefeller.edu (Jingjing Jin) Date: Fri, 21 Jun 2013 00:18:18 +0000 Subject: [maker-devel] maker exon result In-Reply-To: References: , Message-ID: For my understanding, the prediction gene model should be connect different exon together. For each exon of a gene, I think it should have a start codon and stop codon. However, it may be wrong. However, when I check some gene model from maker prediction, some exon of one gene, I cannot find stop codon for it. Like the example I give, the red color is the first exon. However, the last 3 NT is not a stop codon. Even for last 3 NT for last exon, it is also not a stop codon. Is it reasonable? Thanks! Jingjing ________________________________ From: Daniel Ence [dence at genetics.utah.edu] Sent: Thursday, June 20, 2013 7:06 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: RE: maker exon result Hi Jingjing, It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. 
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . 
+ 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG 
CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA 
GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG

This is the sequence for this gene; the red color marks the first exon. However, I cannot find a stop codon for this exon. I also find that some exons contain several stop codons. Has anyone had the same problem? Or is something wrong with how I configured the MAKER control files?

Thanks!
Jingjing

From jjin01 at mail.rockefeller.edu Thu Jun 20 19:21:38 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Fri, 21 Jun 2013 00:21:38 +0000
Subject: [maker-devel] maker exon result
In-Reply-To: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>
References: , <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>
Message-ID:

In this example, the last three nucleotides are not a stop codon either.

Jingjing

________________________________
From: Barry Moore [barry.moore at genetics.utah.edu]
Sent: Thursday, June 20, 2013 7:11 PM
To: Daniel Ence
Cc: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker exon result

To add to what Daniel suggested: if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS.

B

On Jun 20, 2013, at 5:06 PM, Daniel Ence wrote:

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent.
I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From myandell at genetics.utah.edu Thu Jun 20 20:11:40 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Fri, 21 Jun 2013 01:11:40 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References: , ,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E18B22@mxb2.hg.genetics.utah.edu>

Hi Jin,

Only the terminal coding exon (CDS) of a gene model will contain a stop codon. Sometimes, though, there is no stop codon because the gene actually runs off the end of the scaffold or is lost in a gap in the assembly.

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu]
Sent: Thursday, June 20, 2013 6:18 PM
To: Daniel Ence; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker exon result

As I understand it, a predicted gene model connects the different exons together. I thought each exon of a gene should have a start codon and a stop codon. However, I may be wrong.
However, when I check some gene models predicted by MAKER, I cannot find a stop codon for some exons of a gene. In the example I gave, the red color marks the first exon, but its last 3 nt are not a stop codon. Even the last 3 nt of the last exon are not a stop codon. Is this reasonable?

Thanks!
Jingjing

From bmoore at genetics.utah.edu Thu Jun 20 20:29:41 2013
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Fri, 21 Jun 2013 01:29:41 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References: , ,
Message-ID: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu>

To clarify things a bit, Jin: not every exon will have a start and/or stop codon. Only the first coding exon will have a start, and only the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3), so only look at CDS features, not at 'exon' features. For CDSs you must then concatenate the sequence of each CDS line for a given transcript (and reverse complement the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end.

B

Barry Moore
Research Scientist
Dept. Human Genetics
University of Utah

On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote:

As I understand it, a predicted gene model connects the different exons together. I thought each exon of a gene should have a start codon and a stop codon. However, I may be wrong. However, when I check some gene models predicted by MAKER, I cannot find a stop codon for some exons of a gene. In the example I gave, the red color marks the first exon, but its last 3 nt are not a stop codon. Even the last 3 nt of the last exon are not a stop codon. Is this reasonable?

Thanks!
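Barry's recipe above (keep only the CDS features of one transcript, join them in order, reverse complement on the minus strand, then look at the two ends) can be sketched in Python roughly as follows. This is only a minimal illustration and not part of MAKER: the function names, the `tx1` transcript ID, and the toy scaffold are all hypothetical, and the parser assumes tab-separated GFF3 lines (the copy of the GFF in this archive has lost its tabs) with 1-based, inclusive coordinates.

```python
# Minimal sketch: rebuild the coding sequence of one transcript from GFF3
# CDS features and check its first and last codons.
# All names and the toy data below are hypothetical illustrations.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_sequence(gff_lines, scaffold_seq, transcript_id):
    """Concatenate the CDS segments whose Parent is transcript_id.

    gff_lines    -- iterable of tab-separated GFF3 lines for one scaffold
    scaffold_seq -- nucleotide sequence of that scaffold
    """
    segments = []
    strand = "+"
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # ignore gene, mRNA, exon and any other feature types
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if transcript_id not in attrs.get("Parent", "").split(","):
            continue
        segments.append((int(cols[3]), int(cols[4])))
        strand = cols[6]
    segments.sort()  # ascending genomic order
    # GFF3 coordinates are 1-based and inclusive, hence start - 1.
    joined = "".join(scaffold_seq[start - 1:end] for start, end in segments)
    return reverse_complement(joined) if strand == "-" else joined


def codon_report(cds):
    """Report whether a coding sequence has the expected start and stop."""
    cds = cds.upper()
    return {
        "starts_with_ATG": cds.startswith("ATG"),
        "ends_with_stop": cds[-3:] in STOP_CODONS,
        "length_multiple_of_3": len(cds) % 3 == 0,
    }


# Toy example: two CDS segments (3-8 and 13-18) separated by a 4 nt "intron".
toy_scaffold = "CCATGGCCGTAGAAATGATT"
toy_gff = [
    "toy\tmaker\texon\t3\t8\t.\t+\t.\tID=tx1:exon:1;Parent=tx1",
    "toy\tmaker\tCDS\t3\t8\t.\t+\t0\tID=tx1:cds;Parent=tx1",
    "toy\tmaker\tCDS\t13\t18\t.\t+\t0\tID=tx1:cds;Parent=tx1",
]
toy_cds = cds_sequence(toy_gff, toy_scaffold, "tx1")
print(toy_cds, codon_report(toy_cds))
```

A sequence that fails these checks is not necessarily a misconfiguration; as Mark notes above, a model can lack a stop codon because it runs off the end of a scaffold or into an assembly gap.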
Jingjing ________________________________ From: Daniel Ence [dence at genetics.utah.edu] Sent: Thursday, June 20, 2013 7:06 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: RE: maker exon result Hi Jingjing, It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA 
TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG This is the sequence for this gene, the red 
color is for the first exon?

However, for this exon, I cannot find the stop codon. I also find that some exons contain several stop codons.

Does anyone have the same problem? Or is there something wrong with how I configured the maker file?

Thanks!

Jingjing

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From kara.deleon at biofilm.montana.edu Thu Jun 20 17:25:31 2013
From: kara.deleon at biofilm.montana.edu (Bowen, Kara (De Leon))
Date: Thu, 20 Jun 2013 16:25:31 -0600
Subject: [maker-devel] augustus_species
Message-ID: <3E82665C-ECB7-4A07-B0FF-24E8395EDC4D@biofilm.montana.edu>

Hello,

I am trying to annotate a Chlamydomonas genome, and C. reinhardtii was used as a model organism in Augustus. I would like to add this model to augustus_species in the maker_opts.ctl file, but I'm not sure how this information should be entered on this line (i.e. as genus name, file location, etc.).

I am also having an issue with providing a protein file. When I put in the protein fasta file of C. reinhardtii from the Augustus website, I get a fatal error (below). I've looked through the fasta and I'm not seeing anything obvious that would cause this error to be thrown. Do you have any suggestions on where to start looking?

Can't open sequence index file /Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/augustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index: Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm line 527.

FATAL ERROR

Thanks for any help you can provide.
Kara

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kara De León
Postdoctoral Research Associate
Montana State University
Center for Biofilm Engineering
366 EPS Building
Bozeman, MT 59717
208-484-9078
kara.deleon at biofilm.montana.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From gowthaman.ramasamy at seattlebiomed.org Fri Jun 21 08:29:06 2013
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Fri, 21 Jun 2013 06:29:06 -0700
Subject: [maker-devel] augustus_species
Message-ID:

I believe the model file should go in the Augustus installation directory, in the 'genomes' subfolder there. Then use the exact name of the model file (minus the extension) in the .ctl file.

"Bowen, Kara (De Leon)" wrote:

Hello,

I am trying to annotate a Chlamydomonas genome, and C. reinhardtii was used as a model organism in Augustus. I would like to add this model to augustus_species in the maker_opts.ctl file, but I'm not sure how this information should be entered on this line (i.e. as genus name, file location, etc.).

I am also having an issue with providing a protein file. When I put in the protein fasta file of C. reinhardtii from the Augustus website, I get a fatal error (below). I've looked through the fasta and I'm not seeing anything obvious that would cause this error to be thrown. Do you have any suggestions on where to start looking?

Can't open sequence index file /Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/augustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index: Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm line 527.

FATAL ERROR

Thanks for any help you can provide.
Kara

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kara De León
Postdoctoral Research Associate
Montana State University
Center for Biofilm Engineering
366 EPS Building
Bozeman, MT 59717
208-484-9078
kara.deleon at biofilm.montana.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From carsonhh at gmail.com Fri Jun 21 10:24:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Jun 2013 11:24:17 -0400
Subject: [maker-devel] augustus_species
In-Reply-To:
Message-ID:

The model files must go in .../augustus/config/species/ under the Augustus installation directory (each model gets its own directory). The species names that Augustus accepts are the same as the directory names under .../augustus/config/species/. The command 'augustus --species=help' will also list those names.

As for the protein file, can you send it to me?

--Carson

On 13-06-21 9:29 AM, "Gowthaman Ramasamy" wrote:

>I believe the model file should go to Augustus installation directory.
>Actually in to the 'genomes' sub folder there. Then use the exact name of
>the model file ( minus extension) in .CTL file.......
>
>"Bowen, Kara (De Leon)" wrote:
>
>Hello,
>I am trying to annotation a Chlamydomonas genome and C. reinhartii was
>used as a model organism in Augustus. I would like to add this model to
>augustus_species in the maker_opts.ctl file, but I'm not sure how this
>information should be inserted on this line (ie. as genus name, file
>location, etc).
>
>I am also having an issue with providing a protein file. When I put in
>the protein fasta file of C. reinhartti from the Augustus website, I get
>a fatal error (below). I've looked through the fasta and I'm not seeing
>anything obvious that would cause this error to be thrown. Do you have
>any suggestions on where to start to look?
>
>
>Can't open sequence index file
>/Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/au
>gustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index:
>Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm
>line 527.
>
>FATAL ERROR
>
>
>Thanks for any help you can provide.
>
>Kara
>
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Kara De León
>Postdoctoral Research Associate
>Montana State University
>Center for Biofilm Engineering
>366 EPS Building
>Bozeman, MT 59717
>208-484-9078
>kara.deleon at biofilm.montana.edu
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Jun 21 08:58:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Jun 2013 09:58:35 -0400
Subject: [maker-devel] maker exon result
In-Reply-To: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu>
Message-ID:

To further illustrate this, I've highlighted the location of all CDS entries (in this plain-text copy the CDS segments appear in uppercase and the intervening sequence in lowercase). You need to cut them out, string them together linearly, and only then can you translate. The merged CDS begins with a start codon and stays in open reading frame from there on, but it has no stop codon, so this is a partial transcript.

Sometimes the gene predictors do not find a likely stop, and a partial model scores better. You can force MAKER to try to find a stop even when the gene predictor (snap, augustus, etc.) doesn't by setting always_complete=1 in the maker_opts.ctl file. Keep in mind that this is just a forced canonical completion.
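The cut-and-concatenate step described above can be sketched in a few lines of Python. This is only a minimal illustration, not MAKER's own code: the helper names (spliced_cds, translate, reverse_complement) and the toy contig are made up for the example, coordinates are assumed to be 1-based and inclusive as in GFF3 columns 4 and 5, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_cds(contig, cds_coords, strand="+"):
    """Cut out each CDS segment and join the pieces into one coding sequence.

    cds_coords holds (start, end) pairs, 1-based inclusive, i.e. the values
    in columns 4 and 5 of the GFF3 'CDS' rows of a single transcript.
    """
    pieces = [contig[start - 1:end] for start, end in sorted(cds_coords)]
    joined = "".join(pieces).upper()
    # Minus-strand transcripts are read from the opposite strand.
    return reverse_complement(joined) if strand == "-" else joined


def translate(cds, phase=0):
    """Translate a joined CDS; an unknown codon becomes 'X', a stop '*'."""
    cds = cds[phase:]
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # Toy contig: two CDS segments (uppercase) around one intron (lowercase).
    toy = "ATGAAAgtaagtttcagGGGTAA"
    cds = spliced_cds(toy, [(1, 6), (18, 23)])
    print(cds, translate(cds))  # ATGAAAGGGTAA MKG*
```

For the transcript in this thread the pairs would be (8916, 9065), (9089, 9214), (10232, 10381), (11216, 11270), (11336, 11496), (11513, 11602), (11903, 12151) and (12528, 12632) on the + strand. The posted sequence appears to start at contig position 8916, so 8915 would have to be subtracted from every coordinate before slicing it. The eight segments add up to 1086 bases, or 362 codons, which matches the final 362 in the transcript's _QI attribute. A translation that does not end in '*' corresponds to the partial model discussed here.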
ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCA GAATTATCTCCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCG GGGAAGTTGATTGAGAATAATCCATCAGGGgtgatacagaagaattgtttcagTATCTTG TTGAAATATTGGCTTCTAGAGTGTATGATGTAGCAATTGATTCCCCCTTGCAAAATGCAA CTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGATCAAAAGAGAGGATATGCAGTCCg tatgtttctcctctcttctttttttgatgtagcatttgctttaacttagaatttgtggtt ttaaacataccattagaaaggtatggaggttgaggattagggtagtaaagtaggtagtct agagtgttcataacagtaatattgacaagcagtctcgctttccgttggtagtaggttttt atgactaaccgttattttctttcattgttgatcaacttacttttgttgtttttattctgc ttttatatggctttttggtactgtcccttcttgtctatattttcattaatgtggtgctta tgcttttctaagccgagagtttattggaaacaactttcatatcctcacaaggtaggggta aggtgtgcgtacacactaccctccccagactctacggtgtgggataatatttagtatgtt attgtcgttgttgttgtaaacgttttttttgttgctatcaaagcatgttattacgggtaa aatagaaacatttaaagtgaaagagtttccaaacgtaggaaagcttttttttctttcgga atacaccgaaaaaagaaagactatcatttaagatagaacaacaacagcgacggagctagc cttcgacttactggttcggcagaacccaataattttggcccaaactctgtacttgtacta aaaagctcacttaatatgtataaaaagcctagtaattaagttgcatttttttctttctaa aatctagagctcataaactcaaaattatgtctccgcctctgaacaatggggatattattc tacttttaactatcttagataagttaataattgttctctttttcaaacgtttctgccttg tattattgtgtaactatttatactgtgtggacgcttcaaaatgttgttgcgcccgcgtcg gatcctcaaaaaatatatattttgaggattcgacacgcacccgatgaccttttcggagaa ttcgagcaatataggtaactaatattgctagctcatcaactggtggtattttttagGTGC TCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTC AGAGACTTAAATGTACTGCTACGATTgtcatgcctgttaccacaccagagatcaaggtaa ttagttctctcctgttaatttatccttcatgttcgattcatgtgaatctagttgatcggg cactgagttttactaaaaaatgaagactttcggaacttgggagctttaacatgctgtaac atttgtgtagttataagacttttgaaacttatagtcttagtgggtgtttggacataagaa ttgtaaagttccaagaaaagtgaaaaaaaattcaagtgaaaatggtatttgaaaattaga gttgtgtttggacatgaatataattttaggttgtttttgaagttttgtgagtgatctgac acaaattttgaaaaaacaactttttggagtttttcaaattttcgaaaaattccaaaatgc atcttcaagtgaaaattggaaattatatgaccaaacgctgatttcgggaaaaaaattcga 
aaaaatgtgaaaattttcttatgtccaaacgggctcttaaatgcgtcataacgtttgtgt ggttataaaagtctctcatctgaatagggtcacacaactaaaacagagagaacaaaataa ttcactaaaaaaaaattggaactagctacaaacttcgtcgcaagtctcgctaaatcgctc gtagctaatagaatttctagataatttgtttagcttgtagcatgaaatttttctatttag caacagaagtagtctgtcgctaattcctatttttttagtagaaagtattgtgaaattatt tgtttttctaaaggaccattttctttacaaatgaacagattgaagcagttaagaacttgg atggtaatgtagttctacagGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGT TGGCTGAAGATGAAGgtctcacattcatcccgcctttcgatcacatcttaaagatataca tgcagtatttctgcctgtagGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAA AAGGGTTGCTCCTCATACAAAGATTATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGAC ACAGTCTTTGTACCACGGAATGAGAGTAAAGTTAGAACAAGTTGATAATTTTGCAGATGG CgtagctgttgcactagTTAGTTGGTGAAGAAACTTTCCGTCTTTGCAAAGATTTAATAG ACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGgttagcacgcacc atctcctaatggtttcagatatgatccgtccaaccagccaaaattggttagaataggacg ggttgaactatcaacccaatcaatcacagcccaaataacatttatgtgggtatatgactc gcccatttattaactcaaccaattttggtccattcaaattcaggctaacccgtccacgtt tgacattcatactttagatgtggattaaagtaactttcttaaatttccctctggttttga catgtactagtttgtgtttgtgtgtgttttgttctttttttcaatagGATGTGTACGACA AAGGAAGGAACATATTAGAGACATCAGGTGCACTCGCCATAGCTGGAGCTGAAGCATACT GCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTAGTGGAGCCAATATGG ACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGGAAGCTC TGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTgtgc gttacttagagcacttaacaagcattttagccagagtttaagttatatacatcgtcgtca gtgtaagaaacttttataccgtcttgatggagtaaaaatttgttacactgacgtgtacat aacttaaaacttttttagttactatatgatactttctgtctaagaaactgaaatattgac ttgaattactggtgggacctatgattattaccgaattcaagtacagatataactctggaa gaaaacaagctctagttctgtacaggtaattaaagttctattcatttttagaggggatgt tggcttctcattttagatttgctttattagttgttaggaaaaaagaaattacttattaca ttcaatttttagATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGGAAG TTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG Thanks, Carson From: Barry Moore Date: Thursday, 20 June, 2013 9:29 PM To: Jingjing Jin Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker exon result To 
clarify things a bit, Jin: not every exon will have a start and/or stop codon. Only the first coding exon will have a start, and only the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3), so look only at CDS features, not at 'exon' features. For CDSs you must then concatenate the sequence of each CDS line for a given transcript (and reverse complement the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end.

B

Barry Moore
Research Scientist
Dept. Human Genetics
University of Utah

On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote:

> For my understanding, the prediction gene model should be connect different
> exon together.
>
> For each exon of a gene, I think it should have a start codon and stop codon.
> However, it may be wrong.
>
> However, when I check some gene model from maker prediction, some exon of one
> gene, I cannot find stop codon for it. Like the example I give, the red color
> is the first exon. However, the last 3 NT is not a stop codon.
>
> Even for last 3 NT for last exon, it is also not a stop codon.
>
> Is it reasonable?
>
> Thanks!
>
> Jingjing
>
>
>
> From: Daniel Ence [dence at genetics.utah.edu]
> Sent: Thursday, June 20, 2013 7:06 PM
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: RE: maker exon result
>
> Hi Jingjing,
>
> It's really hard to find the stop codon in the nucleotide sequence that you
> sent. I think most people determine the presence of a stop codon in a gene by
> viewing the annotations and sequence in some kind of viewer. The one that I
> use the most is Apollo, but many people also like gbrowse and igv.
>
> When you view gene models in Apollo, the start codons are highlighted in green
> and the stop codons are highlighted in red.
Sometimes MAKER couldn't find the > stop or start codon for a gene, and in those cases, the end of the gene model > is marked with an orange arrow. > > I hope that I understood your question. Feel free to reply back on the mailing > list if I didn't. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing > Jin [jjin01 at mail.rockefeller.edu] > Sent: Thursday, June 20, 2013 2:22 PM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] maker exon result > > Dear all, > > I have used maker to predict the gene model in my draft genome. > > However, when I check the sequence for each exon, I find some of them just > have start codon, without stop codon. > > Is it reasonable for this? > > Like in this example: > > processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-proce > ssed_tobacco_genome_sequences_c33-snap-gene-0.9 > processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=ma > ker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_ > tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0 > |0|1|0.14|0.12|8|0|362 > processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . > + . 
> ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . 
> + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . > + 2 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
> + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > > > ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT > CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT > AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG > ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA > TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC > TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG > TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG > ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT > TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT > TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC > TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC > AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT > TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT > AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC > TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA > AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT > TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG > CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC > GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT > TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG > AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA > AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA > TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC > TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC > TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG > 
TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA > TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC > TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA > ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT > GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA > AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA > GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT > TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT > TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG > AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC > TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT > TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT > AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC > TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA > GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG > GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA > TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA > TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT > TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG > CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA > GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG > AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT > ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT > TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA > CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA > CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT > ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA > 
TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG
> AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG
>
> This is the sequence for this gene, the red color is for the first exon??
>
> However, for this exon, I cannot found the stop codon???
>
> I also find for some exon, there are several stop codon in one exon???
>
> Does anyone have the same problem with me?
>
> Or there is something wrong when I configure the maker file??
>
> Thanks!
>
> Jingjing
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From amelia.ireland at gmod.org Sun Jun 23 21:15:37 2013
From: amelia.ireland at gmod.org (Amelia Ireland)
Date: Sun, 23 Jun 2013 19:15:37 -0700
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
References:
Message-ID:

From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu.

---------- Forwarded message ----------
From: Yunxi Lin
Date: Sun, Jun 23, 2013 at 4:14 PM
Subject: about running MAKER
To: "gmod-help at gmod.org"

Hi

I'm running a eukaryote project on our server. Our server does not have a GUI; will MAKER still work? Also, our command has already been running for more than a month, trying to generate the models used for training SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server.

Thank you.

Sincerely,
Lin

--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject

From carsonhh at gmail.com Mon Jun 24 08:05:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Jun 2013 09:05:27 -0400
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
Message-ID:

Run time depends on the size of your evidence dataset, the genome size, and the number of processors you use. If you have a large genome (Gb size) and you are running on a single CPU, that could take a long time. This is especially true if you use the alt_est option for evidence, as these are aligned via tblastx, which is 3-4 times slower than protein alignments and 10-20 times slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment, so your evidence dataset is the major factor.

Also, you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome, that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 CPU Linux servers, which can annotate most fungal genomes in a few hours and larger genomes in a few days.

--Carson

From: Amelia Ireland
Date: Sunday, 23 June, 2013 10:15 PM
To:
Cc:
Subject: [maker-devel] Fwd: about running MAKER

From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu.

---------- Forwarded message ----------
From: Yunxi Lin
Date: Sun, Jun 23, 2013 at 4:14 PM
Subject: about running MAKER
To: "gmod-help at gmod.org"

Hi

I'm running a eukaryote project on our server. Our server does not have a GUI; will MAKER still work? Also, our command has already been running for more than a month, trying to generate the models used for training SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server.

Thank you.
Sincerely,
Lin

--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject

From Carson.Holt at oicr.on.ca Mon Jun 24 19:39:08 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 25 Jun 2013 00:39:08 +0000
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
Message-ID:

You are most likely only getting 1 CPU of performance. You should just install MPICH2. It's easy to just let MAKER do it for you:

    Go to the .../maker/src/ directory
    Run './Build mpich2'

Once it finishes installing, it will be in the .../maker/exe/mpich2/bin/ directory.

Set up MAKER again to use MPICH2:

    Go to the .../maker/src/ directory
    Run 'perl Build.PL'
    Say yes to the "use MPI" question
    Run './Build install'

Now run MAKER via 'mpiexec'.

Example --> .../maker/exe/mpich2/bin/mpiexec -n 16 maker

The -n flag specifies how many CPUs to use. mpiexec handles process communication either on the same machine or across machines. You will get much better performance.

Thanks,
Carson

From: Yunxi Lin
Date: Monday, 24 June, 2013 7:11 PM
To: Carson Holt
Cc: Amelia Ireland
Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson,

Thank you for your help. My estimated genome size is 250M base pairs. I ran it on 16 CPUs, but we don't have MPI, so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq, and I used those ESTs to run MAKER.

I think I already have more than 10Mb of data. The information you mentioned is very helpful. I may use it to try to train SNAP and Augustus.
Because this is my first time using MAKER and it has already been running for a month, I was wondering whether I used the command in the wrong way.

Sincerely,
Yunxi

2013/6/24 Carson Holt

Run time depends on the size of your evidence dataset, the genome size, and the number of processors you use. If you have a large genome (Gb size) and you are running on a single CPU, that could take a long time. This is especially true if you use the alt_est option for evidence, as these are aligned via tblastx, which is 3-4 times slower than protein alignments and 10-20 times slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment, so your evidence dataset is the major factor.

Also, you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome, that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 CPU Linux servers, which can annotate most fungal genomes in a few hours and larger genomes in a few days.

--Carson

From: Amelia Ireland
Date: Sunday, 23 June, 2013 10:15 PM
To:
Cc:
Subject: [maker-devel] Fwd: about running MAKER

From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu.

---------- Forwarded message ----------
From: Yunxi Lin
Date: Sun, Jun 23, 2013 at 4:14 PM
Subject: about running MAKER
To: "gmod-help at gmod.org"

Hi

I'm running a eukaryote project on our server. Our server does not have a GUI; will MAKER still work? Also, our command has already been running for more than a month, trying to generate the models used for training SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server.

Thank you.
Sincerely,
Lin

--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject

From lin11 at cougars.csusm.edu Mon Jun 24 18:11:23 2013
From: lin11 at cougars.csusm.edu (Yunxi Lin)
Date: Mon, 24 Jun 2013 16:11:23 -0700
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
References:
Message-ID:

Hi Carson,

Thank you for your help. My estimated genome size is 250M base pairs. I ran it on 16 CPUs, but we don't have MPI, so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq, and I used those ESTs to run MAKER.

I think I already have more than 10Mb of data. The information you mentioned is very helpful. I may use it to try to train SNAP and Augustus.

Because this is my first time using MAKER and it has already been running for a month, I was wondering whether I used the command in the wrong way.

Sincerely,
Yunxi

2013/6/24 Carson Holt

> Run time is dependent on the size of your evidence dataset, genome size,
> and number of processors you use. If you have a large genome (Gb size) and
> you are running on a single cpu then that could take a long time. This is
> especially true if you use the alt_est option for evidence as these are
> aligned via tblastx which is 3-4 times slower than protein alignments, and
> 10-20 time slower than standard EST alignments. 95% of MAKER's runtime is
> BLAST alignment so your evidence dataset is the major factor.
>
> Also you do not need results from the entire genome to train SNAP. If you
> get results from ~10Mb of the genome that is usually sufficient. Also make
> sure you are taking advantage of parallelization.
Launch via MPI to get > maximum performance. I commonly launch on 16 and 32 cpu Linux servers > which can annotate most fungal genomes in a few hours and larger genomes in > a few days. > > --Carson > > > From: Amelia Ireland > Date: Sunday, 23 June, 2013 10:15 PM > To: > Cc: > Subject: [maker-devel] Fwd: about running MAKER > > From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. > > ---------- Forwarded message ---------- > From: Yunxi Lin > Date: Sun, Jun 23, 2013 at 4:14 PM > Subject: about running MAKER > To: "gmod-help at gmod.org" > > > Hi > > I'm running a eukaryote project on our server. Because our server do not > have the GUI, is that still work for MAKER? And our command already ran > more than one month to try to generate the model use for the training of > SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux > server. > > Thank you. > > Sincerely, > Lin > > > > -- > Amelia Ireland > GMOD Community Support > http://gmod.org || @gmodproject > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Jun 25 09:12:45 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 25 Jun 2013 14:12:45 +0000 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: References: , Message-ID: Hi Yunxi, During the maker installation, there is an option to automatically install MPICH2, which would let you run maker parallelized. Try rerunning the perl Build.PL script in the "maker/src" directory, and when the option to install MPICH2 comes up, tell it yes. This will start an automated download and install onto your server. You can also start more than one maker process. They will work on annotating the genome together. 
You can start as many as ten or more processes like this, but MPI is a better parallelizing option.

Hope that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From Carson.Holt at oicr.on.ca Tue Jun 25 10:56:22 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 25 Jun 2013 15:56:22 +0000
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To: <9FC132E2-9E59-42E9-ADBA-FD91644E2124@cougars.csusm.edu>

You can get BLAST to use more than 1 CPU via the cpus= option, but that still significantly limits MAKER's performance.

When you let MAKER install MPICH2, it will be local to the MAKER installation (MAKER only). It will be in .../maker/exe/mpich2. This was purposely done for people who have limited access and install MAKER themselves, so they can run via MPI without having to get upgraded privileges. So I don't know if you installed MAKER yourself, but if you did, then this is an option that will let you run.
--Carson

> From: csusm
> Date: Tuesday, 25 June, 2013 11:40 AM
> To: Carson Holt
> Subject: Re: [maker-devel] Fwd: about running MAKER
>
> Hi Carson,
>
> Thank you for your suggestion. Do you mean that if I don't use MPI, I can only run it on one CPU? Because my school owns the server, I only have limited authorization.
>
> Yunxi Lin
>
> On Jun 24, 2013, at 5:39 PM, Carson Holt wrote:
>
>> You are most likely only getting 1 CPU of performance. You should just install MPICH2. It's easy to let MAKER do it for you:
>>
>> Go to the .../maker/src/ directory
>> Run './Build mpich2'
>>
>> Once it finishes installing, it will be in the .../maker/exe/mpich2/bin/ directory.
>>
>> Set up MAKER again to use MPICH2:
>>
>> Go to the .../maker/src/ directory
>> Run 'perl Build.PL'
>> Say yes to the "use MPI" question
>> Run './Build install'
>>
>> Now run MAKER via 'mpiexec'.
>> Example --> .../maker/exe/mpich2/bin/mpiexec -n 16 maker
>>
>> The -n flag specifies how many CPUs to use. Mpiexec handles process communication either on the same machine or across machines. You will get much better performance.
>>
>> Thanks,
>> Carson
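The MPICH2 setup steps quoted in this thread can be collected into a reviewable plan before anything is run. The following is only a minimal sketch: the function name `mpich2_setup_plan` and the `maker_home` argument are hypothetical (the thread elides the path prefix in front of `maker/`), and nothing is executed. The commands themselves are the ones given in the thread.

```python
from pathlib import PurePosixPath


def mpich2_setup_plan(maker_home, n_procs=16):
    """Return (working_dir, argv) pairs for the MPICH2 setup steps from the thread.

    maker_home is a hypothetical path to the top-level maker directory.
    A working_dir of None means "run from your own project directory".
    This only builds the plan; it does not run any command.
    """
    home = PurePosixPath(maker_home)
    src = str(home / "src")
    mpiexec = str(home / "exe" / "mpich2" / "bin" / "mpiexec")
    return [
        (src, ["./Build", "mpich2"]),    # let MAKER build a local MPICH2
        (src, ["perl", "Build.PL"]),     # reconfigure; answer yes to "use MPI"
        (src, ["./Build", "install"]),   # reinstall MAKER with MPI enabled
        (None, [mpiexec, "-n", str(n_procs), "maker"]),  # -n = number of CPUs
    ]


if __name__ == "__main__":
    # Print the plan as shell-like lines for review.
    for cwd, argv in mpich2_setup_plan("/path/to/maker"):
        prefix = f"cd {cwd} && " if cwd else ""
        print(prefix + " ".join(argv))
```

Printing the plan first makes it easy to confirm the paths match the local installation, since the interactive "use MPI" question still has to be answered by hand.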
From jjin01 at mail.rockefeller.edu Tue Jun 25 16:13:53 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 21:13:53 +0000
Subject: [maker-devel] start position for some genes results

Dear all,

I found something strange about the locations in my final result. For example, the start position of this final gene model:

c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2

Its start position is -1. Does someone know why the start position is -1? Is there something wrong?

Thanks!
Jingjing

From carsonhh at gmail.com Tue Jun 25 17:55:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 18:55:11 -0400
Subject: [maker-devel] start position for some genes results

What MAKER version are you using? This should be fixed in the current 2.28. It only happened under a very specific set of circumstances, but I remember fixing it. So let me know if you are using 2.28.

--Carson
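A start of -1 like the one reported above can be caught with a small coordinate check, since GFF3 coordinates are 1-based and start must not exceed end. This is a minimal sketch, not part of MAKER: the function name `gff3_coordinate_problems` is hypothetical, and it splits on any whitespace only because the archive flattened the tabs in the example line (real GFF3 is tab-delimited).

```python
def gff3_coordinate_problems(line):
    """Return a list of problems found in the start/end columns of one GFF3 line."""
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    # Assumption: whitespace-separated, because the archived example lost its tabs.
    fields = line.rstrip("\n").split(None, 8)
    if len(fields) < 8:
        return ["fewer than 8 columns"]
    try:
        start, end = int(fields[3]), int(fields[4])
    except ValueError:
        return ["start/end are not integers"]
    problems = []
    if start < 1:
        problems.append(f"start {start} is below 1 (GFF3 is 1-based)")
    if end < start:
        problems.append(f"end {end} is before start {start}")
    return problems
```

Running it over every non-comment line of the final GFF3 would list any other models with the same symptom before upgrading.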
From jjin01 at mail.rockefeller.edu Tue Jun 25 18:00:37 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 23:00:37 +0000
Subject: [maker-devel] start position for some genes results

Sorry, I have checked. I think it is the old version, 2.27. I will try the new one.

Thanks!
Jingjing

From jjin01 at mail.rockefeller.edu Tue Jun 25 19:53:01 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 00:53:01 +0000
Subject: [maker-devel] start position for some genes results

Dear Carson,

When I use the new version of MAKER, I have another problem like this:

jingjing@ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore
To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log
STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error

Do you know how to fix this problem with the new version?

Thanks!
Jingjing
From carsonhh at gmail.com Tue Jun 25 19:55:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 20:55:54 -0400
Subject: [maker-devel] start position for some genes results

Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network-mounted location.

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 20:30:09 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:30:09 +0000
Subject: [maker-devel] start position for some genes results

Dear Carson,

I am so sorry. The problem is still here.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore
To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log
STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error
 at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
 Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73
 Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
 eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
 Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79
 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56
 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
--> rank=NA, hostname=ChuaServer1
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `seq_id` does not exist in MpiTier object
From carsonhh at gmail.com Tue Jun 25 20:47:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 21:47:10 -0400
Subject: [maker-devel] start position for some genes results

Could you check your input genome file for the sequence "processed_tobacco_genome_sequences_c1"? Make sure that it has exactly that name and that there are no ':' characters in the name, because they can confuse the BioPerl FASTA indexer.

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 20:53:33 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:53:33 +0000
Subject: [maker-devel] start position for some genes results

Yes, this is the real name. There is also no ":" in the name.
I used the same file with MAKER 2.27 without any problem, so I am not sure what is wrong with the new version.

Jingjing
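The header check Carson asks for earlier in this thread (exact sequence name present, no ':' in any name) can be done with a short scan of the FASTA header lines. This is a minimal sketch under stated assumptions: the function name `check_fasta_ids` is hypothetical, and it treats the first whitespace-delimited token after '>' as the sequence ID, which is the usual convention but is not confirmed in the thread as what MAKER's indexer does.

```python
def check_fasta_ids(lines, wanted_id):
    """Scan FASTA lines; return (wanted_id_found, list_of_ids_containing_colon)."""
    found = False
    ids_with_colon = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence data, not a header
        # Assumption: the ID is the first whitespace-delimited token after '>'.
        parts = line[1:].split(None, 1)
        seq_id = parts[0] if parts else ""
        if seq_id == wanted_id:
            found = True
        if ":" in seq_id:
            ids_with_colon.append(seq_id)
    return found, ids_with_colon
```

Because it only iterates over lines, an open file handle for the genome FASTA can be passed directly as `lines`, so a large assembly never has to be loaded into memory at once.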
From carsonhh at gmail.com Tue Jun 25 21:02:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 22:02:51 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

The failure you are seeing occurs in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER. Does it fail as well?

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 21:15:46 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 02:15:46 +0000
Subject: [maker-devel] start position for some genes results
In-Reply-To:
References:
Message-ID:

Yes, it also fails on test data.

jingjing at ChuaServer1:~/software/maker.2.28/maker/data/example$ ../../bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >contig-dpp-500-500, trying to re-index the fasta.
stop here: contig-dpp-500-500
ERROR: Fasta index error
at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
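Two checks come up in this thread for the "Fasta index error": whether the sequence name MAKER cannot find really is in the genome file under that exact name, and whether any name contains ':' (which Carson notes can confuse the BioPerl FASTA indexer). A minimal sketch of both checks, assuming a plain uncompressed FASTA file; `check_fasta_ids` is a hypothetical helper, not part of MAKER, and the example records are invented. Note that a clean result here would not explain a failure on MAKER's own test data.

```python
def check_fasta_ids(fasta_lines, wanted_id):
    """Collect sequence IDs (first word of each '>' header) and report
    whether wanted_id is present and which IDs contain a ':'."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return {
        "found": wanted_id in ids,
        "with_colon": [seq_id for seq_id in ids if ":" in seq_id],
    }


# Invented records for illustration; the second ID is deliberately bad.
example_fasta = [
    ">processed_tobacco_genome_sequences_c1 some description",
    "ACGTACGT",
    ">bad:name",
    "GGCC",
]
print(check_fasta_ids(example_fasta, "processed_tobacco_genome_sequences_c1"))
```

For a real genome, the lines could come from `open("genome.fasta")` (file name assumed).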
From carsonhh at gmail.com Wed Jun 26 06:49:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Jun 2013 07:49:11 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

I thought as much. There is something wrong with the installation itself. Could you run maker with the --debug flag and kill it after 30 seconds? Capture the STDERR and send it to me. This is just to check the prerequisites that are installed on your system for known incompatibilities.

--Carson
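One way to follow the request to run with --debug, stop after 30 seconds, and keep the STDERR is sketched below. This is only an illustration under assumptions: `capture_stderr` is a made-up helper, the log file name is arbitrary, and `maker` is assumed to be on the PATH and started from the directory holding the control files.

```python
import subprocess


def capture_stderr(cmd, seconds, log_path):
    """Run cmd, write its STDERR to log_path, and kill it if it is still
    running after `seconds`. Returns True if the command was killed."""
    with open(log_path, "wb") as log:
        try:
            # subprocess.run kills the child itself when the timeout expires
            subprocess.run(cmd, stderr=log, timeout=seconds)
            return False
        except subprocess.TimeoutExpired:
            return True


# Assumed invocation for the case in this thread:
# capture_stderr(["maker", "--debug"], 30, "maker_debug.err")
```

The resulting file can then be attached to a reply on the list.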
URL: From michel.moser at ips.unibe.ch Thu Jun 27 08:33:15 2013 From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch) Date: Thu, 27 Jun 2013 13:33:15 +0000 Subject: [maker-devel] spliting genome for annotation Message-ID: Dear Maker-developers If i understood correctly, in order to increase speed and reduce needed resources one can split the genome into chunks and annotate each chunk separately. (i would really like to use that as i am working with a 1.2 Gbasepair draftgenome and cant use MPI on the computing cluster) I am a bit worried about how this might affect the annotation as the gene-predictor would get trained quite differently for each chunk, right? Or is there communication between the chunks using the -base function of maker? Could you maybe name some pros and cons of splitting your genome for the annotation with maker? Thank you very much, Michel ________________________________________ Von: Moser, Michel (IPS) Gesendet: Donnerstag, 27. Juni 2013 15:24 An: Carson Holt Betreff: AW: [maker-devel] start position for some genes results ________________________________________ Von: maker-devel [maker-devel-bounces at yandell-lab.org]" im Auftrag von "Carson Holt [carsonhh at gmail.com] Gesendet: Mittwoch, 26. Juni 2013 04:02 An: Jingjing Jin; maker-devel at yandell-lab.org Betreff: Re: [maker-devel] start position for some genes results The point of the failure you are seeing is occurring in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER, does it fail as well? --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 9:53 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: RE: [maker-devel] start position for some genes results Yes, this is the real name. There is also no ":" in the name. Because I have use the same file for maker.2.27 and have no problem. I am not sure what is wrong with the new version. 
Jingjing ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Tuesday, June 25, 2013 9:47 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: Re: [maker-devel] start position for some genes results Could you check for this sequence in your input genome file for "processed_tobacco_genome_sequences_c1", make sure that it is in fact that exact name, and there are no ':' characters in the name because they can confuse the bioperl fasta indexer. --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 9:30 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: RE: [maker-devel] start position for some genes results Dear Carson, I am so sorry. The problem is still here. STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore To access files for individual sequences use the datastore index: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log STATUS: Now running MAKER... WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta. stop here: processed_tobacco_genome_sequences_c1 ERROR: Fasta index error at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239. 
Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73 Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415 eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650 --> rank=NA, hostname=ChuaServer1 ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `seq_id` does not exist in MpiTier object ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Tuesday, June 25, 2013 8:55 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: Re: [maker-devel] start position for some genes results Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network mounted location. --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 8:53 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: RE: [maker-devel] start position for some genes results Dear Carson, When I use the new version of maker, I have another problem like this: jingjing at ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... 
A data structure will be created for you at: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore To access files for individual sequences use the datastore index: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log STATUS: Now running MAKER... WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta. stop here: processed_tobacco_genome_sequences_c1 ERROR: Fasta index error Do you know how to fix this problem about new version? Thanks! Jingjing ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Tuesday, June 25, 2013 6:55 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: Re: [maker-devel] start position for some genes results What MAKER version are you using? This should be fixed in the current 2.28. It only happened under a very specific set of circumstances, but I remember fixing it. So let me know if you are using 2.28. --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 5:13 PM To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] start position for some genes results Dear all, I find some strange things about location for my final result. Like for some start position of final gene model: c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2 It start position is -1. Does someone know why the start position is -1? Is there something wrong? Thanks! 
Jingjing _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From lawson at ebi.ac.uk Thu Jun 27 08:37:10 2013 From: lawson at ebi.ac.uk (Daniel Lawson) Date: Thu, 27 Jun 2013 14:37:10 +0100 Subject: [maker-devel] spliting genome for annotation In-Reply-To: References: Message-ID: Michel, It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold. regards Dan On 27 June 2013 14:33, wrote: > Dear Maker-developers > > If i understood correctly, in order to increase speed and reduce needed > resources one can split the genome into chunks and annotate each chunk > separately. > (i would really like to use that as i am working with a 1.2 Gbasepair > draftgenome and cant use MPI on the computing cluster) > I am a bit worried about how this might affect the annotation as the > gene-predictor would get trained quite differently for each chunk, right? > Or is there communication between the chunks using the -base function of > maker? > > Could you maybe name some pros and cons of splitting your genome for the > annotation with maker? > > Thank you very much, > Michel > > > > > ________________________________________ > Von: Moser, Michel (IPS) > Gesendet: Donnerstag, 27. Juni 2013 15:24 > An: Carson Holt > Betreff: AW: [maker-devel] start position for some genes results > > ________________________________________ > Von: maker-devel [maker-devel-bounces at yandell-lab.org]" im Auftrag > von "Carson Holt [carsonhh at gmail.com] > Gesendet: Mittwoch, 26. 
Juni 2013 04:02 > An: Jingjing Jin; maker-devel at yandell-lab.org > Betreff: Re: [maker-devel] start position for some genes results > > The point of the failure you are seeing is occurring in the initialization > stage, before reaching any of the changes that would have been introduced > by 2.28. Try running the test data that comes with MAKER, does it fail as > well? > > --Carson > > > > From: Jingjing Jin jjin01 at mail.rockefeller.edu>> > Date: Tuesday, 25 June, 2013 9:53 PM > To: Carson Holt >, " > maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: RE: [maker-devel] start position for some genes results > > Yes, this is the real name. > > There is also no ":" in the name. > > Because I have use the same file for maker.2.27 and have no problem. > > I am not sure what is wrong with the new version. > > Jingjing > > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 9:47 PM > To: Jingjing Jin; maker-devel at yandell-lab.org maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] start position for some genes results > > Could you check for this sequence in your input genome file for > "processed_tobacco_genome_sequences_c1", make sure that it is in fact that > exact name, and there are no ':' characters in the name because they can > confuse the bioperl fasta indexer. > > --Carson > > > From: Jingjing Jin jjin01 at mail.rockefeller.edu>> > Date: Tuesday, 25 June, 2013 9:30 PM > To: Carson Holt >, " > maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: RE: [maker-devel] start position for some genes results > > Dear Carson, > > > I am so sorry. The problem is still here. > > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... 
> A data structure will be created for you at: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore > > To access files for individual sequences use the datastore index: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log > > STATUS: Now running MAKER... > WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to > re-index the fasta. > stop here: processed_tobacco_genome_sequences_c1 > ERROR: Fasta index error > at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm > line 239. > Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', > 'HASH(0x4e10810)', 0) called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm > line 73 > Process::MpiTiers::__ANON__() called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415 > eval {...} called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407 > Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm > line 79 > Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') > called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm > line 56 > Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, > 'Process::MpiChunk') called at > /home/jingjing/software/maker.2.28/maker/bin/./maker line 650 > --> rank=NA, hostname=ChuaServer1 > ERROR: Failed in tier preparation > WARNING: You must always set a rank before running MpiTiers > FATAL: argument `seq_id` does not exist in MpiTier object > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 8:55 PM > To: Jingjing Jin; maker-devel at yandell-lab.org maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] start position for some genes 
results > > Delete the mpi_blastdb directory before starting, to make sure all indexes > get rebuilt. Also make sure you are not setting TMP= to a network mounted > location. > > --Carson > > > From: Jingjing Jin jjin01 at mail.rockefeller.edu>> > Date: Tuesday, 25 June, 2013 8:53 PM > To: Carson Holt >, " > maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: RE: [maker-devel] start position for some genes results > > Dear Carson, > > When I use the new version of maker, I have another problem like this: > > jingjing at ChuaServer1:~/project/$ > /home/jingjing/software/maker.2.28/maker/bin/./maker > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore > > To access files for individual sequences use the datastore index: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log > > STATUS: Now running MAKER... > WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to > re-index the fasta. > stop here: processed_tobacco_genome_sequences_c1 > ERROR: Fasta index error > > > Do you know how to fix this problem about new version? > > Thanks! > > Jingjing > > > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 6:55 PM > To: Jingjing Jin; maker-devel at yandell-lab.org maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] start position for some genes results > > What MAKER version are you using? This should be fixed in the current > 2.28. It only happened under a very specific set of circumstances, but I > remember fixing it. So let me know if you are using 2.28. 
>
> --Carson
>
>
>
> From: Jingjing Jin <jjin01 at mail.rockefeller.edu>
> Date: Tuesday, 25 June, 2013 5:13 PM
> To: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: [maker-devel] start position for some genes results
>
> Dear all,
>
> I found something strange about the locations in my final result.
>
> For example, the start position of this final gene model:
>
> c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2
>
> Its start position is -1.
>
> Does someone know why the start position is -1?
>
> Is there something wrong?
>
> Thanks!
>
> Jingjing
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Ensembl Genomes | VectorBase | i5K insect genome initiative
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Jun 27 10:42:26 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Jun 2013 11:42:26 -0400
Subject: [maker-devel] spliting genome for annotation
In-Reply-To:
Message-ID:

Correct. The level of splitting is going to be limited by the largest contig. The largest contig will then be your slowest job, but the total runtime will depend on how much splitting you can achieve. Splitting into 10 jobs and running them all simultaneously can make the total run time roughly 1/10 as long.

You can use the -base flag with MAKER to make all jobs write to the same directory. Use the -g flag to specify a different input fasta file for each job (then they can all share the same control files). When all jobs complete, you will then need to run maker once using the original assembly fasta and the -dsindex flag to get MAKER to clean up the datastore log file (it is rebuilt to index all contigs). That only takes 2 minutes to run. You can use the fasta_tool utility that comes with MAKER to conveniently split the input assembly fasta.

MAKER does not train the gene predictors for you, and the hints it gives are on a per-gene basis, so splitting contigs has no effect on that. For initial training of gene predictors, run MAKER on about 10-30 Mb of your largest contigs and use either the protein2genome or est2genome prediction options to build gene models to train the predictors on. You will need to train Augustus or SNAP yourself using those models and their own documentation. If training SNAP, you can use maker2zff to convert to SNAP's training format. You can also use the tool CEGMA from Ian Korf's lab to train SNAP. Use the cegma2zff script that comes with MAKER to do the conversion for training input.

If you have questions once you start training, just send them to the list.

Thanks,
Carson

From: Daniel Lawson
Date: Thursday, 27 June, 2013 9:37 AM
To:
Cc:
Subject: Re: [maker-devel] spliting genome for annotation

Michel,

It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold.

regards
Dan

On 27 June 2013 14:33, wrote:
> Dear Maker-developers
>
> If I understood correctly, in order to increase speed and reduce needed
> resources one can split the genome into chunks and annotate each chunk
> separately.
> (I would really like to use that as I am working with a 1.2 Gbasepair
> draft genome and can't use MPI on the computing cluster)
> I am a bit worried about how this might affect the annotation as the
> gene-predictor would get trained quite differently for each chunk, right?
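For readers who want to script the splitting workflow from Carson's reply above, here is a minimal sketch. It is not part of MAKER: the `split_fasta` helper is a hypothetical stand-in for the bundled fasta_tool (whose options are not shown in this thread), the chunk file names are made up, and the command strings only combine the -base, -g and -dsindex flags as they are described in the reply. Passing -base to the final -dsindex run is an assumption.

```python
import heapq
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(path, n_jobs, out_dir):
    """Hypothetical stand-in for fasta_tool: spread contigs over n_jobs files.

    Contigs are assigned largest first to the currently smallest chunk, so the
    chunks end up with similar total length. The largest contig still bounds
    the slowest job, as noted in the reply.
    """
    records = sorted(read_fasta(path), key=lambda rec: len(rec[1]), reverse=True)
    bins = [(0, i, []) for i in range(n_jobs)]  # (total length, chunk index, members)
    heapq.heapify(bins)
    for header, seq in records:
        total, idx, members = heapq.heappop(bins)
        members.append((header, seq))
        heapq.heappush(bins, (total + len(seq), idx, members))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for _total, idx, members in sorted(bins, key=lambda b: b[1]):
        if not members:
            continue  # fewer contigs than jobs
        chunk_path = out_dir / f"chunk_{idx:03d}.fasta"  # made-up naming scheme
        with open(chunk_path, "w") as out:
            for header, seq in members:
                out.write(f">{header}\n{seq}\n")
        paths.append(chunk_path)
    return paths


def maker_commands(chunk_paths, assembly, base):
    """Build one command per chunk plus the final datastore index rebuild."""
    cmds = [f"maker -base {base} -g {chunk}" for chunk in chunk_paths]
    # Assumption: the clean-up run uses the same -base as the chunk jobs.
    cmds.append(f"maker -base {base} -g {assembly} -dsindex")
    return cmds
```

The commands are returned as strings rather than executed, so they can be reviewed and then handed to whatever job scheduler the cluster uses.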
> Or is there communication between the chunks using the -base function of
> maker?
>
> Could you maybe name some pros and cons of splitting your genome for the
> annotation with maker?
>
> Thank you very much,
> Michel

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From benayoun at stanford.edu Thu Jun 27 19:47:29 2013
From: benayoun at stanford.edu (Bérénice Benayoun)
Date: Thu, 27 Jun 2013 17:47:29 -0700
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
References:
Message-ID:

Hi maker devel team,

just wanted to say that retraining SNAP apparently fixed the problem. I modified the defaults and added "-min-intron 0" to the training everywhere relevant (the default is 30 bp, which must have been preventing single-exon genes from being predicted).

Thanks for your insights/help!

Berenice

2013/6/10 Carson Holt

> One more note. The ESTs appear to be from multiple overlapping HSPs
> (based on the red line pattern in the image). I'd have to see the actual GFF3 to
> be sure, but if that is the case, then there probably isn't an ORF to work
> with at that location on that strand (so SNAP can't call it). Possibly the
> result of assembly error or a pseudogene.
>
> --Carson
>
>
>
> From: Daniel Ence
> Date: Friday, 7 June, 2013 5:32 PM
> To: Bérénice Benayoun, "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Maker and mono-exonic genes ?
>
> Hi Berenice, Thank you for sending that screenshot and the maker_opts.log
> file. Those are exactly what we need to understand how to expect MAKER to
> perform.
>
> In looking at the screenshot, it doesn't look like any of the gene
> predictors gave a prediction in this region. MAKER uses the predictions from
> ab-initio tools as a basis for models and considers models that are
> supported by evidence. It won't by default create a model when there isn't
> a prediction in the region.
>
> Can I ask which gene predictors you used and how they were trained? You
> might consider training one or more of them on the specific evidence that
> you expect to support these genes and then rerunning maker with the
> retrained predictors.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Bérénice Benayoun [benayoun at stanford.edu]
> *Sent:* Friday, June 07, 2013 11:17 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] Maker and mono-exonic genes ?
>
> Dear maker developers,
>
> I am currently annotating a de novo fish genome, and have started looking
> for particular genes of interest in MAKER's output to verify that it is
> producing proper gene sets.
>
> While many of the genes I look for seem to be correctly annotated by the
> pipeline, I have noticed that important genes that do have strong
> evidentiary support but are monoexonic are NOT reported by maker.
>
> I am attaching a screenshot for the contig that I know should contain the
> *Foxl2* gene (notoriously monoexonic across evolution), and highlighted
> the corresponding evidence for it.
>
> Is there any setting I can give to maker to force it to output monoexonic
> genes? I already set "single_exon=1" with no success. I attached my config
> file FYI.
>
> Thank you so much in advance for your answer!
>
> Best,
>
> Berenice.
> --
> Bérénice A. BENAYOUN, Ph.D.
> Stanford University/Genetics Department
> *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related Diseases'
> M312 Alway Building
> 300, Pasteur Drive
> MC 5120
> Stanford, CA 94305-5120
> USA
> Email: benayoun at stanford.edu
> Web: www.stanford.edu/group/brunet/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Jun 28 20:01:47 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Jun 2013 21:01:47 -0400
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
Message-ID:

I'm glad it's working for you. Let us know if you run into additional problems.

Thanks,
Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jason.stajich at gmail.com Sun Jun 2 12:28:50 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sun, 2 Jun 2013 11:28:50 -0700
Subject: [maker-devel] getting protein sequences from genomes
In-Reply-To:
References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu>
Message-ID:

It seems like in your case you want to do more of a liftover-based annotation. Generate that and feed it as a GFF file to MAKER if your intention is also gene discovery in your population?

On May 23, 2013, at 9:48 AM, Daniel Hughes wrote:

> Would gene annotation by projection using synteny/WGA not be more appropriate? Either way, what's wrong with running one of the standard orthology prediction tools or just basic best reciprocal blast?
>
> dan.
>
> Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
> -------------------------------------------------------------------------------------
> dsth at cantab.net
> dsth at cpan.org
>
>
> 2013/5/23 Barry Moore
> Hi Luciano,
>
> If I understand correctly you are including translations of SNAP and Augustus predictions as well as the predictions. If so, you don't want to do that. Overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you will guarantee that every prediction will become an annotation, and that means you lose the benefit of evidence-supervised annotation that MAKER provides. Include the proteins from the D. mel reference and, if you want to cast a broader net, include proteins from other dipterans or even Uniprot. It just depends on how aggressive you want to be in capturing new annotations.
>
> B
>
> On May 23, 2013, at 8:41 AM, Luciano Abriata wrote:
>
>> Thanks for your reply!
>>
>> One more question, can you think of any tips to get the best possible predictions of protein sequences?
>>
>> I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with Methionine... So far I am including transcripts and translations from flybase, and snap and augustus with their available trainings for flies. Do you see any possible source of error in that?
>>
>> Thanks again,
>>
>> Luciano
>>
>> From: Barry Moore [barry.moore at genetics.utah.edu]
>> Sent: Friday, May 17, 2013 09:02 p.m.
>> To: Luciano Abriata
>> Cc: maker-devel at yandell-lab.org
>> Subject: Re: [maker-devel] getting protein sequences from genomes
>>
>>
>> On May 17, 2013, at 3:45 AM, Luciano Abriata wrote:
>>
>>> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).
>>>
>>> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:
>>>
>>> 1) How can I match proteins predicted for the same gene in two genomes?
>>
>> blastp tweaked with parameters to optimize near perfect match
>>
>>>
>>> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)
>>>
>>> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541
>>>
>>
>> AED = Annotation edit distance describes how closely the prediction matches the evidence. This is a distance measure and thus 0 is a perfect match and 1 is no overlap.
>>
>> eAED = Exon adjusted annotation edit distance: This metric is the same as AED with a couple of exceptions. First, for a protein coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported with splice site spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.
>>
>> QI values are as follows:
>>
>> 5' UTR length.
>> Fraction of splice sites confirmed by EST alignment.
>> Fraction of exons that overlap an EST alignment.
>> Fraction of exons that overlap EST or protein alignment.
>> Fraction of splice sites confirmed by an ab initio prediction.
>> Fraction of exons that overlap an ab initio prediction.
>> Number of exons in the transcript.
>> 3' UTR length.
>> Length of encoded protein.
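To make the field order above concrete, here is a small sketch that splits a header line like the one quoted in question 2 into named values. The dictionary keys are labels invented for this illustration, not names used by MAKER; the only assumptions about the format are what the example line itself shows (whitespace-separated `KEY:value` tokens and nine pipe-separated QI values in the order listed above).

```python
# Invented labels for the nine QI positions, in the order given in the reply.
QI_FIELDS = (
    "five_prime_utr_length",
    "splice_sites_confirmed_by_est",
    "exons_overlapping_est",
    "exons_overlapping_est_or_protein",
    "splice_sites_confirmed_by_ab_initio",
    "exons_overlapping_ab_initio",
    "exon_count",
    "three_prime_utr_length",
    "protein_length",
)


def parse_maker_header(line):
    """Parse the ID, AED, eAED and QI values from a MAKER protein FASTA header."""
    tokens = line.lstrip(">").split()
    record = {"id": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            continue  # plain words such as "protein" carry no value
        if key in ("AED", "eAED"):
            record[key] = float(value)
        elif key == "QI":
            values = value.split("|")
            if len(values) != len(QI_FIELDS):
                raise ValueError(f"expected {len(QI_FIELDS)} QI values, got {len(values)}")
            record["QI"] = {name: float(v) for name, v in zip(QI_FIELDS, values)}
    return record


example = (
    "maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 "
    "eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541"
)
parsed = parse_maker_header(example)
```

Read this way, the example transcript has 3 exons, a 541-residue protein, a 2 bp 5' UTR and a 208 bp 3' UTR, with 0.66 of its exons overlapping an EST alignment.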
>>
>>
>>> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta
>>>
>>> Which of these files contains the definitive set of predicted protein sequences?
>>
>> The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.
>>
>>>
>>> Thanks in advance!
>>>
>>> Luciano
>>
>> Barry Moore
>> Research Scientist
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT 84112
>> --------------------------------------------
>> (801) 585-3543

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Jun 3 07:04:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Jun 2013 09:04:08 -0400
Subject: [maker-devel] Advice on params for ciliates
In-Reply-To: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu>
Message-ID:

I don't have any specific advice, but in general I always set the blast_depth parameters in the maker_bopts file to 20 or 30 (faster runtimes). Also, max_dna_len can be set 2x higher if you have sufficient memory (3-4 Gb per CPU as opposed to the 1-2 Gb assumed with the default).

Other than that, split_hit, pred_flank, and single_exon are the only ones I might change around. You sort of have to run on a few large contigs before deciding what to do with these parameters.

split_hit --> set max intron size for alignments
pred_flank --> affects clustering for gene dense organisms
single_exon --> leave off unless you expect a lot of single exon genes.

--Carson

From: "Freeman, Robert M."
Date: Thursday, 23 May, 2013 4:17 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Advice on params for ciliates

Dear MAKER community,

Am embarking on updating models for a ciliate (taxa Ciliophora) and was wondering if folks had recommendations for MAKER parameters.

Thanks,
Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA 02115
617/432.2294, vox

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Bob_Freeman at hms.harvard.edu Wed Jun 5 07:28:36 2013
From: Bob_Freeman at hms.harvard.edu (Bob Freeman)
Date: Wed, 5 Jun 2013 09:28:36 -0400
Subject: [maker-devel] Advice on params for ciliates
In-Reply-To:
References:
Message-ID:

Thanks, Carson, for these helpful hints.

(Separately, the other code did not work again on our cluster. Have been so swamped -- I'll get to the write-up next week. Have been using the 2.25beta binary and that works OK).

Best,
Bob

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From onson001 at umn.edu Wed Jun 5 10:28:46 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Wed, 5 Jun 2013 11:28:46 -0500
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
References:
Message-ID:

I upgraded to 2.28 and Maker is not running. Thanks!

On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote:

> Are you using MAKER version 2.10? I ask because there is an issue with
> other_gff in that version that has since been fixed. So if you don't get
> other_gff to pass through, you will need to upgrade to 2.28 (release date
> is later today, coincidentally).
>
> For the Augustus GFF3 file, the format is a little weird, which is
> causing the problem. They are mRNA features not attached to genes. Rather
> than build the expected 3 level gene/mRNA/exon structure for these, it is
> simpler just to convert it to the 2 level match/match_part structure. Just
> convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'.
> Rename the GFF3 when you're done so that it will force a rebuild of the GFF3
> database when you run again.
>
> Thanks,
> Carson
>
>
>
> From: Innocent Onsongo
> Date: Wednesday, 22 May, 2013 8:47 AM
> To: Barry Moore
> Cc:
> Subject: Re: [maker-devel] Maker: Re-annotation
>
> No. The MAKER produced GFF3 file does not contain any annotations. I
> even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if
> it would pass annotations from the Augustus produced GFF file into the final
> annotation but that didn't work. I have attached the maker_opts.ctl file
> I used together with the first 100 lines of the GFF files it's using. I
> also include the GFF file produced by MAKER (CGS01058First100.gff)
>
>
> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote:
>
>> Hi Getiria,
>>
>> Does the MAKER produced GFF3 file contain any annotations at all? Can
>> you send the first ~100 lines each of the MAKER produced GFF3 file and of
>> the GFF3 files that you passed via maker_opts.ctl?
>>
>> B
>>
>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote:
>>
>> Maker Development Team,
>>
>> I am trying to use Maker for re-annotation using gene predictions from
>> Augustus. We had previously used Augustus for gene prediction but now want
>> to combine these annotations with some EST data. I updated the
>> fields in maker_opts.ctl as below
>>
>> genome=CGS01058.fasta #genome sequence file in fasta format
>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT
>> pred_gff=Augustus.gff3 #ab-initio predictions from
>> other_gff=Promoters.gff3 #promoter annotations
>> other_gff=CpG_Islands.gff3 # CpG island annotations
>>
>> Maker runs to completion and according to the log file annotation was
>> successful. However, it also gives a "Segmentation fault (core dumped)"
>> message. It does produce a GFF3 file, but when I load the GFF3 file into IGV
>> and look, it does not contain any of the exon definitions in Augustus.gff3.
>> Am I missing something?
>>
>> Regards,
>> Getiria
>>
>> --
>> Getiria Onsongo, Ph.D.
>> Informatics Analyst, Research Informatics Support System
>> Minnesota Supercomputing Institute for Advanced Computational Research
>> University of Minnesota
>> Minneapolis, MN 55455
>> Phone: 612-624-0532

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Wed Jun 5 08:30:20 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:30:20 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: What does it do? --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:28 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: Re-annotation I upgraded to 2.28 and Maker is not running. Thanks! On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > Are you using MAKER version 2.10? I ask because there is in issue with > other_gff in that version that has since been fixed. So if you don't get > other_gff to pass-through, you will need to upgrade to 2.28 (release date is > later today coincidentally). > > For the Augustus GFF3 file, the format is a little weird which is causing the > problem. They are mRNA features not attached to genes. Rather than build the > expected 3 level gene/mRNA/exon structure for these, it is simpler just to > convert it to the 2 level match/match_part structure. Just convert the 'mRNA' > tag to 'match' and all 'exon' tags to 'match_part'. Rename the GFF3 when your > done so that it will force rebuild of the GFF3 database when you run again. > > Thanks, > Carson > > > > From: Innocent Onsongo > Date: Wednesday, 22 May, 2013 8:47 AM > To: Barry Moore > Cc: > Subject: Re: [maker-devel] Maker: Re-annotation > > No. The MAKER produced GFF3 file does not contain any annotations. I even > tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will > pass annotations from the Augustus produced GFF file into the final annotation > but that didn't work. I have attached the maker_opts.ctl file I used together > with the first 100 lines of the GFF files it's using. 
I also include the GFF > file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can you >> send the first ~100 lines each of the MAKER produced GFF3 file and of the >> GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >>> Maker Development Team, >>> >>> I am trying to use Maker for re-annotation using gene predictions from >>> Augustus. We had previously used Augustus for gene prediction but now want >>> to combine these annotations with some EST data. I updated fields >>> maker_opts.ctl as below >>> >>> genome=CGS01058.fasta #genome sequence file in fasta format >>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>> pred_gff=Augustus.gff3 #ab-initio predictions from >>> other_gff=Promoters.gff3 #promoter annotations >>> other_gff=CpG_Islands.gff3 # CpG island annotations >>> >>> Maker runs to completion and according to the log file annotation was >>> successful. However, it also gives a "Segmentation fault (core dumped)" >>> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >>> and look it does not contain any of the exon definitions in Augustus.gff3. >>> Am I missing something? >>> >>> Regards, >>> Getiria >>> >>> -- >>> Getiria Onsongo, Ph.D. >>> Informatics Analyst, Research Informatics Support System >>> Minnesota Supercomputing Institute for Advanced Computational Research >>> University of Minnesota >>> Minneapolis, MN 55455 >>> Phone: 612-624-0532 >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> Barry Moore >> Research Scientist >> Dept. 
of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:35:43 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:35:43 -0500 Subject: [maker-devel] Maker: accessory scripts Message-ID: I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? Thanks, Getiria -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:37:01 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:37:01 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: Oops! I meant to type Maker is NOW running. On Wed, Jun 5, 2013 at 9:30 AM, Carson Holt wrote: > What does it do? 
> > --Carson > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:28 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > > Subject: Re: [maker-devel] Maker: Re-annotation > > I upgraded to 2.28 and Maker is not running. Thanks! > > > On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > >> Are you using MAKER version 2.10? I ask because there is in issue with >> other_gff in that version that has since been fixed. So if you don't get >> other_gff to pass-through, you will need to upgrade to 2.28 (release date >> is later today coincidentally). >> >> For the Augustus GFF3 file, the format is a little weird which is causing >> the problem. They are mRNA features not attached to genes. Rather than >> build the expected 3 level gene/mRNA/exon structure for these, it is >> simpler just to convert it to the 2 level match/match_part structure. Just >> convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. >> Rename the GFF3 when your done so that it will force rebuild of the GFF3 >> database when you run again. >> >> Thanks, >> Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 22 May, 2013 8:47 AM >> To: Barry Moore >> Cc: >> Subject: Re: [maker-devel] Maker: Re-annotation >> >> No. The MAKER produced GFF3 file does not contain any annotations. I even >> tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it >> will pass annotations from the Augustus produced GFF file into the final >> annotation but that didn't work. I have attached the maker_opts.ctl file >> I used together with the first 100 lines of the GFF files it's using. I >> also include the GFF file produced by MAKER (CGS01058First100.gff) >> >> >> >> >> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >> >>> Hi Getiria, >>> >>> Does the MAKER produced GFF3 file contain any annotations at all? Can >>> you send the first ~100 lines each of the MAKER produced GFF3 file and of >>> the GFF3 files that you passed via maker_opts.ctl? 
>>> >>> B >>> >>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >>> >>> Maker Development Team, >>> >>> I am trying to use Maker for re-annotation using gene predictions from >>> Augustus. We had previously used Augustus for gene prediction but now want >>> to combine these annotations with some EST data. I updated >>> fields maker_opts.ctl as below >>> >>> genome=CGS01058.fasta #genome sequence file in fasta format >>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>> pred_gff=Augustus.gff3 #ab-initio predictions from >>> other_gff=Promoters.gff3 #promoter annotations >>> other_gff=CpG_Islands.gff3 # CpG island annotations >>> >>> Maker runs to completion and according to the log file annotation was >>> successful. However, it also gives a "Segmentation fault (core dumped)" >>> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >>> and look it does not contain any of the exon definitions in Augustus.gff3. >>> Am I missing something? >>> >>> Regards, >>> Getiria >>> >>> -- >>> Getiria Onsongo, Ph.D. >>> Informatics Analyst, Research Informatics Support System >>> Minnesota Supercomputing Institute for Advanced Computational Research >>> University of Minnesota >>> Minneapolis, MN 55455 >>> Phone: 612-624-0532 >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> Barry Moore >>> Research Scientist >>> Dept. of Human Genetics >>> University of Utah >>> Salt Lake City, UT 84112 >>> -------------------------------------------- >>> (801) 585-3543 >>> >>> >>> >>> >>> >> >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> > > > > -- > Getiria Onsongo, Ph.D. 
> Informatics Analyst, Research Informatics Support System
> Minnesota Supercomputing Institute for Advanced Computational Research
> University of Minnesota
> Minneapolis, MN 55455
> Phone: 612-624-0532
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Wed Jun 5 10:38:59 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Jun 2013 16:38:59 +0000
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
References:
Message-ID:

Hi Innocent,

I'm just jumping into this conversation kind of late in the game, but if you look in the gff3 file that maker gave you, do you see any gene, exon, or CDS features in the output? When you give evidence (protein or EST) and ab-initio predictors to maker, the default behavior is to create gene models.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Innocent Onsongo [onson001 at umn.edu]
Sent: Wednesday, June 05, 2013 10:35 AM
To: Carson Holt
Cc: maker-devel at yandell-lab.org; Barry Moore
Subject: [maker-devel] Maker: accessory scripts

I was able to successfully run Maker and now want to convert the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script? It is not in /maker/2.28-beta/bin/

Also, in running gff3_preds2models is the file I used for pred_gff=?

Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron?

Thanks,
Getiria
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Jun 5 08:44:36 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Jun 2013 10:44:36 -0400
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to full annotations, create a list of IDs (one per line) then give them to the attached script. gff3_preds2models was previously deprecated and is no longer part of the maker distribution, but the attached script is an updated version with the same functionality.

--Carson

From: Innocent Onsongo
Date: Wednesday, 5 June, 2013 12:35 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org", Barry Moore
Subject: [maker-devel] Maker: accessory scripts

I was able to successfully run Maker and now want to convert the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script? It is not in /maker/2.28-beta/bin/

Also, in running gff3_preds2models is the file I used for pred_gff=?

Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron?
Thanks, Getiria _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_preds2models Type: application/octet-stream Size: 4777 bytes Desc: not available URL: From carsonhh at gmail.com Wed Jun 5 08:45:10 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:45:10 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: Gotcha :-) --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:37 PM To: Carson Holt Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: Re-annotation Oops! I meant to type Maker is NOW running. On Wed, Jun 5, 2013 at 9:30 AM, Carson Holt wrote: > What does it do? > > --Carson > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:28 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > > Subject: Re: [maker-devel] Maker: Re-annotation > > I upgraded to 2.28 and Maker is not running. Thanks! > > > On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: >> Are you using MAKER version 2.10? I ask because there is in issue with >> other_gff in that version that has since been fixed. So if you don't get >> other_gff to pass-through, you will need to upgrade to 2.28 (release date is >> later today coincidentally). >> >> For the Augustus GFF3 file, the format is a little weird which is causing the >> problem. They are mRNA features not attached to genes. Rather than build >> the expected 3 level gene/mRNA/exon structure for these, it is simpler just >> to convert it to the 2 level match/match_part structure. Just convert the >> 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. 
Rename the GFF3 >> when your done so that it will force rebuild of the GFF3 database when you >> run again. >> >> Thanks, >> Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 22 May, 2013 8:47 AM >> To: Barry Moore >> Cc: >> Subject: Re: [maker-devel] Maker: Re-annotation >> >> No. The MAKER produced GFF3 file does not contain any annotations. I even >> tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will >> pass annotations from the Augustus produced GFF file into the final >> annotation but that didn't work. I have attached the maker_opts.ctl file I >> used together with the first 100 lines of the GFF files it's using. I also >> include the GFF file produced by MAKER (CGS01058First100.gff) >> >> >> >> >> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >>> Hi Getiria, >>> >>> Does the MAKER produced GFF3 file contain any annotations at all? Can you >>> send the first ~100 lines each of the MAKER produced GFF3 file and of the >>> GFF3 files that you passed via maker_opts.ctl? >>> >>> B >>> >>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >>> >>>> Maker Development Team, >>>> >>>> I am trying to use Maker for re-annotation using gene predictions from >>>> Augustus. We had previously used Augustus for gene prediction but now want >>>> to combine these annotations with some EST data. I updated fields >>>> maker_opts.ctl as below >>>> >>>> genome=CGS01058.fasta #genome sequence file in fasta format >>>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>>> pred_gff=Augustus.gff3 #ab-initio predictions from >>>> other_gff=Promoters.gff3 #promoter annotations >>>> other_gff=CpG_Islands.gff3 # CpG island annotations >>>> >>>> Maker runs to completion and according to the log file annotation was >>>> successful. However, it also gives a "Segmentation fault (core dumped)" >>>> message. 
It does produce a GFF3 file but when I load the GFF3 file into IGV >>>> and look it does not contain any of the exon definitions in Augustus.gff3. >>>> Am I missing something? >>>> >>>> Regards, >>>> Getiria >>>> >>>> -- >>>> Getiria Onsongo, Ph.D. >>>> Informatics Analyst, Research Informatics Support System >>>> Minnesota Supercomputing Institute for Advanced Computational Research >>>> University of Minnesota >>>> Minneapolis, MN 55455 >>>> Phone: 612-624-0532 >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> Barry Moore >>> Research Scientist >>> Dept. of Human Genetics >>> University of Utah >>> Salt Lake City, UT 84112 >>> -------------------------------------------- >>> (801) 585-3543 >>> >>> >>> >>> >> >> >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Wed Jun 5 08:47:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Jun 2013 10:47:51 -0400
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

Also, just a note, models are rejected if they have no protein or EST support. This is because ab initio predictors over predict (you may have 10 false positives for every true positive in some genomes, for example).

--Carson

From: Carson Holt
Date: Wednesday, 5 June, 2013 10:44 AM
To: Innocent Onsongo, Carson Holt
Cc: "maker-devel at yandell-lab.org", Barry Moore
Subject: Re: [maker-devel] Maker: accessory scripts

All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to full annotations, create a list of IDs (one per line) then give them to the attached script. gff3_preds2models was previously deprecated and is no longer part of the maker distribution, but the attached script is an updated version with the same functionality.

--Carson

From: Innocent Onsongo
Date: Wednesday, 5 June, 2013 12:35 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org", Barry Moore
Subject: [maker-devel] Maker: accessory scripts

I was able to successfully run Maker and now want to convert the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script? It is not in /maker/2.28-beta/bin/

Also, in running gff3_preds2models is the file I used for pred_gff=?

Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron?

Thanks,
Getiria
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amelia.ireland at gmod.org Wed Jun 5 11:14:05 2013
From: amelia.ireland at gmod.org (Amelia Ireland)
Date: Wed, 5 Jun 2013 10:14:05 -0700
Subject: [maker-devel] Apply now for the GMOD Summer School!
Message-ID:

Closing date for applications: 10 June

July 19-23, 2013; NESCent, Durham, North Carolina
http://gmod.org/wiki/2013_GMOD_Summer_School

The 2013 GMOD Summer School is the best way to get to grips with GMOD in the Cloud, GMOD's suite of genomic and genetic software. Over five days, attendees will learn how to install, configure, and run popular GMOD software for visualization, storage, and dissemination of genetic and genomic data. The following software is covered:

- Chado, a species-independent database schema covering many areas of genetic and genomic data;
- GBrowse, the ubiquitous genome browser;
- GBrowse syn, a synteny browser built on GBrowse;
- Galaxy, analysis and computation pipeline;
- JBrowse, genome browsing evolved;
- MAKER, automated annotation pipeline;
- Tripal, a slick web interface for displaying and editing data from Chado; and
- WebApollo, distributed community genome annotation tool (built on JBrowse).

There are additional sessions on setting up a GMOD in the Cloud virtual machine in the Amazon cloud, and common file formats. Courses are taught by members of the software development teams, and there are work sessions in the evenings for participants to talk to the developers or apply what they have been taught to their own data.

For more information and to apply, visit http://gmod.org/wiki/2013_GMOD_Summer_School. There are some scholarship funds available for those from underrepresented minorities. All applications should be in by June 10th.
If you have any questions, please contact the GMOD help desk at help at gmod.org.

Hope to see you there!

Thanks,
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject

--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mnuhn at ebi.ac.uk Thu Jun 6 02:44:10 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Thu, 06 Jun 2013 09:44:10 +0100
Subject: [maker-devel] Effect of the unmask option
Message-ID: <51B04BDA.7050307@ebi.ac.uk>

Hello Carson!

When running maker with the unmask option, how does maker use the predictions generated from running the gene predictors on the unmasked sequence?

The tutorial says:

"You do have the option to run ab initio gene predictors on both the masked and unmasked sequence if repeat masking worries you though. You do this by setting unmask:1 in the maker_opt.ctl configuration file."

http://gmod.org/wiki/MAKER_Tutorial_2012

But in the sub get_non_overlaping_abinits in maker::auto_annotator (maker version 2.27) they are skipped:

    #only accept masked predictions unless I'm not masking or the predictor is genemark
    my $src = $g->{algorithm};
    unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} || $CTL_OPT->{predictor} eq 'genemark') {
        next;
    }

Cheers,
Michael.

From carsonhh at gmail.com Thu Jun 6 07:55:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Jun 2013 09:55:08 -0400
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

One thing to keep in mind is the strandedness of the evidence and the model (they must be on the same strand). Further, protein evidence is only valid support if it is in the same reading frame as the model.

Could you send the full GFF3 for the contig (I need features and GFF3 internal fasta) and the coordinates of the region in question, and I can take a look?
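The same-strand requirement mentioned above can be illustrated with a small Python sketch. It only compares GFF3 columns 1, 4, 5 and 7 of two features; the function names, contig name and coordinates are hypothetical, it does not check reading frame, and it is not meant to describe how MAKER itself performs the test.

```python
# Minimal sketch: is an evidence feature on the same strand as a prediction
# and does it overlap it? Names and sample lines are illustrative only;
# this is not MAKER's implementation.

def parse_feature(gff_line):
    """Return (seqid, start, end, strand) from one tab-separated GFF3 line."""
    cols = gff_line.rstrip("\n").split("\t")
    return cols[0], int(cols[3]), int(cols[4]), cols[6]

def same_strand_overlap(pred, evid):
    """True if both features share sequence and strand and their spans overlap."""
    p_seq, p_start, p_end, p_strand = pred
    e_seq, e_start, e_end, e_strand = evid
    if p_seq != e_seq or p_strand != e_strand:
        return False
    # GFF3 coordinates are 1-based and inclusive at both ends.
    return p_start <= e_end and e_start <= p_end

# Invented example features on a hypothetical contig "ctg1".
pred = parse_feature("ctg1\tAUGUSTUS\tmatch_part\t2013\t2050\t.\t+\t1\tID=pred1")
est_plus = parse_feature("ctg1\tEST_BLAT\tmatch_part\t2040\t2100\t.\t+\t.\tID=est1")
est_minus = parse_feature("ctg1\tEST_BLAT\tmatch_part\t2040\t2100\t.\t-\t.\tID=est2")

print(same_strand_overlap(pred, est_plus))   # True: same strand, overlapping
print(same_strand_overlap(pred, est_minus))  # False: opposite strand
```

An alignment that overlaps a prediction in a genome browser can therefore still fail to count as support if it sits on the opposite strand.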
Also if you can, it would be good to let maker run Augustus as well with the species file rather than just passing in the GFF3. This is because MAKER can only talk to Augustus to generate competing hint based models if you provide the species. Thanks, Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 1:10 PM To: Carson Holt Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: accessory scripts I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction? I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful. Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST support. > This is because ab inito predictors over predict (you may have 10 false > positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt > > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or rejected > model and is there for reference purposes. If you want to upgrade all of the > rejected loci to gene annotations, set keep_preds=1 in the control files. 
If > you want to upgrade a subset of rejected models to a full annotation, create a > list of IDs (one per line) then give them to the attached script. > gff3_preds2models was previously deprecated and no longer part of the maker > distribution, but the attached script is an updated version with the same > functionality. > > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I > looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not in > /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is the > file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 11:10:01 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 12:10:01 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. 
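A quick way to confirm what column 3 of a GFF3 file contains is to tally the feature types. The Python sketch below does this on a few invented sample lines; in practice the lines would be read from the MAKER output file. Following the explanation given in this thread, gene/mRNA/exon/CDS rows are annotations, while match/match_part style rows are evidence alignments or rejected models. The function name and sample data are assumptions made for illustration.

```python
from collections import Counter

def count_feature_types(lines):
    """Tally column 3 (feature type) over the feature lines of a GFF3 file."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts

# Invented sample resembling an output with evidence alignments only.
sample = [
    "##gff-version 3",
    "ctg1\tblastx\tprotein_match\t100\t900\t.\t+\t.\tID=pm1",
    "ctg1\tblastx\tmatch_part\t100\t400\t.\t+\t.\tID=pm1:1;Parent=pm1",
    "ctg1\tblastx\tmatch_part\t600\t900\t.\t+\t.\tID=pm1:2;Parent=pm1",
    "##FASTA",
    ">ctg1",
    "ACGT",
]

counts = count_feature_types(sample)
for ftype, n in sorted(counts.items()):
    print(ftype, n)

# True only if at least one annotation-level feature type is present.
has_models = any(t in counts for t in ("gene", "mRNA", "exon", "CDS"))
print("gene models present:", has_models)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `count_feature_types(open("MakerFirst400.gff"))`.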
Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction?

I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful.

Getiria

On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote:

> Also, just a note, models are rejected if they have no protein or EST support. This is because ab initio predictors overpredict (you may have 10 false positives for every true positive in some genomes, for example).
>
> --Carson
>
> From: Carson Holt
> Date: Wednesday, 5 June, 2013 10:44 AM
> To: Innocent Onsongo, Carson Holt <carson.holt at oicr.on.ca>
> Cc: "maker-devel at yandell-lab.org", Barry Moore
> Subject: Re: [maker-devel] Maker: accessory scripts
>
> All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to a full annotation, create a list of IDs (one per line), then give them to the attached script. gff3_preds2models was previously deprecated and is no longer part of the maker distribution, but the attached script is an updated version with the same functionality.
>
> --Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4525 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MakerFirst400.gff
Type: application/octet-stream
Size: 74870 bytes
Desc: not available
URL:

From onson001 at umn.edu Thu Jun 6 12:58:21 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Thu, 6 Jun 2013 13:58:21 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
References:
Message-ID:

Thanks for the timely feedback, Carson.

I made a change to my pred_gff and est_gff GFF3 files and now I am getting results, but I am not sure if the changes I made are valid. I want to make sure the changes did not lead Maker to behave in an unexpected way and produce results that might be incorrect.

In my pred_gff file, I replaced "mRNA" with "protein_match" and "exon" with "match_part". Below are the first three lines of the old and new pred_gff files respectively:

---------------old pred_gff
##gff-version 3
CGS00003 AUGUSTUS mRNA 1 10865 1 + .
CGS00003 AUGUSTUS exon 2013 2050 . + 1
---------------new pred_gff
##gff-version 3
CGS00003 AUGUSTUS protein_match 1 10865 1 + .
CGS00003 AUGUSTUS match_part 2013 2050 . + 1

In my est_gff file, I replaced "mRNA" with "expressed_sequence_match" and "exon" with "match_part". Below are the first three lines of the old and new est_gff files respectively:

----------------old est_gff
##gff-version 3
CGS00003 EST_BLAT mRNA 4641336 4758501 6072 - .
CGS00003 EST_BLAT exon 4641336 4641979 644 - .
----------------new est_gff
CGS00003 EST_BLAT expressed_sequence_match 4641336 4758501 6072 - .
CGS00003 EST_BLAT match_part 4641336 4641979 644 - .

Are the changes I made valid?

Thanks,
Getiria

From benayoun at stanford.edu Fri Jun 7 11:17:47 2013
From: benayoun at stanford.edu (Bérénice Benayoun)
Date: Fri, 7 Jun 2013 10:17:47 -0700
Subject: [maker-devel] Maker and mono-exonic genes ?
Message-ID:

Dear maker developers,

I am currently annotating a de novo fish genome, and have started looking for particular genes of interest in Maker's output to verify that it is outputting proper gene sets.

While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker.
I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it.

Is there any setting I can give to maker to force it to output monoexonic genes? I already set "single_exon=1" with no success. I attached my config file FYI.

Thank you so much in advance for your answer!

Best,

Berenice.

--
Bérénice A. BENAYOUN, Ph.D.
Stanford University/Genetics Department
BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
M312 Alway Building
300, Pasteur Drive
MC 5120
Stanford, CA 94305-5120
USA
Email: benayoun at stanford.edu
Web: www.stanford.edu/group/brunet/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Appolo_screenshot_missing_monoexonic_pred.pdf
Type: application/pdf
Size: 709436 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.log
Type: application/octet-stream
Size: 5154 bytes
Desc: not available
URL:

From onson001 at umn.edu Fri Jun 7 14:08:43 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Fri, 7 Jun 2013 15:08:43 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
References:
Message-ID:

Carson,

I have attached the full gff3 for the contig together with a screenshot from IGV showing the regions where I was expecting Maker to make a consensus call. The region in question is CGS00003:5264784-5273457. I will greatly appreciate any insights.

Thanks,
Getiria

On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote:

> One thing to keep in mind is the strandedness of the evidence and the model (they must be on the same strand). Further, protein evidence is only valid support if it is in the same reading frame as the model.
>
> Could you send the full GFF3 for the contig (I need features and GFF3 internal fasta) and the coordinates of the region in question, and I can take a look? Also, if you can, it would be good to let maker run Augustus as well with the species file rather than just passing in the GFF3. This is because MAKER can only talk to Augustus to generate competing hint-based models if you provide the species.
>
> Thanks,
> Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CGS00003.gff
Type: application/octet-stream
Size: 11835535 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CGS00003_5264784-5273457.pdf
Type: application/pdf
Size: 124264 bytes
Desc: not available
URL:

From dence at genetics.utah.edu Fri Jun 7 15:32:57 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 7 Jun 2013 21:32:57 +0000
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
References:
Message-ID:

Hi Berenice,

Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform.

Looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. MAKER uses the predictions from ab initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region.

Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From dence at genetics.utah.edu Fri Jun 7 15:58:16 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 7 Jun 2013 21:58:16 +0000
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
References:
Message-ID:

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: berenice.benayoun at gmail.com [berenice.benayoun at gmail.com] on behalf of Bérénice Benayoun [benayoun at stanford.edu]
Sent: Friday, June 07, 2013 3:50 PM
To: Daniel Ence
Subject: Re: [maker-devel] Maker and mono-exonic genes ?

Hi Daniel,

Thanks for the quick answer!
I used SNAP, trained from an HMM built with the CEGMA output on my genome (240 gene models) plus a first run of maker on 1/3 of the genome. I tried GenemarkES and Augustus, but for some reason they don't run, so I stopped indicating their existence to maker.

Should I do something in particular to train it "better"? Is there any other predictor that would be worth running?

Thanks so much for your help!

Berenice

From barry.moore at genetics.utah.edu Fri Jun 7 16:30:35 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 7 Jun 2013 16:30:35 -0600
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
References:
Message-ID: <11A6EF4C-B82E-4851-80FC-B8668531E2EC@genetics.utah.edu>

Hi Berenice,

SNAP is a good gene predictor, but for most genomes Augustus can be more accurate; of course it is also harder to train. A good way to start is to run a first round of MAKER annotation with SNAP as the predictor, train SNAP on the output from that run, and then do a second MAKER run (which runs pretty fast the second time because all the blast jobs are reused). Ultimately, running Augustus as well (along with custom training) is probably worth it for a final annotation effort.
The good thing is you can run these iterative cycles of annotation with minimal effort because MAKER will reuse any computations that have already run.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Fri Jun 7 15:51:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 16:51:53 -0500
Subject: [maker-devel] Effect of the unmask option
In-Reply-To: <51B04BDA.7050307@ebi.ac.uk>
Message-ID:

The unmask option allows the ab initio predictions run on unmasked sequence to compete against other models, and they are selected only if they have a better AED score. They are not available for non-overlapping rejected models at the end of the run because that set is non-redundant and they tend to have a very high likelihood of being transposons themselves. So I don't let a repeat-containing model override a non-repeat-containing model unless there is evidence supporting it (there is never evidence supporting the non-overlapping models).

--Carson

On 13-06-06 4:44 AM, "Michael Nuhn" wrote:

>Hello Carson!
>
>When running maker with the unmask option, how does maker use the
>predictions generated from running the gene predictors on the unmasked
>sequence?
>
>The tutorial says:
>
>"You do have the option to run ab initio gene predictors on both the
>masked and unmasked sequence if repeat masking worries you though. You
>do this by setting unmask:1 in the maker_opt.ctl configuration file."
>
>http://gmod.org/wiki/MAKER_Tutorial_2012
>
>But in the sub get_non_overlaping_abinits in maker::auto_annotator
>(maker version 2.27) they are skipped:
>
>#only accept masked predictions unless I'm not masking or the predictor is genemark
>my $src = $g->{algorithm};
>unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} ||
>       $CTL_OPT->{predictor} eq 'genemark') {
>    next;
>}
>
>Cheers,
>Michael.

From carsonhh at gmail.com Fri Jun 7 16:10:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 17:10:09 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

You seem to be running this in a very odd way. First, the GFF3 is not correctly formatted. There are lines containing score=1 (all by itself)? I believe this may be coming through because you are trying to pass in augustus predictions as GFF3 and that input is malformed. All of your Augustus models are also single exon genes, but they are very long and do not even correspond to proper ORFs. The EST evidence is spliced and is thus contradicting the augustus model (they don't support each other). If you want MAKER to be able to use the evidence as feedback for the model, you need to let MAKER run augustus. Otherwise it is only able to accept or reject the model from the GFF3 (nothing more, no attempt at consensus).

Perhaps if you supply your input dataset and control files we can help you get the best settings. You would need to provide the Augustus species set you are using as well (contained in a directory in ?/augustus/config/species).
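[Illustrative sketch, not part of the original message.] The formatting problem described above, such as a line that contains only score=1, can be caught before a file is handed to pred_gff= or est_gff=. The Python below is a minimal sketch under stated assumptions: the function name find_malformed_gff3_lines is hypothetical and not part of MAKER, and the checks cover only the generic nine-column GFF3 layout, not any additional requirements MAKER itself may impose.

```python
def find_malformed_gff3_lines(lines):
    """Return (line_number, reason) pairs for feature lines that break
    the basic nine-column GFF3 layout. Hypothetical helper, not a MAKER tool."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, directives and comments are fine
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (number, "expected 9 tab-separated columns, found %d" % len(fields))
            )
            continue
        # Columns 4-8: start, end, score, strand, phase
        start, end, score, strand, phase = fields[3:8]
        if not (start.isdigit() and end.isdigit()):
            problems.append((number, "start/end are not integers"))
        elif int(start) > int(end):
            problems.append((number, "start is greater than end"))
        if score != ".":
            try:
                float(score)
            except ValueError:
                problems.append((number, "score is neither '.' nor a number"))
        if strand not in ("+", "-", ".", "?"):
            problems.append((number, "strand must be one of + - . ?"))
        if phase not in (".", "0", "1", "2"):
            problems.append((number, "phase must be one of . 0 1 2"))
    return problems


if __name__ == "__main__":
    # Made-up example lines; the third one mimics a stray "score=1" line.
    sample = [
        "##gff-version 3",
        "CGS00003\tAUGUSTUS\tmRNA\t1\t10865\t1\t+\t.\tID=model1",
        "score=1",
        "CGS00003\tEST_BLAT\texon\t4641979\t4641336\t644\t-\t.\tID=e1;Parent=model1",
    ]
    for number, reason in find_malformed_gff3_lines(sample):
        print(number, reason)
```

Passing an open file handle works the same way as passing a list of strings, so a whole contig file can be screened in one call. A check like this only flags layout problems; it says nothing about whether the models are biologically sensible, which is the separate concern raised above about long single-exon models.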

--Carson

From onson001 at umn.edu Fri Jun 7 20:29:50 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Fri, 7 Jun 2013 21:29:50 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
References:
Message-ID:

I appreciate the feedback. I will try letting MAKER run augustus instead of passing the Augustus predictions as GFF3.

Thanks for all your help!

Getiria

From carsonhh at gmail.com Mon Jun 10 06:40:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 08:40:35 -0400
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
Message-ID:

One more note. The ESTs appear to be from multiple overlapping HSPs (based on the red line pattern in the image). I'd have to see the actual GFF3 to be sure, but if that is the case, then there probably isn't an ORF to work with at that location on that strand (so SNAP can't call it). This is possibly the result of an assembly error or a pseudogene.

--Carson

From: Daniel Ence
Date: Friday, 7 June, 2013 5:32 PM
To: Bérénice Benayoun, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Maker and mono-exonic genes ?

Hi Berenice,

Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform. Looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. MAKER uses the predictions from ab initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region.

Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice Benayoun [benayoun at stanford.edu]
Sent: Friday, June 07, 2013 11:17 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] Maker and mono-exonic genes ?

Dear maker developers,

I am currently annotating a de novo fish genome, and have started looking for particular genes of interest in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and have highlighted the corresponding evidence for it.

Is there any setting I can give to maker to force it to output monoexonic genes? I already set "single_exon=1" with no success. I attached my config file FYI.

Thank you so much in advance for your answer!

Best,
Berenice.

--
Bérénice A. BENAYOUN, Ph.D.
Stanford University/Genetics Department
BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
M312 Alway Building
300, Pasteur Drive
MC 5120
Stanford, CA 94305-5120
USA
Email: benayoun at stanford.edu
Web: www.stanford.edu/group/brunet/

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From michel.moser at ips.unibe.ch Mon Jun 10 07:03:06 2013
From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch)
Date: Mon, 10 Jun 2013 13:03:06 +0000
Subject: [maker-devel] maker 2.28 blastx error
Message-ID:

Hello Maker developers and users,

I am using maker for the first time to annotate some BAC sequences. I successfully ran both of the test data sets provided in the maker tarball, but when I run maker on my sequences and provide some EST evidence from cufflinks, I get errors at the repeat database BLAST step (see error below). As the te_protein data set I just use the file provided in maker/data/.

I sent the data to a colleague, who could run it without problems using maker 2.10. Or is the problem that I don't have wublast and RepBase installed?

Any hint is highly appreciated!
Thanks,
Michel

std.error

STATUS: Parsing control files...
WARNING: blast_type is set to 'wublast' but executables cannot be located
The blast_type 'ncbi+' will be used instead.

STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore

To access files for individual sequences use the datastore index:
/home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: bac2:383-131865
Length: 131482
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/usr/bin/makeblastdb -dbtype prot -in /tmp/maker_rcBcxr/0/blastprep/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_rcBcxr/te_proteins%2Efasta.mpi.10.0 -query /tmp/maker_rcBcxr/0/bac2%3A383-131865.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore/1D/F1/bac2%3A383-131865//theVoid.bac2%3A383-131865/0/bac2%3A383-131865.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
BLAST engine error: Warning: Sequence contains no data
BLAST engine error: Warning: Sequence contains no data
ERROR: BLASTX failed
--> rank=NA, hostname=ipsktube
ERROR: Failed while doing blastx repeats
ERROR: Chunk failed at level:1, tier_type:1
FAILED CONTIG:bac2:383-131865

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:bac2:383-131865

examining contents of the fasta file and run log

-------------- next part --------------
A non-text attachment was scrubbed... Name: test1.fasta Type: application/octet-stream Size: 14791 bytes Desc: test1.fasta URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: maker_bopts.ctl Type: application/octet-stream Size: 1413 bytes Desc: maker_bopts.ctl URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: maker_exe.ctl Type: application/octet-stream Size: 1201 bytes Desc: maker_exe.ctl URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4457 bytes Desc: maker_opts.ctl URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: protein.fasta Type: application/octet-stream Size: 452 bytes Desc: protein.fasta URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: insert-bac2.fasta Type: application/octet-stream Size: 131500 bytes Desc: insert-bac2.fasta URL:

From anthony.bretaudeau at rennes.inra.fr Mon Jun 10 09:48:13 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Mon, 10 Jun 2013 17:48:13 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
Message-ID: <51B5F53D.90505@rennes.inra.fr>

Hello,

I am running Maker 2.27b on an insect genome, and I use a gff file containing some repeat positions (rm_gff option in maker_opts.ctl).

I encountered an error on 10 scaffolds (the genome contains ~40000 scaffolds): "substr outside of string" (similar to this post: http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html).

After a lot of debugging, it turns out the problem comes from the code of the "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a SQL query that fetches features that overlap with the border of the sequence chunk. The problem is that it also fetches features that are completely outside of the chunk in the same region. This produces an error when maker tries to mask the sequence, as it does a substr outside the string.

I fixed it by patching lib/repeat_mask_seq.pm, near line 138. I replaced:

    substr($$seq, $b -1 , $l, "$replace"x$l);

By:

    if ($b < length($$seq)) {
        substr($$seq, $b -1 , $l, "$replace"x$l);
    }

I don't know if there is a more elegant solution, but this seems to solve the problem.

Cheers
Anthony

From carsonhh at gmail.com Mon Jun 10 10:13:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 12:13:50 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B5F53D.90505@rennes.inra.fr>
Message-ID:

Could you use MAKER version 2.28 instead (launch with maker -a if it still fails)?

Thanks,
Carson

From barry.moore at genetics.utah.edu Mon Jun 10 11:13:49 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 10 Jun 2013 11:13:49 -0600
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To:
References:
Message-ID: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>

Hi Michel,

Yes, wublast is the problem. On current versions of maker the opts file defaults to ncbi+, but on older versions the opts file defaults to wublast. Just edit your maker_bopts.ctl file to have the line:

blast_type=ncbi+

It seems like this option may have been in maker_opts.ctl in older files, so if you don't find it in bopts then look in opts.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Mon Jun 10 11:32:55 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 13:32:55 -0400
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>
Message-ID:

It's actually a little more complicated than that. You are already using BLAST+. The sequence you are running on is apparently entirely masked, so there is nothing there to align.
The error thrown by NCBI BLAST+ when this happens (currently "Sequence contains no data") has changed slightly over time. As a result it causes MAKER to fail where wublast doesn't, because the error wublast throws is still recognized, captured by MAKER, and ignored. You can probably ignore that contig, run with a different version of BLAST, or put the attached files in the .../maker/lib/Widget/ directory. I fixed the check for the current message, so it will ignore the error (as long as the error is still going to STDERR and not STDOUT).

--Carson

-------------- next part --------------
A non-text attachment was scrubbed... Name: blastn.pm Type: text/x-perl-script Size: 7441 bytes Desc: not available URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: blastx.pm Type: text/x-perl-script Size: 7501 bytes Desc: not available URL:
-------------- next part --------------
A non-text attachment was scrubbed... Name: tblastx.pm Type: text/x-perl-script Size: 8363 bytes Desc: not available URL:

From carsonhh at gmail.com Mon Jun 10 11:53:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 13:53:53 -0400
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>
Message-ID:

Never mind. It's even a little weirder than what I just explained.
The contig name (bac2:383-131865) is triggering a behavior in the BioPerl indexer where it recognizes the name as a region and not a contig. As a result it can't find the sequence, but also doesn't throw an error (which results in an empty fasta).

Solution: Just change the name of the contig. Try using 'bac2_383-131865' instead.

--Carson

From anthony.bretaudeau at rennes.inra.fr Tue Jun 11 09:03:42 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Tue, 11 Jun 2013 17:03:42 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B73C4E.6030204@rennes.inra.fr>

Hello,

I have just tested with 2.28b: the problem is still there, and my fix works on this version too.

Cheers
Anthony

From carsonhh at gmail.com Tue Jun 11 09:06:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Jun 2013 11:06:10 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B73C4E.6030204@rennes.inra.fr>
Message-ID:

Could you send me your repeat_gff and genome fasta, so I can take a look?

Thanks,
Carson

From anthony.bretaudeau at rennes.inra.fr Wed Jun 12 07:29:14 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Wed, 12 Jun 2013 15:29:14 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B877AA.8060803@rennes.inra.fr>

Hi,

Here is a minimal gff file that reproduces the bug. It should work with any fasta (my real data is not yet published, so I can't share it publicly yet).
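
For readers following the patch discussed in this thread, the bounds check can be sketched as a small standalone Perl function. This is only an illustration under stated assumptions: the name mask_region and its arguments are hypothetical, and the code is not taken from MAKER's lib/repeat_mask_seq.pm. It shows one way to clip a feature to the sequence before calling the four-argument substr, so that a feature lying outside the chunk is skipped instead of raising "substr outside of string".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MAKER): mask the 1-based region
# [$begin, $end] of the string referenced by $seq_ref with $replace,
# clipping the region to the sequence instead of calling substr past the end.
sub mask_region {
    my ($seq_ref, $begin, $end, $replace) = @_;
    my $len = length($$seq_ref);

    ($begin, $end) = ($end, $begin) if $begin > $end;  # normalize coordinate order
    return 0 if $end < 1 || $begin > $len;             # feature lies entirely outside
    $begin = 1    if $begin < 1;                       # clip the left edge
    $end   = $len if $end > $len;                      # clip the right edge

    my $l = $end - $begin + 1;
    substr($$seq_ref, $begin - 1, $l, $replace x $l);  # replace exactly $l characters
    return $l;                                         # number of masked positions
}

my $seq = 'ACGTACGTAC';                                # 10 bases
my $masked_overlap = mask_region(\$seq, 8, 15, 'N');   # runs past the end: clipped to 8..10
my $masked_outside = mask_region(\$seq, 20, 30, 'N');  # entirely outside: skipped
print "$seq\n";                                        # prints ACGTACGNNN
```

Clipping the length as well as checking the start keeps the sequence length unchanged: with the four-argument form of substr, a replacement string longer than the remaining characters would otherwise make the string grow.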
Tell me if you need more info Anthony On 11/06/2013 17:06, Carson Holt wrote: > Could you send me your repeat_gff and genome fasta, so I can take a look. > > Thanks, > Carson > > > > On 13-06-11 11:03 AM, "Anthony Bretaudeau" > wrote: > >> Hello, >> I have just tested with 2.28b: the problem is still there, and my fix >> works on this version too. >> Cheers >> Anthony >> >> On 10/06/2013 18:13, Carson Holt wrote: >>> Could you use MAKER version 2.28 instead (launch with maker -a if it >>> still >>> fails). >>> >>> Thanks, >>> Carson >>> >>> >>> >>> On 13-06-10 11:48 AM, "Anthony Bretaudeau" >>> wrote: >>> >>>> Hello, >>>> I am running Maker 2.27b on an insect genome, and I use a gff file >>>> containing some repeat positions (rm_gff option in maker_opts.ctl). >>>> >>>> I encountered an error on 10 scaffolds (the genome contains ~40000 >>>> scaffolds) : "substr outside of string" (similar to this post: >>>> >>>> http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html >>>> ). >>>> >>>> After a lot a debugging, it turns out the problem came from the code of >>>> "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a >>>> SQL query that fetches features that overlap with the border of the >>>> sequence chunk. >>>> The problem is that it also fetches features that are completely >>>> outside >>>> of the chunk in the same region. This produces an error when maker >>>> tries >>>> to mask the sequence as it does a substr outside the string. >>>> >>>> I fixed it by patching lib/repeat_mask_seq.pm, near line 138: >>>> I replaced: >>>> substr($$seq, $b -1 , $l, "$replace"x$l); >>>> By: >>>> if ($b < length($$seq)) { >>>> substr($$seq, $b -1 , $l, "$replace"x$l); >>>> } >>>> >>>> I don't know if there is a more elegant solution, but this seems to >>>> solve the problem. 
>>>>
>>>> Cheers
>>>> Anthony
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
scaffold_20	TEs	match	199889	203598	0.0	+	.	ID=some_id_1
scaffold_20	TEs	match_part	199889	200163	2.6e-12	+	.	ID=part_1;Parent=some_id_1
scaffold_20	TEs	match_part	203256	203598	2.6e-12	+	.	ID=part_2;Parent=some_id_1

From sickler.alex at gmail.com Wed Jun 12 12:22:17 2013
From: sickler.alex at gmail.com (Alex Sickler)
Date: Wed, 12 Jun 2013 14:22:17 -0400
Subject: [maker-devel] Problem Installing with opencc
Message-ID:

Hi all,

I am trying to install Maker 2.28. When I go to install Maker, it gives the following error message:

/usr/bin/perl /usr/local/share/perl5/ExtUtils/xsubpp -typemap "/usr/share/perl5/ExtUtils/typemap" MPI.xs $
/share/apps/openmpi/OpenMPI-1.6.3/bin/mpicc -c -I"/share/apps/maker/src" -I/share/apps/openmpi/OpenMPI-1.6.3/include -D_REENTRANT -D_GNU_SOUR$
opencc WARNING: unknown flag: -fstack-protector
opencc WARNING: unknown flag: -fstack-protector
opencc ERROR: -- not allowed in non XPG4 environment
opencc ERROR parsing --param=ssp-buffer-size=4: unknown flag
make: *** [MPI.o] Error 2

The to everything is correct. I tried looking in the Makefile.PL but could not find the "param=" option.

Any help is greatly appreciated,
Alex

From carsonhh at gmail.com Thu Jun 13 13:38:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Jun 2013 15:38:52 -0400
Subject: [maker-devel] Problem Installing with opencc
In-Reply-To:
Message-ID:

MAKER installation doesn't have a Makefile.PL. The parameters for compilation of the MPI bindings are being set by mpicc itself, Perl, or environmental variables on your system.
In general you want both Perl and OpenMPI to be compiled by the same compiler or you can get cross library problems (as Perl is using the shared libraries in OpenMPI so all communication is really at the C level). This is not always the case, but can happen (I have been fine for the most part mixing pgi, intel, and gcc compiled OpenMPI, but have never tried open64 compilers).

Alternatively you can try manually setting the values in the following environmental variables before installing MAKER which should affect the parameter settings (this means before even running the 'perl Build.PL' step):

LDFLAGS=
LDDLFLAGS=
CCCDLFLAGS=
CCDLFLAGS=

Also you need to export the following variable for OpenMPI to work with shared libraries before trying to install MAKER or run MAKER (this means before even running the 'perl Build.PL' step). It's best just to add it to your ~/.bashrc or ~/.bash_profile.

export LD_PRELOAD=/share/apps/openmpi/OpenMPI-1.6.3/lib/libmpi.so

You will need to run 'source ~/.bashrc' or 'source ~/.bash_profile' after adding it to implement the changes into the current terminal session.

Thanks,
Carson

From: Alex Sickler
Date: Wednesday, 12 June, 2013 2:22 PM
To:
Cc:
Subject: [maker-devel] Problem Installing with opencc

Hi all,

I am trying to install Maker 2.28. When I go to install Maker, it gives the following error message:

/usr/bin/perl /usr/local/share/perl5/ExtUtils/xsubpp -typemap "/usr/share/perl5/ExtUtils/typemap" MPI.xs $
/share/apps/openmpi/OpenMPI-1.6.3/bin/mpicc -c -I"/share/apps/maker/src" -I/share/apps/openmpi/OpenMPI-1.6.3/include -D_REENTRANT -D_GNU_SOUR$
opencc WARNING: unknown flag: -fstack-protector
opencc WARNING: unknown flag: -fstack-protector
opencc ERROR: -- not allowed in non XPG4 environment
opencc ERROR parsing --param=ssp-buffer-size=4: unknown flag
make: *** [MPI.o] Error 2

The to everything is correct. I tried looking in the Makefile.PL but could not find the "param=" option.
Any help is greatly appreciated, Alex _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Sun Jun 16 13:46:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 16 Jun 2013 15:46:51 -0400 Subject: [maker-devel] Patch for a bug with repeat gff In-Reply-To: <51B877AA.8060803@rennes.inra.fr> Message-ID: Thanks for the detailed report and test files. The problem initiates with your GFF3 giving a repeat structure that is a spliced repeat. I don't know if such a thing can really occur, but regardless maker doesn't expect them to occur, and as a result when assembled some of the spliced exons run off the edge of the sequence. The script currently checks for repeats where the end of a repeat runs off the edge and adjusts accordingly, but does not check for a start that runs off the edge (because it's not expecting spliced repeats). The result is the substring outside of string error. I added 'next if($l <=0)' to both the _soft_mask_seq and _hard_mask_seq functions, and hopefully having spliced repeats won't cause other hidden errors elsewhere downstream, but you may need to be aware of the possibility. Thanks, Carson On 13-06-12 9:29 AM, "Anthony Bretaudeau" wrote: >Hi, >Here is a minimal gff file that allows to reproduce the bug. It should >work with any fasta (my real data is not yet published, I can't share it >publicly yet). >Tell me if you need more info >Anthony > >On 11/06/2013 17:06, Carson Holt wrote: >> Could you send me your repeat_gff and genome fasta, so I can take a >>look. >> >> Thanks, >> Carson >> >> >> >> On 13-06-11 11:03 AM, "Anthony Bretaudeau" >> wrote: >> >>> Hello, >>> I have just tested with 2.28b: the problem is still there, and my fix >>> works on this version too. 
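The diagnosis in Carson's reply (a piece of a spliced repeat can start past the edge of the sequence chunk, so the computed length is not positive) can be illustrated outside MAKER. The following is only a Python sketch of that idea, not MAKER's Perl code: the function name `hard_mask` and the assumption that coordinates are 1-based, inclusive, and relative to the chunk are hypothetical choices made for the illustration.

```python
def hard_mask(seq, features, replace="N"):
    """Mask 1-based inclusive (start, end) intervals in seq.

    Hypothetical helper for illustration only. Intervals are trimmed to
    the sequence bounds; an interval that lies entirely outside the
    sequence ends up with a non-positive length and is skipped, which is
    the same idea as the 'next if($l <=0)' guard mentioned in the thread.
    """
    chars = list(seq)
    n = len(chars)
    for start, end in features:
        b = max(start, 1)      # trim a start that runs off the left edge
        e = min(end, n)        # trim an end that runs off the right edge
        length = e - b + 1
        if length <= 0:
            continue           # feature is completely outside this chunk
        chars[b - 1:e] = replace * length
    return "".join(chars)


if __name__ == "__main__":
    chunk = "ACGTACGT"
    print(hard_mask(chunk, [(3, 5)]))    # interval inside the chunk
    print(hard_mask(chunk, [(7, 12)]))   # end runs off the edge
    print(hard_mask(chunk, [(20, 25)]))  # start and end both off the edge
```

Compared with guarding only on the start position, checking the trimmed length covers both the "end off the edge" and "start off the edge" cases with one test.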
>>> Cheers >>> Anthony >>> >>> On 10/06/2013 18:13, Carson Holt wrote: >>>> Could you use MAKER version 2.28 instead (launch with maker -a if it >>>> still >>>> fails). >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> >>>> On 13-06-10 11:48 AM, "Anthony Bretaudeau" >>>> wrote: >>>> >>>>> Hello, >>>>> I am running Maker 2.27b on an insect genome, and I use a gff file >>>>> containing some repeat positions (rm_gff option in maker_opts.ctl). >>>>> >>>>> I encountered an error on 10 scaffolds (the genome contains ~40000 >>>>> scaffolds) : "substr outside of string" (similar to this post: >>>>> >>>>> >>>>>http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.ht >>>>>ml >>>>> ). >>>>> >>>>> After a lot a debugging, it turns out the problem came from the code >>>>>of >>>>> "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there >>>>>is a >>>>> SQL query that fetches features that overlap with the border of the >>>>> sequence chunk. >>>>> The problem is that it also fetches features that are completely >>>>> outside >>>>> of the chunk in the same region. This produces an error when maker >>>>> tries >>>>> to mask the sequence as it does a substr outside the string. >>>>> >>>>> I fixed it by patching lib/repeat_mask_seq.pm, near line 138: >>>>> I replaced: >>>>> substr($$seq, $b -1 , $l, "$replace"x$l); >>>>> By: >>>>> if ($b < length($$seq)) { >>>>> substr($$seq, $b -1 , $l, "$replace"x$l); >>>>> } >>>>> >>>>> I don't know if there is a more elegant solution, but this seems to >>>>> solve the problem. >>>>> >>>>> Cheers >>>>> Anthony >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> >>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>g >> > From jmdoyle at purdue.edu Mon Jun 17 11:20:42 2013 From: jmdoyle at purdue.edu (Jacqueline R M Doyle) Date: Mon, 17 Jun 2013 13:20:42 -0400 (EDT) Subject: [maker-devel] altest without MPI? 
Message-ID: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I am beginning my first MAKER annotation and had a quick question. I am currently planning on following the "Training ab initio Gene Predictors" section of the MAKER 2012 tutorial. For my species of interest, I have 784290 scaffolds of which 80% are greater than 100 kb. I have EST data from a closely related species and was also going to use the core cegma protein sequences. With this in mind, I made the following changes to my maker_opts file:

genome=scaffolds.fasta
altest=Trinity.fasta
protein=cegma.fa
est2genome=1
cpus=48

My primary concern is that this is going to take a long time to run with altest, even with the extra cpus for BLAST. The software was not originally installed on our computer cluster with MPICH2, but I may be able to talk our computer guys into reinstalling if the situation is going to be completely untenable without MPI. I guess my question is, is there any point in trying to run the above without MPI? Is there a good way to monitor the progress of such a run if I was to give it a shot?

Thanks for your help with this!

Jackie

From carsonhh at gmail.com Mon Jun 17 14:12:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Jun 2013 16:12:58 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

It's best to use the cegma results with the cegma2zff script to generate a training set for SNAP. Then don't use the cegma proteins. If you can get proteins from a related species with an annotated genome, it will be better than the altest option for a different species. This is because altest is aligned via tblastx, which is 3-4 times slower than protein alignments. Also they will rarely be good enough to produce many est2genome models (best to only use them if you have nothing else).
The cpus= option is a blast parameter for specifying how many cpus to give to each blast job. It is not an MPI parameter. The number of cpus for MPI is specified using the -n option from mpiexec and not in the maker control files. You don't have to use MPI. You can also split your contigs up into separate jobs and run MAKER multiple times. Use the fasta_tool script that comes with MAKER to split your input file up. Let us know if you come across anything you have more questions on. Thanks, Carson On 13-06-17 1:20 PM, "Jacqueline R M Doyle" wrote: >Hi! > >I am beginning my first MAKER annotation and had a quick question. I am >currently planning on following the ?Training ab initio Gene Predictors? >section of the MAKER 2012 tutorial. For my species of interest, I have >784290 scaffolds in which 80% are greater than 100 kb. I have EST data >from a closely related species and was also going to use the core cegma >protein sequences. With this in mind, I made the following changes to my >maker_opts file: > >genome=scaffolds.fasta >altest=Trinity.fasta >protein=cegma.fa >est2genome=1 >cpus=48 > >My primary concern is that this is going to take a long time to run with >altest, even with the extra cpus for BLAST. The software was not >originally installed on our computer cluster with MPICH2, but I may be >able to talk our computer guys into reinstalling if the situation is >going to be completely untenable without MPI. I guess my question is, is >there any point in trying to run the above without MPI? Is there a good >way to monitor the progress of such a run if I was to give it a shot? > >Thanks for your help with this! 
>
>Jackie

From carsonhh at gmail.com Wed Jun 19 19:05:49 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Jun 2013 21:05:49 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1997335285.43753.1371676376399.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

The throughput is based on contig length, so long contigs will take longer than short contigs. Any contig less than 10kb is mostly useless for annotation purposes (so you can filter those from your 800,000 right away). Take your contigs that finish, and sum up their length to get a better estimate of how long it will take to complete running. Most genomes can complete in a few days on a multi-core machine. Bigger genomes or bigger datasets take longer. (note that altest evidence takes 3-4x longer to align than proteins).

The advantage of proteins is that the species do not have to be closely related. Nucleotide sequence diverges quickly and proteins slowly (that's why proteins are used for phylogenetic trees). A good strategy would be to get ~10Mb of sequence (use your longest contigs). Run with Chicken, turkey, and pigeon proteins. Use the protein2genome option to generate annotations. Those annotations should now be sufficient to train SNAP and Augustus. Then you can finish by running all your contigs with the same dataset (protein2genome now turned off), use the newly trained snap and augustus files along with any altest files you want to use. Note that the size of the dataset will determine the total run time.

To get things to run faster, you can also run on your university's computer cluster (then you will have hundreds of cpus available to you). The Purdue cluster supports MPI and with 30-50 cpus you could annotate even large genomes in a reasonable time.
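The two suggestions above (drop contigs under 10 kb, then extrapolate from the summed length of the contigs that have finished) amount to simple arithmetic. A rough Python sketch follows; the function names, the dictionary of contig lengths, and the assumption that run time scales linearly with base pairs are all illustrative and are not part of MAKER.

```python
def keep_long_contigs(lengths, min_len=10_000):
    """Return only contigs at least min_len bp long (10 kb, per the thread)."""
    return {name: n for name, n in lengths.items() if n >= min_len}


def estimate_remaining_hours(lengths, finished, elapsed_hours):
    """Extrapolate remaining run time from the contigs that have finished.

    Assumes (as a simplification) that time is proportional to contig
    length, so bp-per-hour observed so far applies to the rest.
    """
    done_bp = sum(lengths[name] for name in finished)
    if done_bp == 0 or elapsed_hours <= 0:
        raise ValueError("need at least one finished contig and elapsed time")
    rate = done_bp / elapsed_hours  # bp annotated per hour so far
    remaining_bp = sum(n for name, n in lengths.items() if name not in finished)
    return remaining_bp / rate


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, made up for the example.
    lengths = {"a": 5_000, "b": 20_000, "c": 60_000, "d": 120_000}
    usable = keep_long_contigs(lengths)
    print(sorted(usable))
    print(estimate_remaining_hours(usable, {"b", "c"}, elapsed_hours=4))
```

The estimate is only as good as the linearity assumption; evidence density and repeat content vary between contigs, so it is best treated as an order-of-magnitude figure.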
Alternatively you can request a startup account at XSEDE, an NSF-funded computer resource open to all US institutions. A startup allocation with 50,000 cpu hours only takes 2 weeks to approve. You should request an allocation on the Lonestar cluster if you go that route; it has 64,000 cpus. I was able to annotate the Maize genome (which is a very large genome at over 2 gigabases). I used abnormally large EST and protein datasets (~4 gigabases of evidence, which is much more than a normal annotation job), and it completed in under 3 hours on 2,100 cpus.

--Carson

On 13-06-19 5:12 PM, "Jacqueline R M Doyle" wrote:

>Hi Carson (and whoever else might be reading this!)
>
>Thanks so much, I think splitting the files up using fasta_tool will
>definitely move things along. I did a trial version with altest this
>weekend, and seemed to be averaging about an hour a scaffold (with 1
>cpu). I'm a little concerned, as we have ~800,000 scaffolds. Does this
>seem like a reasonable estimate of the time it should take to annotate
>one sequence? Could I be missing something in my maker_opts file?
>
>Let me back up for just a minute and describe the project a little more
>generally. As I mentioned before, we have no protein sequences or ESTs
>for our species of interest, which is an avian species. I could
>potentially use proteins from chicken or turkey, but neither is closely
>related to our species. Time is a bit of an issue... do you have any
>thoughts on how much time per scaffold it should take to annotate using
>protein2genome? If chicken and turkey are not closely related, is it
>worth the time investment?
>
>Let me finish by saying I think MAKER is wonderful, and I really
>appreciate the discussions on this group.
> >Best wishes, Jackie From jjin01 at mail.rockefeller.edu Thu Jun 20 14:22:22 2013 From: jjin01 at mail.rockefeller.edu (Jingjing Jin) Date: Thu, 20 Jun 2013 20:22:22 +0000 Subject: [maker-devel] maker exon result Message-ID: Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG 
AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT 
TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG This is the sequence for this gene, the red color is for the first exon?? However, for this exon, I cannot found the stop codon??? I also find for some exon, there are several stop codon in one exon??? Does anyone have the same problem with me? Or there is something wrong when I configure the maker file?? Thanks! Jingjing -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Thu Jun 20 17:06:29 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 20 Jun 2013 23:06:29 +0000 Subject: [maker-devel] maker exon result In-Reply-To: References: Message-ID: Hi Jingjing, It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. 
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . 
+ 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG 
CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA 
GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG
AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT
ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT
TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA
CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA
CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT
ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA
TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG
AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG

This is the sequence for this gene; the red color marks the first exon. However, I cannot find the stop codon in this exon.

I also find that some exons contain several stop codons.

Does anyone have the same problem? Or is there something wrong in how I configured the maker file?

Thanks!

Jingjing

From barry.moore at genetics.utah.edu Thu Jun 20 17:11:56 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 20 Jun 2013 17:11:56 -0600
Subject: [maker-devel] maker exon result
In-Reply-To:
References:
Message-ID: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>

To add to what Daniel suggested: if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS.

B

On Jun 20, 2013, at 5:06 PM, Daniel Ence wrote:

> Hi Jingjing,
>
> It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv.
>
> When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red.
> Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow.
>
> I hope that I understood your question. Feel free to reply back on the mailing list if I didn't.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
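The advice in this thread (look at the last three nucleotides of the last CDS, not at each exon) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `splice_cds` and `codon_report` and the toy contig are hypothetical and are not part of MAKER, the feature is assumed to be on the + strand, the first CDS is assumed to start in phase 0, and the standard genetic code is assumed. Coordinates are 1-based and inclusive, as in the GFF lines quoted in the thread.

```python
# Minimal sketch; helper names and the toy contig are hypothetical, not MAKER output.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def splice_cds(contig, cds_intervals):
    """Join CDS segments given as 1-based, inclusive (start, end) pairs.

    Assumes a '+' strand feature; segments are concatenated in coordinate order.
    """
    pieces = (contig[start - 1:end] for start, end in sorted(cds_intervals))
    return "".join(pieces).upper()


def codon_report(seq):
    """Summarise codon usage, reading seq in frame from its first base."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return {
        "multiple_of_3": len(seq) % 3 == 0,
        "starts_with_atg": seq[:3] == "ATG",
        "ends_with_stop": seq[-3:] in STOP_CODONS,
        # stops among all complete codons except the last one
        "internal_stops": sum(codon in STOP_CODONS for codon in codons[:-1]),
    }


# Toy contig: 2 nt flank, exon 1, a 13 nt intron, exon 2, 2 nt flank.
toy_contig = "CC" + "ATGGCC" + "GTAAGTTAAACAG" + "AAATGA" + "GG"
toy_cds = [(3, 8), (22, 27)]

spliced = splice_cds(toy_contig, toy_cds)   # exons only
genomic = toy_contig[3 - 1:27]              # gene span, intron included

print(spliced, codon_report(spliced))
print(genomic, codon_report(genomic))
```

On the toy data the spliced CDS ends in a stop codon and has no internal stops, while reading the unspliced genomic span in frame runs into a stop triplet inside the intron. That is one plausible reason for seeing "several stop codons" when scanning a genomic sequence by eye; a model whose last CDS does not end in TAA, TAG or TGA would instead show up with `ends_with_stop` set to False.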
From jjin01 at mail.rockefeller.edu Thu Jun 20 18:18:18 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Fri, 21 Jun 2013 00:18:18 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References: ,
Message-ID:

As I understand it, a predicted gene model connects the different exons together. I thought that each exon of a gene should have a start codon and a stop codon, but I may be wrong.

However, when I check some gene models from the maker predictions, I cannot find a stop codon in some exons of a gene.

In the example I gave, the red color marks the first exon, but its last 3 nt are not a stop codon. Even the last 3 nt of the last exon are not a stop codon.

Is this reasonable?

Thanks!

Jingjing

From jjin01 at mail.rockefeller.edu Thu Jun 20 18:21:38 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Fri, 21 Jun 2013 00:21:38 +0000
Subject: [maker-devel] maker exon result
In-Reply-To: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>
References: , <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>
Message-ID:

The last three nucleotides in this example are also not a stop codon.

Jingjing

________________________________
From: Barry Moore [barry.moore at genetics.utah.edu]
Sent: Thursday, June 20, 2013 7:11 PM
To: Daniel Ence
Cc: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker exon result

To add to what Daniel suggested if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS.

B

On Jun 20, 2013, at 5:06 PM, Daniel Ence wrote:

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent.
I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA 
TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG This is the sequence for this gene, the red 
color is for the first exon?? However, for this exon, I cannot found the stop codon??? I also find for some exon, there are several stop codon in one exon??? Does anyone have the same problem with me? Or there is something wrong when I configure the maker file?? Thanks! Jingjing

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From myandell at genetics.utah.edu Thu Jun 20 19:11:40 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Fri, 21 Jun 2013 01:11:40 +0000
Subject: [maker-devel] maker exon result
In-Reply-To: References: , ,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E18B22@mxb2.hg.genetics.utah.edu>

Hi Jin,

Only the terminal coding exon (CDS) of a gene model will contain a stop codon. Sometimes, though, there is no stop codon because the gene actually runs off the end of the scaffold, or is lost in a gap in the assembly...

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu]
Sent: Thursday, June 20, 2013 6:18 PM
To: Daniel Ence; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker exon result

For my understanding, the prediction gene model should be connect different exon together. For each exon of a gene, I think it should have a start codon and stop codon. However, it may be wrong.
However, when I check some gene model from maker prediction, some exon of one gene, I cannot find stop codon for it. Like the example I give, the red color is the first exon. However, the last 3 NT is not a stop codon. Even for last 3 NT for last exon, it is also not a stop codon. Is it reasonable? Thanks! Jingjing ________________________________ From: Daniel Ence [dence at genetics.utah.edu] Sent: Thursday, June 20, 2013 7:06 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: RE: maker exon result Hi Jingjing, It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA 
TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG This is the sequence for this gene, the red 
color is for the first exon?? However, for this exon, I cannot found the stop codon??? I also find for some exon, there are several stop codon in one exon??? Does anyone have the same problem with me? Or there is something wrong when I configure the maker file?? Thanks! Jingjing

From bmoore at genetics.utah.edu Thu Jun 20 19:29:41 2013
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Fri, 21 Jun 2013 01:29:41 +0000
Subject: [maker-devel] maker exon result
In-Reply-To: References: , ,
Message-ID: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu>

To clarify things a bit, Jin: not every exon will have a start and/or stop codon. Only the first coding exon will have a start, and only the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3), so look only at CDS features, not at 'exon' features. For CDSs you must then concatenate the sequence of each CDS line for a given transcript (and reverse complement the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end.

B

Barry Moore
Research Scientist
Dept. Human Genetics
University of Utah

On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote:

For my understanding, the prediction gene model should be connect different exon together. For each exon of a gene, I think it should have a start codon and stop codon. However, it may be wrong. However, when I check some gene model from maker prediction, some exon of one gene, I cannot find stop codon for it. Like the example I give, the red color is the first exon. However, the last 3 NT is not a stop codon. Even for last 3 NT for last exon, it is also not a stop codon. Is it reasonable? Thanks!
Jingjing

From kara.deleon at biofilm.montana.edu Thu Jun 20 16:25:31 2013
From: kara.deleon at biofilm.montana.edu (Bowen, Kara (De Leon))
Date: Thu, 20 Jun 2013 16:25:31 -0600
Subject: [maker-devel] augustus_species
Message-ID: <3E82665C-ECB7-4A07-B0FF-24E8395EDC4D@biofilm.montana.edu>

Hello,

I am trying to annotate a Chlamydomonas genome, and C. reinhardtii was used as a model organism in Augustus. I would like to add this model to augustus_species in the maker_opts.ctl file, but I'm not sure how this information should be inserted on this line (i.e., as genus name, file location, etc.).

I am also having an issue with providing a protein file. When I put in the protein fasta file of C. reinhardtii from the Augustus website, I get a fatal error (below). I've looked through the fasta and I'm not seeing anything obvious that would cause this error to be thrown. Do you have any suggestions on where to start to look?

Can't open sequence index file /Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/augustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index: Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm line 527.

FATAL ERROR

Thanks for any help you can provide.
Kara

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kara De León
Postdoctoral Research Associate
Montana State University
Center for Biofilm Engineering
366 EPS Building
Bozeman, MT 59717
208-484-9078
kara.deleon at biofilm.montana.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From gowthaman.ramasamy at seattlebiomed.org Fri Jun 21 07:29:06 2013
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Fri, 21 Jun 2013 06:29:06 -0700
Subject: [maker-devel] augustus_species
Message-ID:

I believe the model file should go into the Augustus installation directory, actually into the 'genomes' subfolder there. Then use the exact name of the model file (minus extension) in the .ctl file.

From carsonhh at gmail.com Fri Jun 21 09:24:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Jun 2013 11:24:17 -0400
Subject: [maker-devel] augustus_species
In-Reply-To:
Message-ID:

The model files must go in .../augustus/config/species/ under the augustus installation directory (each model gets its own directory). The species names that augustus accepts are the same as the directory names under .../augustus/config/species/. The command 'augustus --species=help' will also provide a list of those names.

For the protein file, can you send it to me?

--Carson

On 13-06-21 9:29 AM, "Gowthaman Ramasamy" wrote:

>I believe the model file should go to Augustus installation directory.
>Actually in to the 'genomes' sub folder there. Then use the exact name of
>the model file ( minus extension) in .CTL file.......

From carsonhh at gmail.com Fri Jun 21 07:58:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Jun 2013 09:58:35 -0400
Subject: [maker-devel] maker exon result
In-Reply-To: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu>
Message-ID:

To further illustrate this, I've highlighted the location of all CDS entries (shown in uppercase). You need to cut them out, string them together linearly, and only then can you translate. The merged CDS begins with a start codon and is open reading frame all the way through, but there is no stop codon, so this is a partial transcript. Sometimes the gene predictors do not find a likely stop, and a partial model scores better. You can force MAKER to try to find a stop even when the gene predictor (snap, augustus, etc.) doesn't by setting always_complete=1 in the maker_opts.ctl file. Keep in mind that this is just a forced canonical completion.
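The "cut them out, string them together, then check" step described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names (`spliced_cds`, `codon_report`, `reverse_complement`) are hypothetical and not part of MAKER, the scaffold and GFF3 rows are toy data rather than the tobacco scaffold from this thread, and the GFF3 is assumed to be tab-separated with a `Parent=` attribute on each CDS row.

```python
# Minimal sketch: splice the CDS rows of one transcript and check its ends.
# Helper names and the toy data are hypothetical, for illustration only.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_cds(scaffold, gff_lines, transcript_id):
    """Concatenate the CDS features of one transcript in transcription order.

    scaffold      -- full scaffold sequence (GFF3 coordinates are 1-based, inclusive)
    gff_lines     -- iterable of tab-separated GFF3 lines
    transcript_id -- value expected in the Parent= attribute of the CDS rows
    """
    segments = []
    strand = "+"
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # ignore 'gene', 'mRNA' and 'exon' rows
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if transcript_id not in attrs.get("Parent", "").split(","):
            continue
        segments.append((int(cols[3]), int(cols[4])))
        strand = cols[6]
    segments.sort()  # ascending genomic order
    seq = "".join(scaffold[start - 1:end] for start, end in segments)
    # For minus-strand models, reverse complement the joined sequence.
    return reverse_complement(seq) if strand == "-" else seq


def codon_report(cds):
    """Report whether the spliced CDS starts with ATG and ends with a stop codon."""
    cds = cds.upper()
    return {
        "has_start": cds.startswith("ATG"),
        "has_stop": len(cds) >= 3 and cds[-3:] in STOP_CODONS,
        "length_multiple_of_3": len(cds) % 3 == 0,
    }


if __name__ == "__main__":
    # Toy scaffold: two CDS segments (uppercase) separated by an intron (lowercase).
    scaffold = "ccATGGCCgtaagtttcagAAATAAcc"
    gff = [
        "toy\tmaker\texon\t3\t8\t.\t+\t.\tID=tx1:exon:1;Parent=tx1",
        "toy\tmaker\texon\t20\t25\t.\t+\t.\tID=tx1:exon:2;Parent=tx1",
        "toy\tmaker\tCDS\t3\t8\t.\t+\t0\tID=tx1:cds;Parent=tx1",
        "toy\tmaker\tCDS\t20\t25\t.\t+\t0\tID=tx1:cds;Parent=tx1",
    ]
    cds = spliced_cds(scaffold, gff, "tx1")
    print(cds)  # ATGGCCAAATAA
    print(codon_report(cds))
```

For a partial model like the one discussed in this thread, `has_stop` would come back `False` even though the spliced sequence starts with ATG; checking individual exon or CDS rows one at a time is not meaningful.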
ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCA GAATTATCTCCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCG GGGAAGTTGATTGAGAATAATCCATCAGGGgtgatacagaagaattgtttcagTATCTTG TTGAAATATTGGCTTCTAGAGTGTATGATGTAGCAATTGATTCCCCCTTGCAAAATGCAA CTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGATCAAAAGAGAGGATATGCAGTCCg tatgtttctcctctcttctttttttgatgtagcatttgctttaacttagaatttgtggtt ttaaacataccattagaaaggtatggaggttgaggattagggtagtaaagtaggtagtct agagtgttcataacagtaatattgacaagcagtctcgctttccgttggtagtaggttttt atgactaaccgttattttctttcattgttgatcaacttacttttgttgtttttattctgc ttttatatggctttttggtactgtcccttcttgtctatattttcattaatgtggtgctta tgcttttctaagccgagagtttattggaaacaactttcatatcctcacaaggtaggggta aggtgtgcgtacacactaccctccccagactctacggtgtgggataatatttagtatgtt attgtcgttgttgttgtaaacgttttttttgttgctatcaaagcatgttattacgggtaa aatagaaacatttaaagtgaaagagtttccaaacgtaggaaagcttttttttctttcgga atacaccgaaaaaagaaagactatcatttaagatagaacaacaacagcgacggagctagc cttcgacttactggttcggcagaacccaataattttggcccaaactctgtacttgtacta aaaagctcacttaatatgtataaaaagcctagtaattaagttgcatttttttctttctaa aatctagagctcataaactcaaaattatgtctccgcctctgaacaatggggatattattc tacttttaactatcttagataagttaataattgttctctttttcaaacgtttctgccttg tattattgtgtaactatttatactgtgtggacgcttcaaaatgttgttgcgcccgcgtcg gatcctcaaaaaatatatattttgaggattcgacacgcacccgatgaccttttcggagaa ttcgagcaatataggtaactaatattgctagctcatcaactggtggtattttttagGTGC TCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTC AGAGACTTAAATGTACTGCTACGATTgtcatgcctgttaccacaccagagatcaaggtaa ttagttctctcctgttaatttatccttcatgttcgattcatgtgaatctagttgatcggg cactgagttttactaaaaaatgaagactttcggaacttgggagctttaacatgctgtaac atttgtgtagttataagacttttgaaacttatagtcttagtgggtgtttggacataagaa ttgtaaagttccaagaaaagtgaaaaaaaattcaagtgaaaatggtatttgaaaattaga gttgtgtttggacatgaatataattttaggttgtttttgaagttttgtgagtgatctgac acaaattttgaaaaaacaactttttggagtttttcaaattttcgaaaaattccaaaatgc atcttcaagtgaaaattggaaattatatgaccaaacgctgatttcgggaaaaaaattcga 
aaaaatgtgaaaattttcttatgtccaaacgggctcttaaatgcgtcataacgtttgtgt ggttataaaagtctctcatctgaatagggtcacacaactaaaacagagagaacaaaataa ttcactaaaaaaaaattggaactagctacaaacttcgtcgcaagtctcgctaaatcgctc gtagctaatagaatttctagataatttgtttagcttgtagcatgaaatttttctatttag caacagaagtagtctgtcgctaattcctatttttttagtagaaagtattgtgaaattatt tgtttttctaaaggaccattttctttacaaatgaacagattgaagcagttaagaacttgg atggtaatgtagttctacagGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGT TGGCTGAAGATGAAGgtctcacattcatcccgcctttcgatcacatcttaaagatataca tgcagtatttctgcctgtagGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAA AAGGGTTGCTCCTCATACAAAGATTATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGAC ACAGTCTTTGTACCACGGAATGAGAGTAAAGTTAGAACAAGTTGATAATTTTGCAGATGG CgtagctgttgcactagTTAGTTGGTGAAGAAACTTTCCGTCTTTGCAAAGATTTAATAG ACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGgttagcacgcacc atctcctaatggtttcagatatgatccgtccaaccagccaaaattggttagaataggacg ggttgaactatcaacccaatcaatcacagcccaaataacatttatgtgggtatatgactc gcccatttattaactcaaccaattttggtccattcaaattcaggctaacccgtccacgtt tgacattcatactttagatgtggattaaagtaactttcttaaatttccctctggttttga catgtactagtttgtgtttgtgtgtgttttgttctttttttcaatagGATGTGTACGACA AAGGAAGGAACATATTAGAGACATCAGGTGCACTCGCCATAGCTGGAGCTGAAGCATACT GCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTAGTGGAGCCAATATGG ACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGGAAGCTC TGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTgtgc gttacttagagcacttaacaagcattttagccagagtttaagttatatacatcgtcgtca gtgtaagaaacttttataccgtcttgatggagtaaaaatttgttacactgacgtgtacat aacttaaaacttttttagttactatatgatactttctgtctaagaaactgaaatattgac ttgaattactggtgggacctatgattattaccgaattcaagtacagatataactctggaa gaaaacaagctctagttctgtacaggtaattaaagttctattcatttttagaggggatgt tggcttctcattttagatttgctttattagttgttaggaaaaaagaaattacttattaca ttcaatttttagATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGGAAG TTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG Thanks, Carson From: Barry Moore Date: Thursday, 20 June, 2013 9:29 PM To: Jingjing Jin Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker exon result To 
clarify things a bit, Jin. Not every exon will have a start and/or stop codon; only the first coding exon will have a start and the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3), so only look at CDS features, not at 'exon' features. For CDSs you must then concatenate the sequence of each CDS line for a given transcript (and reverse complement the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end.

B

Barry Moore
Research Scientist
Dept. Human Genetics
University of Utah

On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote:

> For my understanding, the prediction gene model should be connect different
> exon together.
>
> For each exon of a gene, I think it should have a start codon and stop codon.
> However, it may be wrong.
>
> However, when I check some gene model from maker prediction, some exon of one
> gene, I cannot find stop codon for it. Like the example I give, the red color
> is the first exon. However, the last 3 NT is not a stop codon.
>
> Even for last 3 NT for last exon, it is also not a stop codon.
>
> Is it reasonable?
>
> Thanks!
>
> Jingjing
>
> From: Daniel Ence [dence at genetics.utah.edu]
> Sent: Thursday, June 20, 2013 7:06 PM
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: RE: maker exon result
>
> Hi Jingjing,
>
> It's really hard to find the stop codon in the nucleotide sequence that you
> sent. I think most people determine the presence of a stop codon in a gene by
> viewing the annotations and sequence in some kind of viewer. The one that I
> use the most is Apollo, but many people also like gbrowse and igv.
>
> When you view gene models in Apollo, the start codons are highlighted in green
> and the stop codons are highlighted in red.
> Sometimes MAKER couldn't find the stop or start codon for a gene, and in
> those cases, the end of the gene model is marked with an orange arrow.
>
> I hope that I understood your question. Feel free to reply back on the
> mailing list if I didn't.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Jingjing Jin [jjin01 at mail.rockefeller.edu]
> Sent: Thursday, June 20, 2013 2:22 PM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker exon result
>
> Dear all,
>
> I have used maker to predict the gene model in my draft genome.
>
> However, when I check the sequence for each exon, I find some of them just
> have start codon, without stop codon.
>
> Is it reasonable for this?
>
> Like in this example:
>
> processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9
> processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362
> processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
> processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 .
> + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > > > ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT > CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT > AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG > ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA > TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC > TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG > TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG > ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT > TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT > TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC > TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC > AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT > TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT > AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC > TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA > AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT > TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG > CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC > GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT > TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG > AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA > AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA > TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC > TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC > TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG > 
TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA > TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC > TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA > ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT > GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA > AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA > GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT > TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT > TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG > AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC > TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT > TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT > AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC > TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA > GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG > GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA > TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA > TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT > TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG > CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA > GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG > AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT > ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT > TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA > CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA > CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT > ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA > 
TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG > AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG > > > > > This is the sequence for this gene, the red color is for the first exon?? > > > However, for this exon, I cannot found the stop codon??? > > > I also find for some exon, there are several stop codon in one exon??? > > > Does anyone have the same problem with me? > > Or there is something wrong when I configure the maker file?? > > > Thanks! > > > Jingjing > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From amelia.ireland at gmod.org Sun Jun 23 20:15:37 2013 From: amelia.ireland at gmod.org (Amelia Ireland) Date: Sun, 23 Jun 2013 19:15:37 -0700 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: References: Message-ID: >From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. ---------- Forwarded message ---------- From: Yunxi Lin Date: Sun, Jun 23, 2013 at 4:14 PM Subject: about running MAKER To: "gmod-help at gmod.org" Hi I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server. Thank you. Sincerely, Lin -- Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject -------------- next part -------------- An HTML attachment was scrubbed... 
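
Editorial sketch for the thread above: Barry's reply says to look only at CDS features, concatenate the CDS segments of each transcript, and reverse complement the result when the transcript is on the minus strand. The Python below is one minimal way to do that and is not part of MAKER. The function names (cds_sequences, has_start_and_stop) are hypothetical, the genome is assumed to be already loaded as a dict of sequence ID to string, the GFF3 is assumed to be tab-separated, and each CDS row is assumed to carry a single Parent value.

```python
from collections import defaultdict

# Translation table for complementing DNA (N is kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parent_of(attributes):
    """Pull the Parent value out of a GFF3 column 9 string (assumes one parent)."""
    for field in attributes.split(";"):
        key, _, value = field.strip().partition("=")
        if key == "Parent":
            return value
    return None


def cds_sequences(gff_lines, genome):
    """Return {transcript_id: spliced coding sequence}.

    genome is a dict mapping sequence IDs to sequence strings (an assumption
    of this sketch; load it however you prefer).
    """
    segments = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only CDS rows describe coding exons; 'exon' rows are skipped.
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        segments[parent_of(cols[8])].append((seqid, start, end, strand))

    spliced = {}
    for transcript, parts in segments.items():
        parts.sort(key=lambda p: p[1])  # genomic order
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[s][a - 1:b] for s, a, b, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        spliced[transcript] = seq
    return spliced


def has_start_and_stop(cds):
    """Return (starts_with_ATG, ends_with_stop_codon) for a spliced CDS."""
    cds = cds.upper()
    return cds.startswith("ATG"), cds[-3:] in STOP_CODONS
```

Checking has_start_and_stop on the spliced sequence, rather than on each exon, matches Barry's point that only the first and last coding exons carry the start and stop. As he also notes, a model will usually but not always have both, so a False here is not necessarily an error.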
From carsonhh at gmail.com Mon Jun 24 07:05:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Jun 2013 09:05:27 -0400
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
Message-ID:

Run time is dependent on the size of your evidence dataset, genome size, and number of processors you use. If you have a large genome (Gb size) and you are running on a single cpu, then that could take a long time. This is especially true if you use the alt_est option for evidence, as these are aligned via tblastx, which is 3-4 times slower than protein alignments and 10-20 times slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment, so your evidence dataset is the major factor.

Also, you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome, that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 cpu Linux servers, which can annotate most fungal genomes in a few hours and larger genomes in a few days.

--Carson

From: Amelia Ireland
Date: Sunday, 23 June, 2013 10:15 PM
To:
Cc:
Subject: [maker-devel] Fwd: about running MAKER

From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu.

---------- Forwarded message ----------
From: Yunxi Lin
Date: Sun, Jun 23, 2013 at 4:14 PM
Subject: about running MAKER
To: "gmod-help at gmod.org"

Hi

I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server.

Thank you.
Sincerely,
Lin

--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject

From Carson.Holt at oicr.on.ca Mon Jun 24 18:39:08 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 25 Jun 2013 00:39:08 +0000
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
Message-ID:

You are most likely only getting 1 cpu of performance. You should just install MPICH2. It's easy to just let MAKER do it for you:

  Go to the .../maker/src/ directory
  Run './Build mpich2'

Once it finishes installing, it will be in the .../maker/exe/mpich2/bin/ directory.

Set up MAKER again to use MPICH2:

  Go to the .../maker/src/ directory
  Run 'perl Build.PL'
  Say yes to the "use MPI" question
  Run './Build install'

Now run MAKER via 'mpiexec'.

  Example --> .../maker/exe/mpich2/bin/mpiexec -n 16 maker

The -n flag specifies how many CPUs to use. Mpiexec handles process communication either on the same machine or across machines. You will get much better performance.

Thanks,
Carson

From: Yunxi Lin
Date: Monday, 24 June, 2013 7:11 PM
To: Carson Holt
Cc: Amelia Ireland
Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson

Thank you for your help. My genome estimated size is 250M base pairs. I ran it in 16cpu, but we don't have the MPI so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial to do that. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq. I used that ESTs to run the MAKER. I think I already got more than 10Mb data. The information you mentioned is very helpful. I may go to use them to try to train the SNAP and Augustus.
Because this is my first time using the MAKER, I ran already a month, I was wondering maybe the command I used in a wrong way. Sincerely, Yunxi 2013/6/24 Carson Holt > Run time is dependent on the size of your evidence dataset, genome size, and number of processors you use. If you have a large genome (Gb size) and you are running on a single cpu then that could take a long time. This is especially true if you use the alt_est option for evidence as these are aligned via tblastx which is 3-4 times slower than protein alignments, and 10-20 time slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment so your evidence dataset is the major factor. Also you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 cpu Linux servers which can annotate most fungal genomes in a few hours and larger genomes in a few days. --Carson From: Amelia Ireland > Date: Sunday, 23 June, 2013 10:15 PM To: > Cc: > Subject: [maker-devel] Fwd: about running MAKER >From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. ---------- Forwarded message ---------- From: Yunxi Lin > Date: Sun, Jun 23, 2013 at 4:14 PM Subject: about running MAKER To: "gmod-help at gmod.org" > Hi I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server. Thank you. 
Sincerely, Lin -- Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From lin11 at cougars.csusm.edu Mon Jun 24 17:11:23 2013 From: lin11 at cougars.csusm.edu (Yunxi Lin) Date: Mon, 24 Jun 2013 16:11:23 -0700 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: References: Message-ID: Hi Carson Thank your for your help. My genome estimated size is 250M base pairs. I ran it in 16cpu, but we don't have the MPI so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial to do that. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq. I used that ESTs to run the MAKER. I think I already got more than 10Mb data. The information you mentioned is very helpful. I may go to use them to try to train the SNAP and Augustus. Because this is my first time using the MAKER, I ran already a month, I was wondering maybe the command I used in a wrong way. Sincerely, Yunxi 2013/6/24 Carson Holt > Run time is dependent on the size of your evidence dataset, genome size, > and number of processors you use. If you have a large genome (Gb size) and > you are running on a single cpu then that could take a long time. This is > especially true if you use the alt_est option for evidence as these are > aligned via tblastx which is 3-4 times slower than protein alignments, and > 10-20 time slower than standard EST alignments. 95% of MAKER's runtime is > BLAST alignment so your evidence dataset is the major factor. > > Also you do not need results from the entire genome to train SNAP. If you > get results from ~10Mb of the genome that is usually sufficient. Also make > sure you are taking advantage of parallelization. 
Launch via MPI to get > maximum performance. I commonly launch on 16 and 32 cpu Linux servers > which can annotate most fungal genomes in a few hours and larger genomes in > a few days. > > --Carson > > > From: Amelia Ireland > Date: Sunday, 23 June, 2013 10:15 PM > To: > Cc: > Subject: [maker-devel] Fwd: about running MAKER > > From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. > > ---------- Forwarded message ---------- > From: Yunxi Lin > Date: Sun, Jun 23, 2013 at 4:14 PM > Subject: about running MAKER > To: "gmod-help at gmod.org" > > > Hi > > I'm running a eukaryote project on our server. Because our server do not > have the GUI, is that still work for MAKER? And our command already ran > more than one month to try to generate the model use for the training of > SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux > server. > > Thank you. > > Sincerely, > Lin > > > > -- > Amelia Ireland > GMOD Community Support > http://gmod.org || @gmodproject > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Jun 25 08:12:45 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 25 Jun 2013 14:12:45 +0000 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: References: , Message-ID: Hi Yunxi, During the maker installation, there is an option to automatically install MPICH2, which would let you run maker parallelized. Try rerunning the perl Build.PL script in the "maker/src" directory, and when the option to install MPICH2 comes up, tell it yes. This will start an automated download and install onto your server. You can also start more than one maker process. They will work on annotating the genome together. 
You can start as many as ten or more processes like this, but MPI is a better parallelizing option. Hope that helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Yunxi Lin [lin11 at cougars.csusm.edu] Sent: Monday, June 24, 2013 5:11 PM To: Carson Holt Cc: maker-devel at yandell-lab.org; Amelia Ireland Subject: Re: [maker-devel] Fwd: about running MAKER Hi Carson Thank your for your help. My genome estimated size is 250M base pairs. I ran it in 16cpu, but we don't have the MPI so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial to do that. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq. I used that ESTs to run the MAKER. I think I already got more than 10Mb data. The information you mentioned is very helpful. I may go to use them to try to train the SNAP and Augustus. Because this is my first time using the MAKER, I ran already a month, I was wondering maybe the command I used in a wrong way. Sincerely, Yunxi 2013/6/24 Carson Holt > Run time is dependent on the size of your evidence dataset, genome size, and number of processors you use. If you have a large genome (Gb size) and you are running on a single cpu then that could take a long time. This is especially true if you use the alt_est option for evidence as these are aligned via tblastx which is 3-4 times slower than protein alignments, and 10-20 time slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment so your evidence dataset is the major factor. Also you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. 
I commonly launch on 16 and 32 cpu Linux servers which can annotate most fungal genomes in a few hours and larger genomes in a few days. --Carson From: Amelia Ireland > Date: Sunday, 23 June, 2013 10:15 PM To: > Cc: > Subject: [maker-devel] Fwd: about running MAKER >From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. ---------- Forwarded message ---------- From: Yunxi Lin > Date: Sun, Jun 23, 2013 at 4:14 PM Subject: about running MAKER To: "gmod-help at gmod.org" > Hi I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server. Thank you. Sincerely, Lin -- Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From Carson.Holt at oicr.on.ca Tue Jun 25 09:56:22 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Tue, 25 Jun 2013 15:56:22 +0000 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: <9FC132E2-9E59-42E9-ADBA-FD91644E2124@cougars.csusm.edu> Message-ID: You can get blast to use more than 1 cpu via the cpus= option, but that is still significantly limiting MAKER's performance. When you let MAKER install MPICH2, it will be local to the MAKER installation (MAKER only). It will be in ?/maker/exe/mpich2. This was purposely done for people who have limited access and install MAKER themselves, so they can run via MPI without having to get upgraded privileges. So I don't know if you installed MAKER yourself, but if you did, then this is an option that will let you run. 
--Carson

From: csusm
Date: Tuesday, 25 June, 2013 11:40 AM
To: Carson Holt
Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson

Thank you for your suggestion. Do you mean if I don't use MPI, I could only run it on one cpu? Because my school own the server, I only have the limit authorization.

Yunxi Lin

On Jun 24, 2013, at 5:39 PM, Carson Holt wrote:

You are most likely only getting 1 cpu of performance. You should just install MPICH2. It's easy to just let MAKER do it for you:

  Go to the .../maker/src/ directory
  Run './Build mpich2'

Once it finishes installing, it will be in the .../maker/exe/mpich2/bin/ directory.

Set up MAKER again to use MPICH2:

  Go to the .../maker/src/ directory
  Run 'perl Build.PL'
  Say yes to the "use MPI" question
  Run './Build install'

Now run MAKER via 'mpiexec'.

  Example --> .../maker/exe/mpich2/bin/mpiexec -n 16 maker

The -n flag specifies how many CPUs to use. Mpiexec handles process communication either on the same machine or across machines. You will get much better performance.

Thanks,
Carson

From: Yunxi Lin
Date: Monday, 24 June, 2013 7:11 PM
To: Carson Holt
Cc: Amelia Ireland
Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson

Thank you for your help. My genome estimated size is 250M base pairs. I ran it in 16cpu, but we don't have the MPI so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial to do that. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq. I used that ESTs to run the MAKER. I think I already got more than 10Mb data. The information you mentioned is very helpful. I may go to use them to try to train the SNAP and Augustus. Because this is my first time using the MAKER, I ran already a month, I was wondering maybe the command I used in a wrong way.

Sincerely,
Yunxi

2013/6/24 Carson Holt

Run time is dependent on the size of your evidence dataset, genome size, and number of processors you use.
If you have a large genome (Gb size) and you are running on a single cpu then that could take a long time. This is especially true if you use the alt_est option for evidence as these are aligned via tblastx which is 3-4 times slower than protein alignments, and 10-20 time slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment so your evidence dataset is the major factor. Also you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 cpu Linux servers which can annotate most fungal genomes in a few hours and larger genomes in a few days. --Carson From: Amelia Ireland > Date: Sunday, 23 June, 2013 10:15 PM To: > Cc: > Subject: [maker-devel] Fwd: about running MAKER >From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. ---------- Forwarded message ---------- From: Yunxi Lin > Date: Sun, Jun 23, 2013 at 4:14 PM Subject: about running MAKER To: "gmod-help at gmod.org" > Hi I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server. Thank you. Sincerely, Lin -- Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
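
Editorial sketch for the thread above: Carson notes that results from roughly 10Mb of genome are usually enough to train SNAP, so the whole assembly does not have to finish first. One way to act on that, offered here only as an assumption about workflow and not as anything MAKER does itself, is to pull about 10Mb of the longest contigs into a smaller FASTA and run the training pass on that. The function names below (read_fasta, pick_training_subset) are hypothetical and use only the Python standard library.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def pick_training_subset(records, target_bp=10_000_000):
    """Take the longest sequences first until at least target_bp bases are collected.

    Returns (chosen_records, total_bases). The 10 Mb default mirrors the
    rough figure quoted in the thread; it is a rule of thumb, not a hard limit.
    """
    chosen, total = [], 0
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if total >= target_bp:
            break
        chosen.append((header, seq))
        total += len(seq)
    return chosen, total


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs to an open text handle, wrapped at width."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Taking the longest contigs first is a design choice made for this sketch, on the reasoning that long contigs are more likely to contain complete gene models; a random sample would also be defensible.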
From jjin01 at mail.rockefeller.edu Tue Jun 25 15:13:53 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 21:13:53 +0000
Subject: [maker-devel] start position for some genes results
Message-ID:

Dear all,

I find some strange things about location for my final result.

Like for some start position of final gene model:

c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2

It start position is -1.

Does someone know why the start position is -1? Is there something wrong?

Thanks!

Jingjing

From carsonhh at gmail.com Tue Jun 25 16:55:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 18:55:11 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

What MAKER version are you using? This should be fixed in the current 2.28. It only happened under a very specific set of circumstances, but I remember fixing it. So let me know if you are using 2.28.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 5:13 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] start position for some genes results

Dear all,

I find some strange things about location for my final result.

Like for some start position of final gene model:

c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2

It start position is -1.

Does someone know why the start position is -1? Is there something wrong?

Thanks!

Jingjing

-------------- next part --------------
An HTML attachment was scrubbed...
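
Editorial sketch for the thread above: GFF3 coordinates are 1-based with start no greater than end, so a start of -1 like the one Jingjing reports is always out of range. The Python below is a small, hypothetical checker (the name out_of_range_features is not a MAKER tool) that lists such rows so they can be counted before and after upgrading. It assumes tab-separated GFF3; the optional seq_lengths dict of sequence ID to length is also an assumption of the sketch.

```python
def out_of_range_features(gff_lines, seq_lengths=None):
    """Yield (line_number, seqid, type, start, end) for rows with bad coordinates.

    A row is reported when start < 1, start > end, or (if seq_lengths is
    given and knows the sequence) end runs past the end of the sequence.
    """
    for number, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            # Not a feature row (for example an embedded FASTA section).
            continue
        seqid, ftype = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        too_long = seq_lengths is not None and end > seq_lengths.get(seqid, end)
        if start < 1 or start > end or too_long:
            yield number, seqid, ftype, start, end
```

Run on the row quoted in the thread, this would report the gene on c124062 with start -1 and end 507.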
From jjin01 at mail.rockefeller.edu Tue Jun 25 17:00:37 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 23:00:37 +0000
Subject: [maker-devel] start position for some genes results

Sorry, I have checked. I think it is the old version, 2.27. I will try the new one.

Thanks!
Jingjing

From jjin01 at mail.rockefeller.edu Tue Jun 25 18:53:01 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 00:53:01 +0000
Subject: [maker-devel] start position for some genes results

Dear Carson,

When I use the new version of MAKER, I have another problem:

jingjing at ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error

Do you know how to fix this problem with the new version?

Thanks!
Jingjing
From carsonhh at gmail.com Tue Jun 25 18:55:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 20:55:54 -0400
Subject: [maker-devel] start position for some genes results

Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network mounted location.

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 19:30:09 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:30:09 +0000
Subject: [maker-devel] start position for some genes results

Dear Carson,

I am so sorry. The problem is still here.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error
 at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
 Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73
 Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
 eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
 Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79
 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56
 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
--> rank=NA, hostname=ChuaServer1
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `seq_id` does not exist in MpiTier object
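The cleanup Carson suggested earlier in this thread (delete the mpi_blastdb directory, and check what TMP= is set to) could be sketched roughly as follows. This is only an illustration under stated assumptions: that mpi_blastdb sits directly inside the run's `.maker.output` directory, and that the control file uses `key=value #comment` lines. The helper names `remove_blast_index` and `ctl_value`, and the example paths, are hypothetical and not part of MAKER.

```python
import shutil
import tempfile
from pathlib import Path


def ctl_value(ctl_text, key):
    """Return the value of `key` from control-file text (assumed key=value #comment)."""
    for line in ctl_text.splitlines():
        setting = line.split("#", 1)[0].strip()  # drop any trailing comment
        if "=" not in setting:
            continue
        name, value = setting.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


def remove_blast_index(output_dir):
    """Delete <output_dir>/mpi_blastdb if present; return True if it was removed."""
    target = Path(output_dir) / "mpi_blastdb"  # assumed location of the index directory
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Demonstration on a throwaway directory, so no real data is touched.
with tempfile.TemporaryDirectory() as scratch:
    index_dir = Path(scratch) / "example.maker.output" / "mpi_blastdb"
    index_dir.mkdir(parents=True)
    removed = remove_blast_index(index_dir.parent)
    removed_again = remove_blast_index(index_dir.parent)

tmp_setting = ctl_value("genome=genome.fasta #input\nTMP=/scratch/tmp #temp dir\n", "TMP")
print(removed, removed_again, tmp_setting)  # True False /scratch/tmp
```

Whether the reported TMP value lives on a network mount still has to be checked by hand (for example with `df`), since that depends on the local system.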
From carsonhh at gmail.com Tue Jun 25 19:47:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 21:47:10 -0400
Subject: [maker-devel] start position for some genes results

Could you check your input genome file for the sequence "processed_tobacco_genome_sequences_c1"? Make sure that is in fact the exact name, and that there are no ':' characters in the name, because they can confuse the BioPerl FASTA indexer.

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 19:53:33 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:53:33 +0000
Subject: [maker-devel] start position for some genes results

Yes, this is the real name, and there is no ":" in it. I used the same file with maker.2.27 and had no problem, so I am not sure what is wrong with the new version.

Jingjing
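The header check Carson asked for earlier in this thread can be done by eye or with `grep`. As a minimal sketch, assuming the sequence ID is the first whitespace-delimited token after ">", it might look like this. The function name `check_fasta_ids` and the second record in the example are hypothetical.

```python
def check_fasta_ids(fasta_lines, wanted_id):
    """Report whether `wanted_id` is present and which IDs contain a ':' character."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the ID is the first token on the header line.
            ids.append(fields[0] if fields else "")
    return {
        "found": wanted_id in ids,
        "with_colon": [seq_id for seq_id in ids if ":" in seq_id],
    }


example = [
    ">processed_tobacco_genome_sequences_c1",
    "ACGT",
    ">scaffold:2 hypothetical record with a colon in its ID",
    "GGCC",
]
print(check_fasta_ids(example, "processed_tobacco_genome_sequences_c1"))
# {'found': True, 'with_colon': ['scaffold:2']}
```

For a real genome, pass an open file handle in place of the example list.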
From carsonhh at gmail.com Tue Jun 25 20:02:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 22:02:51 -0400
Subject: [maker-devel] start position for some genes results

The failure you are seeing occurs in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER. Does it fail as well?

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 20:15:46 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 02:15:46 +0000
Subject: [maker-devel] start position for some genes results

Yes, it also fails on the test data.

jingjing at ChuaServer1:~/software/maker.2.28/maker/data/example$ ../../bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >contig-dpp-500-500, trying to re-index the fasta.
stop here: contig-dpp-500-500
ERROR: Fasta index error
 at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
From carsonhh at gmail.com Wed Jun 26 05:49:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Jun 2013 07:49:11 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

I thought as much. There is something wrong with the installation itself. Could you run maker with the --debug flag and kill it after 30 seconds? Capture the STDERR and send it to me. This is just to check the prerequisites installed on your system for known incompatibilities.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 10:15 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Yes, it also fails on test data.

jingjing at ChuaServer1:~/software/maker.2.28/maker/data/example$ ../../bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >contig-dpp-500-500, trying to re-index the fasta.
stop here: contig-dpp-500-500
ERROR: Fasta index error
at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.

From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 10:02 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

The point of the failure you are seeing is occurring in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER, does it fail as well?

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 9:53 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Yes, this is the real name.

There is also no ":" in the name.

Because I have use the same file for maker.2.27 and have no problem.

I am not sure what is wrong with the new version.

Jingjing

From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 9:47 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

Could you check for this sequence in your input genome file for "processed_tobacco_genome_sequences_c1", make sure that it is in fact that exact name, and there are no ':' characters in the name because they can confuse the bioperl fasta indexer.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 9:30 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Dear Carson,

I am so sorry. The problem is still here.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error
at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73
Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79
Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56
Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
--> rank=NA, hostname=ChuaServer1
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `seq_id` does not exist in MpiTier object

From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 8:55 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network mounted location.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 8:53 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Dear Carson,

When I use the new version of maker, I have another problem like this:

jingjing at ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error

Do you know how to fix this problem about new version?

Thanks!

Jingjing

From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 6:55 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

What MAKER version are you using? This should be fixed in the current 2.28. It only happened under a very specific set of circumstances, but I remember fixing it. So let me know if you are using 2.28.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 5:13 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] start position for some genes results

Dear all,

I find some strange things about location for my final result.

Like for some start position of final gene model:

c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2

It start position is -1.

Does someone know why the start position is -1?

Is there something wrong?

Thanks!

Jingjing
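Carson's suggestion in this thread, to confirm that the sequence name in the genome file is exactly the one MAKER reports and that no name contains a ':' character, can be checked with a few lines of Python. This is a hedged sketch rather than anything MAKER provides: the function name is invented, the second example header is hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each header line.

```python
def check_fasta_ids(fasta_lines, wanted_id):
    """Report whether wanted_id is present as an exact sequence ID and
    which IDs contain ':' (said in the thread to confuse the indexer)."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return {
        "found": wanted_id in ids,
        "ids_with_colon": [seq_id for seq_id in ids if ":" in seq_id],
    }


example = [
    ">processed_tobacco_genome_sequences_c1 some description",
    "ACGTACGT",
    ">hypothetical:c2",  # made-up ID containing ':' for illustration
    "GGCC",
]
report = check_fasta_ids(example, "processed_tobacco_genome_sequences_c1")
print(report)
```

For a real genome, pass an open file handle. In this thread the check came back clean and the failure also occurred on MAKER's bundled test data, which is why Carson went on to suspect the installation itself.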
From michel.moser at ips.unibe.ch Thu Jun 27 07:33:15 2013
From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch)
Date: Thu, 27 Jun 2013 13:33:15 +0000
Subject: [maker-devel] spliting genome for annotation
Message-ID:

Dear Maker-developers

If I understood correctly, in order to increase speed and reduce needed resources one can split the genome into chunks and annotate each chunk separately. (I would really like to use that, as I am working with a 1.2 Gbp draft genome and can't use MPI on the computing cluster.)

I am a bit worried about how this might affect the annotation, as the gene predictor would get trained quite differently for each chunk, right? Or is there communication between the chunks using the -base function of maker?

Could you maybe name some pros and cons of splitting your genome for the annotation with maker?

Thank you very much,
Michel

From lawson at ebi.ac.uk Thu Jun 27 07:37:10 2013
From: lawson at ebi.ac.uk (Daniel Lawson)
Date: Thu, 27 Jun 2013 14:37:10 +0100
Subject: [maker-devel] spliting genome for annotation
In-Reply-To:
References:
Message-ID:

Michel,

It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold.

regards
Dan

--
Ensembl Genomes | VectorBase | i5K insect genome initiative

From carsonhh at gmail.com Thu Jun 27 09:42:26 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Jun 2013 11:42:26 -0400
Subject: [maker-devel] spliting genome for annotation
In-Reply-To:
Message-ID:

Correct. The level of splitting is going to be limited by the largest contig. The largest contig will then be your slowest job, but the total runtime will be based on how much splitting you can achieve. Splitting into 10 jobs and running them all simultaneously will make total run time about 1/10 as long.

You can use the -base flag with MAKER to make all jobs write to the same directory. Use the -g flag to specify a different input fasta file for each job (then they can all share the same control files).
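The point above, that the largest contig sets a floor on the slowest job, can be seen in a small Python sketch that distributes contigs over a fixed number of jobs by total length. This is only an illustration under stated assumptions: it is not how MAKER's fasta_tool splits files, the function names are invented, the contigs are toy data, and sequence length is used as a rough stand-in for runtime.

```python
import heapq


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def balance_contigs(records, n_jobs):
    """Greedy balancing: take contigs longest first and give each one
    to the job with the smallest total length so far."""
    heap = [(0, job) for job in range(n_jobs)]
    heapq.heapify(heap)
    jobs = [[] for _ in range(n_jobs)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, job = heapq.heappop(heap)
        jobs[job].append(header)
        heapq.heappush(heap, (total + len(seq), job))
    totals = [0] * n_jobs
    for total, job in heap:
        totals[job] = total
    return jobs, totals


# Toy assembly: five contigs of 10, 6, 5, 4 and 3 bases.
example_fasta = [
    ">a", "A" * 10,
    ">b", "C" * 6,
    ">c", "G" * 5,
    ">d", "T" * 4,
    ">e", "A" * 3,
]
records = list(read_fasta(example_fasta))
jobs, totals = balance_contigs(records, 2)
print(jobs, totals)
```

No job can be smaller than the longest single contig, so adding jobs stops helping once every other group is already shorter than that contig. Each group would then be written to its own fasta file and passed to a separate MAKER run with -g, with all runs sharing one -base name as described above.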
You will then need to run maker once using the original assembly fasta and the ?dsindex flag when all jobs complete to get MAKER to clean up the datastore log file (rebuilt to index all contigs). That only takes 2 minutes to run. You can use the fasta_tool utility that comes with MAKER to conveniently split the input assembly fasta. MAKER does not train the gene predictors for you, and the hints it gives are on a per gene basis, so splitting contigs has no affect on that. For initial training of gene predictors, run MAKER on about 10-30 Mb of your largest contigs and use either the protein2genome or est2genome prediction options to build gene models to train the predictors on. You will need to train Augustus or SNAP yourself using those models and their own documentation. If training SNAP, you can use maker2zff to convert for SNAPs training format. You can also use the tool CEGMA from Ian Korf's lab to train SNAP. Use the cegma2zff script that comes with MAKER to do the conversion for training input. If you have questions once you start training, just send them to the list. Thanks, Carson From: Daniel Lawson Date: Thursday, 27 June, 2013 9:37 AM To: Cc: Subject: Re: [maker-devel] spliting genome for annotation Michel, It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold. regards Dan On 27 June 2013 14:33, wrote: > Dear Maker-developers > > If i understood correctly, in order to increase speed and reduce needed > resources one can split the genome into chunks and annotate each chunk > separately. > (i would really like to use that as i am working with a 1.2 Gbasepair > draftgenome and cant use MPI on the computing cluster) > I am a bit worried about how this might affect the annotation as the > gene-predictor would get trained quite differently for each chunk, right? 
> Or is there communication between the chunks using the -base function of > maker? > > Could you maybe name some pros and cons of splitting your genome for the > annotation with maker? > > Thank you very much, > Michel > > > > > ________________________________________ > Von: Moser, Michel (IPS) > Gesendet: Donnerstag, 27. Juni 2013 15:24 > An: Carson Holt > Betreff: AW: [maker-devel] start position for some genes results > > ________________________________________ > Von: maker-devel [maker-devel-bounces at yandell-lab.org]" im Auftrag von > "Carson Holt [carsonhh at gmail.com] > Gesendet: Mittwoch, 26. Juni 2013 04:02 > An: Jingjing Jin; maker-devel at yandell-lab.org > Betreff: Re: [maker-devel] start position for some genes results > > The point of the failure you are seeing is occurring in the initialization > stage, before reaching any of the changes that would have been introduced by > 2.28. Try running the test data that comes with MAKER, does it fail as well? > > --Carson > > > > From: Jingjing Jin > > > Date: Tuesday, 25 June, 2013 9:53 PM > To: Carson Holt >, > "maker-devel at yandell-lab.org" > > > Subject: RE: [maker-devel] start position for some genes results > > Yes, this is the real name. > > There is also no ":" in the name. > > Because I have use the same file for maker.2.27 and have no problem. > > I am not sure what is wrong with the new version. > > Jingjing > > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 9:47 PM > To: Jingjing Jin; > maker-devel at yandell-lab.org > Subject: Re: [maker-devel] start position for some genes results > > Could you check for this sequence in your input genome file for > "processed_tobacco_genome_sequences_c1", make sure that it is in fact that > exact name, and there are no ':' characters in the name because they can > confuse the bioperl fasta indexer. 
> > --Carson > > > From: Jingjing Jin > > > Date: Tuesday, 25 June, 2013 9:30 PM > To: Carson Holt >, > "maker-devel at yandell-lab.org" > > > Subject: RE: [maker-devel] start position for some genes results > > Dear Carson, > > > I am so sorry. The problem is still here. > > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.ma > ker.output/tobacco_seq_1_datastore > > To access files for individual sequences use the datastore index: > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.ma > ker.output/tobacco_seq_1_master_datastore_index.log > > STATUS: Now running MAKER... > WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to > re-index the fasta. > stop here: processed_tobacco_genome_sequences_c1 > ERROR: Fasta index error > at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm > line 239. 
> Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', > 'HASH(0x4e10810)', 0) called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line > 73 > Process::MpiTiers::__ANON__() called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415 > eval {...} called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407 > Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line > 79 > Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') > called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line > 56 > Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, > 'Process::MpiChunk') called at > /home/jingjing/software/maker.2.28/maker/bin/./maker line 650 > --> rank=NA, hostname=ChuaServer1 > ERROR: Failed in tier preparation > WARNING: You must always set a rank before running MpiTiers > FATAL: argument `seq_id` does not exist in MpiTier object > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 8:55 PM > To: Jingjing Jin; > maker-devel at yandell-lab.org > Subject: Re: [maker-devel] start position for some genes results > > Delete the mpi_blastdb directory before starting, to make sure all indexes get > rebuilt. Also make sure you are not setting TMP= to a network mounted > location. > > --Carson > > > From: Jingjing Jin > > > Date: Tuesday, 25 June, 2013 8:53 PM > To: Carson Holt >, > "maker-devel at yandell-lab.org" > > > Subject: RE: [maker-devel] start position for some genes results > > Dear Carson, > > When I use the new version of maker, I have another problem like this: > > jingjing at ChuaServer1:~/project/$ > /home/jingjing/software/maker.2.28/maker/bin/./maker > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... 
> STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.ma > ker.output/tobacco_seq_1_datastore > > To access files for individual sequences use the datastore index: > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.ma > ker.output/tobacco_seq_1_master_datastore_index.log > > STATUS: Now running MAKER... > WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to > re-index the fasta. > stop here: processed_tobacco_genome_sequences_c1 > ERROR: Fasta index error > > > Do you know how to fix this problem about new version? > > Thanks! > > Jingjing > > > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 6:55 PM > To: Jingjing Jin; > maker-devel at yandell-lab.org > Subject: Re: [maker-devel] start position for some genes results > > What MAKER version are you using? This should be fixed in the current 2.28. > It only happened under a very specific set of circumstances, but I remember > fixing it. So let me know if you are using 2.28. > > --Carson > > > > From: Jingjing Jin > > > Date: Tuesday, 25 June, 2013 5:13 PM > To: "maker-devel at yandell-lab.org" > > > Subject: [maker-devel] start position for some genes results > > Dear all, > > I find some strange things about location for my final result. > > Like for some start position of final gene model: > > c124062 maker gene -1 507 . - . > ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2 > > > It start position is -1. > > Does someone know why the start position is -1? > > Is there something wrong? > > Thanks! 
>
> Jingjing

--
Ensembl Genomes | VectorBase | i5K insect genome initiative
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From benayoun at stanford.edu Thu Jun 27 18:47:29 2013
From: benayoun at stanford.edu (Bérénice Benayoun)
Date: Thu, 27 Jun 2013 17:47:29 -0700
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
References:
Message-ID:

Hi maker devel team,

just wanted to say that retraining SNAP apparently fixed the problem. I modified the defaults and added "-min-intron 0" to the training everywhere relevant (the default is 30 bp, which must prevent single-exon genes from being predicted).

Thanks for your insights/help !

Berenice

2013/6/10 Carson Holt

> One more note. The ESTs appear to be from multiple overlapping HSPs
> (based on red line pattern in image). I'd have to see the actual GFF3 to
> be sure, but if that is the case, then there probably isn't an ORF to work
> with at that location on that strand (so SNAP can't call it). Possibly the
> result of assembly error or a pseudogene.
>
> --Carson
>
>
>
> From: Daniel Ence
> Date: Friday, 7 June, 2013 5:32 PM
> To: Bérénice Benayoun, "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Maker and mono-exonic genes ?
>
> Hi Berenice, Thank you for sending that screenshot and the maker_opts.log
> file. Those are exactly what we need to understand how to expect MAKER to
> perform.
>
> In looking at the screenshot, it doesn't look like any of the gene
> predictors gave a prediction in this region. MAKER uses the predictions from
> ab initio tools as a basis for models and considers models that are
> supported by evidence. It won't by default create a model when there isn't
> a prediction in the region.
>
> Can I ask which gene predictors you used and how they were trained? You
> might consider training one or more of them on the specific evidence that
> you expect to support these genes and then rerunning maker with the
> retrained predictors.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Bérénice Benayoun [benayoun at stanford.edu]
> *Sent:* Friday, June 07, 2013 11:17 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] Maker and mono-exonic genes ?
>
> Dear maker developers,
>
> I am currently annotating a de novo fish genome, and have started looking
> for particular genes of interest in Maker's output to verify that it's
> outputting proper gene sets.
>
> While many of the genes I look for seem to be correctly annotated by the
> pipeline, I have noticed that important genes that do have strong
> evidentiary support but are monoexonic are NOT reported by maker.
>
> I am attaching a screenshot for the contig that I know should contain the
> *Foxl2* gene (notoriously monoexonic across evolution), and highlighted
> the corresponding evidence for it.
>
> Is there any setting I can give to maker to force it to output monoexonic
> genes ? I already set "single_exon=1" with no success. I attached my config
> file FYI.
>
> Thank you so much in advance for your answer !!!
>
> Best,
>
> Berenice.
> --
> Bérénice A. BENAYOUN, Ph.D.
> Stanford University/Genetics Department > *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related > Diseases' > M312 Alway Building > 300, Pasteur Drive > MC 5120 > Stanford, CA 94305-5120 > USA > Email: benayoun at stanford.edu > Web: www.stanford.edu/group/brunet/ > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- B?r?nice A. BENAYOUN, Ph.D. Stanford University/Genetics Department *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Jun 28 19:01:47 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Jun 2013 21:01:47 -0400 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: Message-ID: I'm glad it's working for you. Let us know if you run into additional problems. Thanks, Carson From: B?r?nice Benayoun Date: Thursday, June 27, 2013 8:47 PM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Maker and mono-exonic genes ? Hi maker devel team, just wanted to say that retraining SNAP apparently fixed the problem (I modified the defaults and added "-min-intron 0" to the training everywhere relevant (default is 30bp, and must prevent single exon genes to be predicted). Thanks for your insights/help ! Berenice 2013/6/10 Carson Holt > One more note. The ESTs appear to be from multiple overlapping HSPs (based on > red line pattern in image). I'd have to see the actual GFF3 to be sure, but > if that is the case, then there probably isn't an ORF to work with at that > location on that strand (so SNAP can't call it). Possibly the result of > assembly error or a pseudogene. 
> > --Carson > > > > From: Daniel Ence > Date: Friday, 7 June, 2013 5:32 PM > To: B?r?nice Benayoun , "maker-devel at yandell-lab.org" > > Subject: Re: [maker-devel] Maker and mono-exonic genes ? > > Hi Berenice, Thank you for sending that screenshot and the maker_opts.log > file. Those are exactly what we need to understand how to expect MAKER to > perform. > > In looking at the screenshot, it doesn't look like any of the gene predictors > gave a prediction in this region. Uses the predictions from ab-initio tools as > a basis for models and considers models that are supported by evidence. It > won't by default create a model when there isn't a prediction in the region. > > Can I ask which gene predictors you used and how they were trained? You might > consider training one or more of them on the specific evidence that you expect > to support these genes and then rerunning maker with the retrained predictors. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of B?r?nice > Benayoun [benayoun at stanford.edu] > Sent: Friday, June 07, 2013 11:17 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] Maker and mono-exonic genes ? > > Dear maker developers, > > I am currently annotating a de novo fish genome, and have started looking for > genes of interest in particular in Maker's output to verify that it's > outputting proper gene sets. > > While many of the genes I look for seem to be correctly annotated by the > pipeline, I have noticed that important genes that do have strong evidentiary > support but are monoexonic are NOT reported by maker. > > I am attaching a screenshot for the contig that I know should contain the > Foxl2 gene (notoriously monoexonic across evolution), and highlighted the > corresponding evidence for it. 
> > Is there any setting I can give to maker to force it to output monoexonic > genes ? I already set "single_exon=1" with no success. I attached my config > file FYI. > > Thank you so much in advance for your answer !!! > > Best, > > Berenice. > -- > B?r?nice A. BENAYOUN, Ph.D. > Stanford University/Genetics Department > BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' > M312 Alway Building > 300, Pasteur Drive > MC 5120 > Stanford, CA 94305-5120 > USA > Email: benayoun at stanford.edu > Web: www.stanford.edu/group/brunet/ > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- B?r?nice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.stajich at gmail.com Sun Jun 2 12:28:50 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Sun, 2 Jun 2013 11:28:50 -0700 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Message-ID: seems like in your case you want to do more of a liftover-based annotation. generate that and feed it as a gff file to maker if your intention is also gene discovery in your population? On May 23, 2013, at 9:48 AM, Daniel Hughes wrote: > would gene annotation by projection using synteny/WGA not be more appropriate? either way what's wrong with running one of the standard orthology predictions tools or just basic best reciprocal blast? > > dan. 
> > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2013/5/23 Barry Moore > Hi Liciano, > > If I understand correctly you are including translations of SNAP and Augustus predictions as well as the predictions. If so, you don't want to do that. An overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you will guarantee that every prediction will become an annotation and that means you lose the benefit of evidence supervised annotation that MAKER provides. Include the proteins from the D mel reference and if you want to cast a broader net include proteins from other dipterans or even Uniprot - just depend on how aggressive you want to try to be in capturing new annotations. > > B > > On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > >> Thanks for your reply! >> >> One more question, can you think of any tips to get the best possible predictions of protein sequences? >> >> I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with Methionine... So far I am including transcripts and translations from flybase, and snap and augustus with their available trainings for flies. Do you see any possible source of error in that? >> >> Thanks again, >> >> Luciano >> >> De: Barry Moore [barry.moore at genetics.utah.edu] >> Enviado el: viernes, 17 de mayo de 2013 09:02 p.m. >> Para: Luciano Abriata >> Cc: maker-devel at yandell-lab.org >> Asunto: Re: [maker-devel] getting protein sequences from genomes >> >> >> On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: >> >>> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies). 
>>>
>>> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:
>>>
>>> 1) How can I match proteins predicted for the same gene in two genomes?
>>
>> blastp tweaked with parameters to optimize near perfect match
>>
>>>
>>> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)
>>>
>>> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541
>>>
>>
>> AED = Annotation edit distance: describes how closely the prediction matches the evidence. This is a distance measure and thus 0 is a perfect match and 1 is no overlap.
>>
>> eAED = Exon adjusted annotation edit distance: This metric is the same as AED with a couple of exceptions. First, for a protein coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported with splice site spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.
>>
>> QI values are as follows (in the order they appear in the QI field):
>>
>> 1. 5' UTR length.
>> 2. Fraction of splice sites confirmed by an EST alignment.
>> 3. Fraction of exons that overlap an EST alignment.
>> 4. Fraction of exons that overlap an EST or protein alignment.
>> 5. Fraction of splice sites confirmed by an ab initio prediction.
>> 6. Fraction of exons that overlap an ab initio prediction.
>> 7. Number of exons in the transcript.
>> 8. 3' UTR length.
>> 9. Length of encoded protein.
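
The header line and the QI field order quoted above can be unpacked programmatically. What follows is a minimal sketch in Python, not part of MAKER; the names `QualityIndex` and `parse_maker_header` are made up for illustration, and the parser assumes the header layout shown in the example line (whitespace-separated `KEY:value` fields, nine `|`-separated QI values).

```python
from typing import NamedTuple


class QualityIndex(NamedTuple):
    """The nine QI values, in the order listed in the thread (names are illustrative)."""
    utr5_length: int
    splice_sites_est: float
    exons_est: float
    exons_est_or_protein: float
    splice_sites_ab_initio: float
    exons_ab_initio: float
    exon_count: int
    utr3_length: int
    protein_length: int


def parse_maker_header(header: str) -> dict:
    """Split a MAKER protein FASTA header into its ID, AED, eAED and QI fields."""
    fields = header.lstrip(">").split()
    record = {"id": fields[0]}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if not sep:
            continue  # skip bare words such as "protein"
        if key in ("AED", "eAED"):
            record[key] = float(value)
        elif key == "QI":
            parts = value.split("|")
            if len(parts) != 9:
                raise ValueError(f"expected 9 QI values, got {len(parts)}")
            record["QI"] = QualityIndex(
                int(parts[0]),
                float(parts[1]),
                float(parts[2]),
                float(parts[3]),
                float(parts[4]),
                float(parts[5]),
                int(parts[6]),
                int(parts[7]),
                int(parts[8]),
            )
    return record


if __name__ == "__main__":
    example = (
        "maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 "
        "eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541"
    )
    print(parse_maker_header(example))
```

For the example header this reads as a 3-exon transcript with a 2 bp 5' UTR, a 208 bp 3' UTR and a 541 aa protein.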
>> >> >>> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta >>> >>> Which of these files contains the definite set of predicted protein sequences? >> >> The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for. >> >>> >>> >>> >>> Thanks in advance! >>> >>> Luciano >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> Barry Moore >> Research Scientist >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Jason Stajich jason.stajich at gmail.com jason at bioperl.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Jun 3 07:04:08 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 03 Jun 2013 09:04:08 -0400 Subject: [maker-devel] Advice on params for ciliates In-Reply-To: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu> Message-ID: I don't have any specific advice, but In general I always set blast_depth parameters in the maker_bopts file to 20 or 30 (faster runtimes). 
Also max_dna_len can be set to 2x higher if you have sufficient memory (3-4 GB per CPU as opposed to the 1-2 GB assumed with the default).

Other than that, split_hit, pred_flank, and single_exon are the only ones I might change around. You sort of have to run on a few large contigs before deciding what to do with these parameters.

split_hit --> set max intron size for alignments
pred_flank --> affects clustering for gene dense organisms
single_exon --> leave off unless you expect a lot of single exon genes.

--Carson

From: "Freeman, Robert M."
Date: Thursday, 23 May, 2013 4:17 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Advice on params for ciliates

Dear MAKER community,

Am embarking on updating models for a ciliate (taxon Ciliophora) and was wondering if folks had recommendations for MAKER parameters.

Thanks,
Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA 02115
617/432.2294, vox

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Bob_Freeman at hms.harvard.edu Wed Jun 5 07:28:36 2013
From: Bob_Freeman at hms.harvard.edu (Bob Freeman)
Date: Wed, 5 Jun 2013 09:28:36 -0400
Subject: [maker-devel] Advice on params for ciliates
In-Reply-To:
References:
Message-ID:

Thanks, Carson, for these helpful hints.

(Separately, the other code did not work again on our cluster. Have been so swamped -- I'll get to the write-up next week. Have been using the 2.25beta binary and that works OK).
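
Carson's suggestions in this thread amount to changing a handful of `key=value` lines in the control files. A minimal sketch of doing that from Python follows, assuming the `key=value #comment` layout seen in the maker_opts.ctl excerpts elsewhere in this archive. The helper name `set_options`, the sample text, its comments and every numeric value are placeholders for illustration, not recommended settings and not MAKER code.

```python
import re


def set_options(ctl_text: str, updates: dict) -> str:
    """Return ctl_text with the given key=value lines updated.

    Trailing '#comment' text on each line is kept. Only the first
    occurrence of each key is changed; unknown keys raise KeyError.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            out.append(f"{key}={remaining.pop(key)} {comment}".rstrip())
        else:
            out.append(line)
    if remaining:
        raise KeyError(f"options not found: {sorted(remaining)}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical control-file fragment; real files contain many more options.
    sample = (
        "max_dna_len=100000 #placeholder comment\n"
        "split_hit=10000 #placeholder comment\n"
        "pred_flank=200 #placeholder comment\n"
        "single_exon=0 #placeholder comment\n"
    )
    # Placeholder values only: double max_dna_len, shrink split_hit and pred_flank.
    print(set_options(sample, {"max_dna_len": 200000, "split_hit": 5000, "pred_flank": 100}))
```

As Carson notes, the right values depend on the organism, so a trial run on a few large contigs is the sensible way to choose them.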
Best, Bob On Jun 3, 2013, at 9:04 AM, Carson Holt wrote: > I don't have any specific advice, but In general I always set blast_depth parameters in the maker_bopts file to 20 or 30 (faster runtimes). Also max_dna_len can be set to 2x higher if you have sufficient memory (3-4 Gb per cpu as opposed to 1-2 Gb that are assumed with the default). > > Other than that split_hit, pred_flank, and single_exon are the only ones I might change around. You sort of have to run on a few large contigs before deciding what to do with these parameters. > > split_hit --> set max intron size for alignments > pred_flank --> affects clustering for gene dense organisms > single_exon --> leave off unless you expect a lot of singel exon genes. > > --Carson > > > From: "Freeman, Robert M." > Date: Thursday, 23 May, 2013 4:17 PM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] Advice on params for ciliates > > Dear MAKER community, > > Am embarking on updating models for a ciliate (taxa Ciliophora) and was wondering if folks had recommendations for MAKER parameters. > > Thanks, > Bob > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:28:46 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:28:46 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: I upgraded to 2.28 and Maker is not running. Thanks! 
On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > Are you using MAKER version 2.10? I ask because there is in issue with > other_gff in that version that has since been fixed. So if you don't get > other_gff to pass-through, you will need to upgrade to 2.28 (release date > is later today coincidentally). > > For the Augustus GFF3 file, the format is a little weird which is > causing the problem. They are mRNA features not attached to genes. Rather > than build the expected 3 level gene/mRNA/exon structure for these, it is > simpler just to convert it to the 2 level match/match_part structure. Just > convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. > Rename the GFF3 when your done so that it will force rebuild of the GFF3 > database when you run again. > > Thanks, > Carson > > > > From: Innocent Onsongo > Date: Wednesday, 22 May, 2013 8:47 AM > To: Barry Moore > Cc: > Subject: Re: [maker-devel] Maker: Re-annotation > > No. The MAKER produced GFF3 file does not contain any annotations. I > even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if > it will pass annotations from the Augustus produced GFF file into the final > annotation but that didn't work. I have attached the maker_opts.ctl file > I used together with the first 100 lines of the GFF files it's using. I > also include the GFF file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can >> you send the first ~100 lines each of the MAKER produced GFF3 file and of >> the GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >> Maker Development Team, >> >> I am trying to use Maker for re-annotation using gene predictions from >> Augustus. We had previously used Augustus for gene prediction but now want >> to combine these annotations with some EST data. 
I updated >> fields maker_opts.ctl as below >> >> genome=CGS01058.fasta #genome sequence file in fasta format >> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >> pred_gff=Augustus.gff3 #ab-initio predictions from >> other_gff=Promoters.gff3 #promoter annotations >> other_gff=CpG_Islands.gff3 # CpG island annotations >> >> Maker runs to completion and according to the log file annotation was >> successful. However, it also gives a "Segmentation fault (core dumped)" >> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >> and look it does not contain any of the exon definitions in Augustus.gff3. >> Am I missing something? >> >> Regards, >> Getiria >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> Barry Moore >> Research Scientist >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Wed Jun 5 08:30:20 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:30:20 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: What does it do? --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:28 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: Re-annotation I upgraded to 2.28 and Maker is not running. Thanks! On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > Are you using MAKER version 2.10? I ask because there is in issue with > other_gff in that version that has since been fixed. So if you don't get > other_gff to pass-through, you will need to upgrade to 2.28 (release date is > later today coincidentally). > > For the Augustus GFF3 file, the format is a little weird which is causing the > problem. They are mRNA features not attached to genes. Rather than build the > expected 3 level gene/mRNA/exon structure for these, it is simpler just to > convert it to the 2 level match/match_part structure. Just convert the 'mRNA' > tag to 'match' and all 'exon' tags to 'match_part'. Rename the GFF3 when your > done so that it will force rebuild of the GFF3 database when you run again. > > Thanks, > Carson > > > > From: Innocent Onsongo > Date: Wednesday, 22 May, 2013 8:47 AM > To: Barry Moore > Cc: > Subject: Re: [maker-devel] Maker: Re-annotation > > No. The MAKER produced GFF3 file does not contain any annotations. I even > tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will > pass annotations from the Augustus produced GFF file into the final annotation > but that didn't work. I have attached the maker_opts.ctl file I used together > with the first 100 lines of the GFF files it's using. 
I also include the GFF > file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can you >> send the first ~100 lines each of the MAKER produced GFF3 file and of the >> GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >>> Maker Development Team, >>> >>> I am trying to use Maker for re-annotation using gene predictions from >>> Augustus. We had previously used Augustus for gene prediction but now want >>> to combine these annotations with some EST data. I updated fields >>> maker_opts.ctl as below >>> >>> genome=CGS01058.fasta #genome sequence file in fasta format >>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>> pred_gff=Augustus.gff3 #ab-initio predictions from >>> other_gff=Promoters.gff3 #promoter annotations >>> other_gff=CpG_Islands.gff3 # CpG island annotations >>> >>> Maker runs to completion and according to the log file annotation was >>> successful. However, it also gives a "Segmentation fault (core dumped)" >>> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >>> and look it does not contain any of the exon definitions in Augustus.gff3. >>> Am I missing something? >>> >>> Regards, >>> Getiria >>> >>> -- >>> Getiria Onsongo, Ph.D. >>> Informatics Analyst, Research Informatics Support System >>> Minnesota Supercomputing Institute for Advanced Computational Research >>> University of Minnesota >>> Minneapolis, MN 55455 >>> Phone: 612-624-0532 >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> Barry Moore >> Research Scientist >> Dept. 
of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:35:43 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:35:43 -0500 Subject: [maker-devel] Maker: accessory scripts Message-ID: I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? Thanks, Getiria -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:37:01 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:37:01 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: Oops! I meant to type Maker is NOW running. On Wed, Jun 5, 2013 at 9:30 AM, Carson Holt wrote: > What does it do? 
> > --Carson > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:28 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > > Subject: Re: [maker-devel] Maker: Re-annotation > > I upgraded to 2.28 and Maker is not running. Thanks! > > > On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > >> Are you using MAKER version 2.10? I ask because there is in issue with >> other_gff in that version that has since been fixed. So if you don't get >> other_gff to pass-through, you will need to upgrade to 2.28 (release date >> is later today coincidentally). >> >> For the Augustus GFF3 file, the format is a little weird which is causing >> the problem. They are mRNA features not attached to genes. Rather than >> build the expected 3 level gene/mRNA/exon structure for these, it is >> simpler just to convert it to the 2 level match/match_part structure. Just >> convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. >> Rename the GFF3 when your done so that it will force rebuild of the GFF3 >> database when you run again. >> >> Thanks, >> Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 22 May, 2013 8:47 AM >> To: Barry Moore >> Cc: >> Subject: Re: [maker-devel] Maker: Re-annotation >> >> No. The MAKER produced GFF3 file does not contain any annotations. I even >> tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it >> will pass annotations from the Augustus produced GFF file into the final >> annotation but that didn't work. I have attached the maker_opts.ctl file >> I used together with the first 100 lines of the GFF files it's using. I >> also include the GFF file produced by MAKER (CGS01058First100.gff) >> >> >> >> >> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >> >>> Hi Getiria, >>> >>> Does the MAKER produced GFF3 file contain any annotations at all? Can >>> you send the first ~100 lines each of the MAKER produced GFF3 file and of >>> the GFF3 files that you passed via maker_opts.ctl? 
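
The conversion Carson describes earlier in this thread (relabel 'mRNA' features as 'match' and 'exon' features as 'match_part', then save under a new file name so the GFF3 database is rebuilt) can be sketched in a few lines of Python. This is an illustration only, not the script used on the list; `convert_line` and `convert_file` are hypothetical names, and any feature type other than those two is passed through unchanged, which is an assumption.

```python
# Feature-type relabelling suggested in the thread.
TYPE_MAP = {"mRNA": "match", "exon": "match_part"}


def convert_line(line: str) -> str:
    """Relabel column 3 of one GFF3 feature line; leave other lines untouched."""
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a feature line (e.g. sequence in a ##FASTA section)
    cols[2] = TYPE_MAP.get(cols[2], cols[2])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a converted copy of src_path to dst_path (a new name, per the advice)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(convert_line(line))


if __name__ == "__main__":
    # Made-up feature line for demonstration.
    print(convert_line("scaffold1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1\n"), end="")
```

Writing to a new file rather than editing in place covers the renaming step, since MAKER then sees a file it has not indexed before.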
>>> >>> B >>> >>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >>> >>> Maker Development Team, >>> >>> I am trying to use Maker for re-annotation using gene predictions from >>> Augustus. We had previously used Augustus for gene prediction but now want >>> to combine these annotations with some EST data. I updated >>> fields maker_opts.ctl as below >>> >>> genome=CGS01058.fasta #genome sequence file in fasta format >>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>> pred_gff=Augustus.gff3 #ab-initio predictions from >>> other_gff=Promoters.gff3 #promoter annotations >>> other_gff=CpG_Islands.gff3 # CpG island annotations >>> >>> Maker runs to completion and according to the log file annotation was >>> successful. However, it also gives a "Segmentation fault (core dumped)" >>> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >>> and look it does not contain any of the exon definitions in Augustus.gff3. >>> Am I missing something? >>> >>> Regards, >>> Getiria >>> >>> -- >>> Getiria Onsongo, Ph.D. >>> Informatics Analyst, Research Informatics Support System >>> Minnesota Supercomputing Institute for Advanced Computational Research >>> University of Minnesota >>> Minneapolis, MN 55455 >>> Phone: 612-624-0532 >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> Barry Moore >>> Research Scientist >>> Dept. of Human Genetics >>> University of Utah >>> Salt Lake City, UT 84112 >>> -------------------------------------------- >>> (801) 585-3543 >>> >>> >>> >>> >>> >> >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> > > > > -- > Getiria Onsongo, Ph.D. 
> Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed Jun 5 10:38:59 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 5 Jun 2013 16:38:59 +0000 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: Hi Innocent, I'm just jumping in this conversation kind of late in the game, but if you look in the gff3 file that maker gave you, do you see any gene, exon, or CDS features in the output? When you give evidence (protein or EST) and ab-initio predictors to maker the default behavior is to create gene models. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Innocent Onsongo [onson001 at umn.edu] Sent: Wednesday, June 05, 2013 10:35 AM To: Carson Holt Cc: maker-devel at yandell-lab.org; Barry Moore Subject: [maker-devel] Maker: accessory scripts I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. 
It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? Thanks, Getiria -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Jun 5 08:44:36 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:44:36 -0400 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: Message-ID: All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to a full annotation, create a list of IDs (one per line) then give them to the attached script. gff3_preds2models was previously deprecated and no longer part of the maker distribution, but the attached script is an updated version with the same functionality. --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:35 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: [maker-devel] Maker: accessory scripts I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? 
Thanks, Getiria _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_preds2models Type: application/octet-stream Size: 4778 bytes Desc: not available URL: From carsonhh at gmail.com Wed Jun 5 08:45:10 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:45:10 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: Gotcha :-) --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:37 PM To: Carson Holt Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: Re-annotation Oops! I meant to type Maker is NOW running. On Wed, Jun 5, 2013 at 9:30 AM, Carson Holt wrote: > What does it do? > > --Carson > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:28 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > > Subject: Re: [maker-devel] Maker: Re-annotation > > I upgraded to 2.28 and Maker is not running. Thanks! > > > On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: >> Are you using MAKER version 2.10? I ask because there is in issue with >> other_gff in that version that has since been fixed. So if you don't get >> other_gff to pass-through, you will need to upgrade to 2.28 (release date is >> later today coincidentally). >> >> For the Augustus GFF3 file, the format is a little weird which is causing the >> problem. They are mRNA features not attached to genes. Rather than build >> the expected 3 level gene/mRNA/exon structure for these, it is simpler just >> to convert it to the 2 level match/match_part structure. Just convert the >> 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. 
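A minimal sketch of the retagging described just above ('mRNA' to 'match', 'exon' to 'match_part'), assuming a tab-delimited GFF3 in which only the type column (column 3) needs to change. The function name convert_types and the sample attribute values are hypothetical, not part of MAKER; whether other columns or attributes also need adjusting for a given Augustus file is not covered here.

```python
def convert_types(lines, mapping=None):
    """Return GFF3 lines with column 3 (feature type) renamed per `mapping`.

    Comment/directive lines, blank lines, and anything that does not look
    like a feature row (fewer than 8 tab-separated columns, e.g. embedded
    FASTA) are passed through unchanged.
    """
    if mapping is None:
        # Default follows the suggestion in this thread.
        mapping = {"mRNA": "match", "exon": "match_part"}
    converted = []
    for line in lines:
        text = line.rstrip("\n")
        if not text or text.startswith("#"):
            converted.append(text)
            continue
        cols = text.split("\t")
        if len(cols) >= 8 and cols[2] in mapping:
            cols[2] = mapping[cols[2]]
        converted.append("\t".join(cols))
    return converted


if __name__ == "__main__":
    # Sample rows modelled on the Augustus lines quoted in this thread;
    # the ID/Parent attributes are made up for illustration.
    sample = [
        "##gff-version 3",
        "CGS00003\tAUGUSTUS\tmRNA\t1\t10865\t1\t+\t.\tID=g1.t1",
        "CGS00003\tAUGUSTUS\texon\t2013\t2050\t.\t+\t1\tParent=g1.t1",
    ]
    for row in convert_types(sample):
        print(row)
```

To apply it to a file, read the lines from the original GFF3 and write the result under a new file name (for example a hypothetical Augustus.match.gff3), then point pred_gff at the new file.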
Rename the GFF3 >> when your done so that it will force rebuild of the GFF3 database when you >> run again. >> >> Thanks, >> Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 22 May, 2013 8:47 AM >> To: Barry Moore >> Cc: >> Subject: Re: [maker-devel] Maker: Re-annotation >> >> No. The MAKER produced GFF3 file does not contain any annotations. I even >> tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will >> pass annotations from the Augustus produced GFF file into the final >> annotation but that didn't work. I have attached the maker_opts.ctl file I >> used together with the first 100 lines of the GFF files it's using. I also >> include the GFF file produced by MAKER (CGS01058First100.gff) >> >> >> >> >> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >>> Hi Getiria, >>> >>> Does the MAKER produced GFF3 file contain any annotations at all? Can you >>> send the first ~100 lines each of the MAKER produced GFF3 file and of the >>> GFF3 files that you passed via maker_opts.ctl? >>> >>> B >>> >>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >>> >>>> Maker Development Team, >>>> >>>> I am trying to use Maker for re-annotation using gene predictions from >>>> Augustus. We had previously used Augustus for gene prediction but now want >>>> to combine these annotations with some EST data. I updated fields >>>> maker_opts.ctl as below >>>> >>>> genome=CGS01058.fasta #genome sequence file in fasta format >>>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>>> pred_gff=Augustus.gff3 #ab-initio predictions from >>>> other_gff=Promoters.gff3 #promoter annotations >>>> other_gff=CpG_Islands.gff3 # CpG island annotations >>>> >>>> Maker runs to completion and according to the log file annotation was >>>> successful. However, it also gives a "Segmentation fault (core dumped)" >>>> message. 
It does produce a GFF3 file but when I load the GFF3 file into IGV >>>> and look it does not contain any of the exon definitions in Augustus.gff3. >>>> Am I missing something? >>>> >>>> Regards, >>>> Getiria >>>> >>>> -- >>>> Getiria Onsongo, Ph.D. >>>> Informatics Analyst, Research Informatics Support System >>>> Minnesota Supercomputing Institute for Advanced Computational Research >>>> University of Minnesota >>>> Minneapolis, MN 55455 >>>> Phone: 612-624-0532 >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> Barry Moore >>> Research Scientist >>> Dept. of Human Genetics >>> University of Utah >>> Salt Lake City, UT 84112 >>> -------------------------------------------- >>> (801) 585-3543 >>> >>> >>> >>> >> >> >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Wed Jun 5 08:47:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:47:51 -0400 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: Message-ID: Also, just a note, models are rejected if they have no protein or EST support. This is because ab inito predictors over predict (you may have 10 false positives for every true positive in some genomes for example). --Carson From: Carson Holt Date: Wednesday, 5 June, 2013 10:44 AM To: Innocent Onsongo , Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: accessory scripts All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to a full annotation, create a list of IDs (one per line) then give them to the attached script. gff3_preds2models was previously deprecated and no longer part of the maker distribution, but the attached script is an updated version with the same functionality. --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:35 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: [maker-devel] Maker: accessory scripts I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? 
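As a quick way to see which of the two feature families described above a MAKER GFF3 actually contains, the type column can be tallied. This is only a sketch, assuming a tab-delimited GFF3 with an optional ##FASTA section at the end; the helper names count_feature_types and has_gene_models are hypothetical and not part of the MAKER distribution.

```python
from collections import Counter


def count_feature_types(lines):
    """Count the values in column 3 (feature type) of GFF3 feature rows."""
    counts = Counter()
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not text or text.startswith("#"):
            continue
        cols = text.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


def has_gene_models(counts):
    """True if any gene/mRNA/exon/CDS rows were seen."""
    return any(counts[t] > 0 for t in ("gene", "mRNA", "exon", "CDS"))


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        "##gff-version 3",
        "CGS00003\tblastx\tprotein_match\t100\t400\t.\t+\t.\tID=pm1",
        "CGS00003\tblastx\tmatch_part\t100\t400\t.\t+\t.\tParent=pm1",
    ]
    tally = count_feature_types(sample)
    print(dict(tally))
    print("gene models present:", has_gene_models(tally))
```

If only match, match_part, protein_match and similar types show up, the file holds evidence alignments and rejected predictions but no promoted annotations.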
Thanks, Getiria _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/m aker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From amelia.ireland at gmod.org Wed Jun 5 11:14:05 2013 From: amelia.ireland at gmod.org (Amelia Ireland) Date: Wed, 5 Jun 2013 10:14:05 -0700 Subject: [maker-devel] Apply now for the GMOD Summer School! Message-ID: Closing date for applications: 10 June July 19-23, 2013; NESCent, Durham, North Carolina http://gmod.org/wiki/2013_GMOD_Summer_School The 2013 GMOD Summer School is the best way to get to grips with GMOD in the Cloud, GMOD's suite of genomic and genetic software. Over five days, attendees will learn how to install, configure, and run popular GMOD software for visualization, storage, and dissemination of genetic and genomic data. The following software is covered: - Chado, a species-independent database schema covering many areas of genetic and genomic data; - GBrowse, the ubiquitous genome browser; - GBrowse syn, a synteny browser built on GBrowse; - Galaxy, analysis and computation pipeline; - JBrowse, genome browsing evolved; - MAKER, automated annotation pipeline; - Tripal, a slick web interface for displaying and editing data from Chado; and - WebApollo, distributed community genome annotation tool (built on JBrowse). There are additional sessions on setting up a GMOD in the Cloud virtual machine in the Amazon cloud, and common file formats. Courses are taught by members of the software development teams, and there are work sessions in the evenings for participants to talk to the developers or apply what they have been taught to their own data. For more information and to apply, visit http://gmod.org/wiki/2013_GMOD_Summer_School. There are some scholarship funds available for those from underrepresented minorities. All applications should be in by June 10th. 
If you have any questions, please contact the GMOD help desk at help at gmod.org.

Hope to see you there!

Thanks,
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject

From mnuhn at ebi.ac.uk Thu Jun 6 02:44:10 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Thu, 06 Jun 2013 09:44:10 +0100
Subject: [maker-devel] Effect of the unmask option
Message-ID: <51B04BDA.7050307@ebi.ac.uk>

Hello Carson!

When running maker with the unmask option, how does maker use the
predictions generated from running the gene predictors on the unmasked
sequence?

The tutorial says:

"You do have the option to run ab initio gene predictors on both the
masked and unmasked sequence if repeat masking worries you though. You do
this by setting unmask:1 in the maker_opt.ctl configuration file."

http://gmod.org/wiki/MAKER_Tutorial_2012

But in the sub get_non_overlaping_abinits in maker::auto_annotator (maker
version 2.27) they are skipped:

    #only accept masked predictions unless I'm not masking or the predictor is genemark
    my $src = $g->{algorithm};
    unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} || $CTL_OPT->{predictor} eq 'genemark') {
        next;
    }

Cheers,
Michael.

From carsonhh at gmail.com Thu Jun 6 07:55:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Jun 2013 09:55:08 -0400
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

One thing to keep in mind is the strandedness of the evidence and the
model (they must be on the same strand). Further, protein evidence is only
valid support if it is in the same reading frame as the model.

Could you send the full GFF3 for the contig (I need features and GFF3
internal fasta) and the coordinates of the region in question, and I can
take a look?
Also if you can, it would be good to let maker run Augustus as well with the species file rather than just passing in the GFF3. This is because MAKER can only talk to Augustus to generate competing hint based models if you provide the species. Thanks, Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 1:10 PM To: Carson Holt Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: accessory scripts I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction? I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful. Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST support. > This is because ab inito predictors over predict (you may have 10 false > positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt > > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or rejected > model and is there for reference purposes. If you want to upgrade all of the > rejected loci to gene annotations, set keep_preds=1 in the control files. 
If > you want to upgrade a subset of rejected models to a full annotation, create a > list of IDs (one per line) then give them to the attached script. > gff3_preds2models was previously deprecated and no longer part of the maker > distribution, but the attached script is an updated version with the same > functionality. > > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I > looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not in > /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is the > file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 11:10:01 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 12:10:01 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. 
Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction? I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful. Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST > support. This is because ab inito predictors over predict (you may have 10 > false positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt < > carson.holt at oicr.on.ca> > > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or > rejected model and is there for reference purposes. If you want to upgrade > all of the rejected loci to gene annotations, set keep_preds=1 in the > control files. If you want to upgrade a subset of rejected models to a > full annotation, create a list of IDs (one per line) then give them to the > attached script. gff3_preds2models was previously deprecated and no longer > part of the maker distribution, but the attached script is an updated > version with the same functionality. > > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. 
> I looked at the tutorial and the script gff3_preds2models
> is supposed to do this conversion. How do I access this script. It is not
> in /maker/2.28-beta/bin/
>
> Also, in running gff3_preds2models is the file I used for pred_gff=?
>
> Long story short, how do I transform the GFF output from Maker to the more
> traditional annotation of exon/intron?
>
> Thanks,
> Getiria
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4526 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MakerFirst400.gff
Type: application/octet-stream
Size: 74871 bytes

From onson001 at umn.edu Thu Jun 6 12:58:21 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Thu, 6 Jun 2013 13:58:21 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
References:
Message-ID:

Thanks for the timely feedback Carson. I made a change to my pred_gff and
est_gff GFF3 files and now I am getting results, but I am not sure if the
changes I made are valid. I want to make sure they did not lead Maker to
behave in an unexpected way and produce results that might be incorrect.

In my pred_gff file, I replaced "mRNA" with "protein_match" and "exon"
with "match_part". Below are the first three lines of the old and new
pred_gff files, respectively:

---------------old pred_gff
##gff-version 3
CGS00003  AUGUSTUS  mRNA           1     10865  1  +  .
CGS00003  AUGUSTUS  exon           2013  2050   .  +  1

---------------new pred_gff
##gff-version 3
CGS00003  AUGUSTUS  protein_match  1     10865  1  +  .
CGS00003  AUGUSTUS  match_part     2013  2050   .  +  1

In my est_gff file, I replaced "mRNA" with "expressed_sequence_match" and
"exon" with "match_part". Below are the first lines of the old and new
est_gff files, respectively:

----------------old est_gff
##gff-version 3
CGS00003  EST_BLAT  mRNA                      4641336  4758501  6072  -  .
CGS00003  EST_BLAT  exon                      4641336  4641979  644   -  .

----------------new est_gff
CGS00003  EST_BLAT  expressed_sequence_match  4641336  4758501  6072  -  .
CGS00003  EST_BLAT  match_part                4641336  4641979  644   -  .

Are the changes I made valid?

Thanks,
Getiria

On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote:

> Also, just a note, models are rejected if they have no protein or EST
> support. This is because ab initio predictors over predict (you may have 10
> false positives for every true positive in some genomes for example).
>
> --Carson
>
>
>
> From: Carson Holt
> Date: Wednesday, 5 June, 2013 10:44 AM
> To: Innocent Onsongo, Carson Holt <carson.holt at oicr.on.ca>
> Cc: "maker-devel at yandell-lab.org", Barry Moore
> Subject: Re: [maker-devel] Maker: accessory scripts
>
> All maker gene annotations will be of the format gene/mRNA/exon/CDS.
> Anything in the format match/match_part is an evidence alignment or
> rejected model and is there for reference purposes. If you want to upgrade
> all of the rejected loci to gene annotations, set keep_preds=1 in the
> control files. If you want to upgrade a subset of rejected models to a
> full annotation, create a list of IDs (one per line) then give them to the
> attached script. gff3_preds2models was previously deprecated and no longer
> part of the maker distribution, but the attached script is an updated
> version with the same functionality.
> > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. > I looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not > in /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is list> the file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From benayoun at stanford.edu Fri Jun 7 11:17:47 2013 From: benayoun at stanford.edu (=?ISO-8859-1?Q?B=E9r=E9nice_Benayoun?=) Date: Fri, 7 Jun 2013 10:17:47 -0700 Subject: [maker-devel] Maker and mono-exonic genes ? Message-ID: Dear maker developers, I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. 
I am attaching a screenshot for the contig that I know should contain the * Foxl2* gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI. Thank you so much in advance for your answer !!! Best, Berenice. -- B?r?nice A. BENAYOUN, Ph.D. Stanford University/Genetics Department *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Appolo_screenshot_missing_monoexonic_pred.pdf Type: application/pdf Size: 709436 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: application/octet-stream Size: 5155 bytes Desc: not available URL: From onson001 at umn.edu Fri Jun 7 14:08:43 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Fri, 7 Jun 2013 15:08:43 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: Carson, I have attached the full gff3 for the contig together with a screen shot from IGV with regions I was expecting Maker to make a consensus call. The region on question is CGS00003:5264784-5273457. I will greatly appreciate any insights. Thanks, Getiria On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote: > One thing to keep in mind is the strandedness of the evidence and the > model (they must be on the same strand). Further protein evidence is only > valid support if it is in the same reading frame as the model. 
> > Could you send the full GFF3 for the contig (I need features and GFF3 > internal fasta) and the coordinates of the region in question, and I can > take a look? Also if you can, it would be good to let maker run Augustus > as well with the species file rather than just passing in the GFF3. This > is because MAKER can only talk to Augustus to generate competing hint based > models if you provide the species. > > Thanks, > Carson > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 1:10 PM > To: Carson Holt > Cc: Carson Holt , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org>, Barry Moore > > Subject: Re: [maker-devel] Maker: accessory scripts > > I checked visually in IGV and there are some exons in the predicted model > with protein and EST support but the maker output GFF only has match_part > and protein_match in column 3. Does that mean Maker doesn't deem any of the > evidence sufficient to make a gene model prediction? > > I guess I am somewhat surprised I am not getting any exons predicted by > Maker. Is there a parameter I can alter to reduce the threshold at which > Maker makes this call? I have attached the first 400 lines of one of my GFF > files together with the control file (maker_opts.ctl) just in case they > might be useful. > > Getiria > > > On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > >> Also, just a note, models are rejected if they have no protein or EST >> support. This is because ab inito predictors over predict (you may have 10 >> false positives for every true positive in some genomes for example). >> >> --Carson >> >> >> >> From: Carson Holt >> Date: Wednesday, 5 June, 2013 10:44 AM >> To: Innocent Onsongo , Carson Holt < >> carson.holt at oicr.on.ca> >> >> Cc: "maker-devel at yandell-lab.org" , Barry >> Moore >> Subject: Re: [maker-devel] Maker: accessory scripts >> >> All maker gene annotations will be of the format gene/mRNA/exon/CDS. 
>> Anything in the format match/match_part is an evidence alignment or >> rejected model and is there for reference purposes. If you want to upgrade >> all of the rejected loci to gene annotations, set keep_preds=1 in the >> control files. If you want to upgrade a subset of rejected models to a >> full annotation, create a list of IDs (one per line) then give them to the >> attached script. gff3_preds2models was previously deprecated and no longer >> part of the maker distribution, but the attached script is an updated >> version with the same functionality. >> >> --Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 5 June, 2013 12:35 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" , Barry >> Moore >> Subject: [maker-devel] Maker: accessory scripts >> >> I was able to successfully ran Maker and now want to converts the gene >> prediction match/match_part format to annotation gene/mRNA/exon/CDS format. >> I looked at the tutorial and the script gff3_preds2models >> is supposed to do this conversion. How do I access this script. It is not >> in /maker/2.28-beta/bin/ >> >> Also, in running gff3_preds2models is > list> the file I used for pred_gff=? >> >> Long story short, how do I transform the GFF output from Maker to the >> more traditional annotation of exon/intron? >> >> Thanks, >> Getiria >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. 
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CGS00003.gff
Type: application/octet-stream
Size: 11835536 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CGS00003_5264784-5273457.pdf
Type: application/pdf
Size: 124265 bytes
Desc: not available

From dence at genetics.utah.edu Fri Jun 7 15:32:57 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 7 Jun 2013 21:32:57 +0000
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To: References: Message-ID:

Hi Berenice,

Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform.

In looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. MAKER uses the predictions from ab initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region.

Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice Benayoun [benayoun at stanford.edu]
Sent: Friday, June 07, 2013 11:17 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] Maker and mono-exonic genes ?

Dear maker developers,

I am currently annotating a de novo fish genome, and have started looking for particular genes of interest in Maker's output to verify that it's outputting proper gene sets.

While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker.

I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and have highlighted the corresponding evidence for it.

Is there any setting I can give to maker to force it to output monoexonic genes? I already set "single_exon=1" with no success. I attached my config file FYI.

Thank you so much in advance for your answer!

Best,

Berenice.
--
Bérénice A. BENAYOUN, Ph.D.
Stanford University/Genetics Department
BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
M312 Alway Building
300, Pasteur Drive
MC 5120
Stanford, CA 94305-5120
USA
Email: benayoun at stanford.edu
Web: www.stanford.edu/group/brunet/

From dence at genetics.utah.edu Fri Jun 7 15:58:16 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 7 Jun 2013 21:58:16 +0000
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To: References: , Message-ID:

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: berenice.benayoun at gmail.com [berenice.benayoun at gmail.com] on behalf of Bérénice Benayoun [benayoun at stanford.edu]
Sent: Friday, June 07, 2013 3:50 PM
To: Daniel Ence
Subject: Re: [maker-devel] Maker and mono-exonic genes ?

Hi Daniel,

Thanks for the quick answer!

I used SNAP, trained from an HMM built from the CEGMA output on my genome (240 gene models) plus a first run of MAKER on 1/3 of the genome. I tried GeneMark-ES and Augustus, but for some reason they don't run, so I stopped pointing MAKER to them.

Should I do something in particular to train it "better"? Is there any other predictor that would be worth running?

Thanks so much for your help!

Berenice

--
Bérénice A. BENAYOUN, Ph.D.
Stanford University/Genetics Department
BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
M312 Alway Building
300, Pasteur Drive
MC 5120
Stanford, CA 94305-5120
USA
Email: benayoun at stanford.edu
Web: www.stanford.edu/group/brunet/

From barry.moore at genetics.utah.edu Fri Jun 7 16:30:35 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 7 Jun 2013 16:30:35 -0600
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To: References: , Message-ID: <11A6EF4C-B82E-4851-80FC-B8668531E2EC@genetics.utah.edu>

Hi Berenice,

SNAP is a good gene predictor, but for most genomes Augustus can be more accurate - of course it is also harder to train. Running a first round of MAKER annotation with SNAP as the predictor, then training SNAP on the output from that run, followed by a second MAKER run (which runs pretty fast the second time because all the blast jobs are reused) is a good way to start. Ultimately, running Augustus as well (along with custom training) is probably worth it for a final annotation effort.
The good thing is you can run these iterative cycles of annotation with minimal effort because MAKER will reuse any computations that have already run.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Fri Jun 7 15:51:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 16:51:53 -0500
Subject: [maker-devel] Effect of the unmask option
In-Reply-To: <51B04BDA.7050307@ebi.ac.uk>
Message-ID:

The unmask option allows the ab initio predictions run on unmasked sequence to compete against other models, and they are selected only if they have a better AED score. They are not available for non-overlapping rejected models at the end of the run because that set is non-redundant and they tend to have a very high likelihood of being transposons themselves. So I don't let a repeat-containing model override a non-repeat-containing model unless there is evidence supporting it (there is never evidence supporting the non-overlapping models).

--Carson

On 13-06-06 4:44 AM, "Michael Nuhn" wrote:

>Hello Carson!
>
>When running maker with the unmask option, how does maker use the
>predictions generated from running the gene predictors on the unmasked
>sequence?
>
>The tutorial says:
>
>"You do have the option to run ab initio gene predictors on both the
>masked and unmasked sequence if repeat masking worries you though. You
>do this by setting unmask:1 in the maker_opt.ctl configuration file."
>
>http://gmod.org/wiki/MAKER_Tutorial_2012
>
>But in the sub get_non_overlaping_abinits in maker::auto_annotator
>(maker version 2.27) they are skipped:
>
>#only accept masked predictions unless I'm not masking or the predictor
>is genemark
>my $src = $g->{algorithm};
>unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} ||
>$CTL_OPT->{predictor} eq 'genemark') {
> next;
>}
>
>Cheers,
>Michael.

From carsonhh at gmail.com Fri Jun 7 16:10:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 17:10:09 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To: Message-ID:

You seem to be running this in a very odd way. First, the GFF3 is not correctly formatted. There are lines containing score=1 (all by itself)? I believe this may be coming through because you are trying to pass in augustus predictions as GFF3 and that input is malformed. All of your Augustus models are also single exon genes, but they are very long and do not even correspond to proper ORFs. The EST evidence is spliced and is thus contradicting the augustus model (they don't support each other). If you want MAKER to be able to use the evidence as feedback for the model, you need to let MAKER run augustus. Otherwise it is only able to accept or reject the model from the GFF3 (nothing more - no attempt at consensus).

Perhaps if you supply your input dataset and control files we can help you get the best settings. You would need to provide the Augustus species set you are using as well (contained in a directory in …/augustus/config/species).
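
A bare score=1 line like the one mentioned above is the kind of problem a quick column check can catch before a GFF3 file is handed to MAKER. What follows is only a minimal Python sketch and not part of MAKER: the function name and the sample lines are made up for illustration, and it relies solely on the GFF3 convention that every feature line has nine tab-separated columns.

```python
def find_malformed_gff3_lines(lines):
    """Return (line_number, text) pairs for feature lines without 9 tab-separated columns.

    Hypothetical helper for illustration only; it is not shipped with MAKER.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines to check
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        if len(line.split("\t")) != 9:
            bad.append((number, line))
    return bad


# Made-up sample: one valid feature line and one stray attribute on its own line.
sample = [
    "##gff-version 3\n",
    "CGS00003\taugustus\tmatch\t100\t900\t.\t+\t.\tID=g1\n",
    "score=1\n",
    "##FASTA\n",
    ">CGS00003\n",
    "ACGT\n",
]

for number, text in find_malformed_gff3_lines(sample):
    print(number, text)
```

In practice the list would come from an open file handle rather than an in-memory sample. The check looks at column counts only and says nothing about whether the models themselves correspond to proper ORFs.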

--Carson

From: Innocent Onsongo
Date: Friday, 7 June, 2013 2:08 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", Barry Moore
Subject: Re: [maker-devel] Maker: accessory scripts

Carson,

I have attached the full gff3 for the contig together with a screenshot from IGV with regions I was expecting Maker to make a consensus call. The region in question is CGS00003:5264784-5273457. I will greatly appreciate any insights.

Thanks,

Getiria

On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote:
> One thing to keep in mind is the strandedness of the evidence and the model
> (they must be on the same strand). Further, protein evidence is only valid
> support if it is in the same reading frame as the model.
>
> Could you send the full GFF3 for the contig (I need features and GFF3 internal
> fasta) and the coordinates of the region in question, and I can take a look?
> Also if you can, it would be good to let maker run Augustus as well with the
> species file rather than just passing in the GFF3. This is because MAKER can
> only talk to Augustus to generate competing hint-based models if you provide
> the species.
>
> Thanks,
> Carson

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From onson001 at umn.edu Fri Jun 7 20:29:50 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Fri, 7 Jun 2013 21:29:50 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To: References: Message-ID:

I appreciate the feedback. I will try letting MAKER run augustus instead of passing the Augustus predictions as GFF3.

Thanks for all your help!

Getiria

On Fri, Jun 7, 2013 at 5:10 PM, Carson Holt wrote:

> You seem to be running this in a very odd way. First, the GFF3 is not
> correctly formatted. There are lines containing score=1 (all by itself)? I
> believe this may be coming through because you are trying to pass in
> augustus predictions as GFF3 and that input is malformed. All of your
> Augustus models are also single exon genes, but they are very long and do
> not even correspond to proper ORFs. The EST evidence is spliced and is
> thus contradicting the augustus model (they don't support each other). If
> you want MAKER to be able to use the evidence as feedback for the model,
> you need to let MAKER run augustus. Otherwise it is only able to accept or
> reject the model from the GFF3 (nothing more - no attempt at consensus).
>
> Perhaps if you supply your input dataset and control files we can help you
> get the best settings. You would need to provide the Augustus species set
> you are using as well (contained in a directory in
> …/augustus/config/species).
>
> --Carson

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From carsonhh at gmail.com Mon Jun 10 06:40:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 08:40:35 -0400
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To: Message-ID:

One more note. The ESTs appear to be from multiple overlapping HSPs (based on the red line pattern in the image). I'd have to see the actual GFF3 to be sure, but if that is the case, then there probably isn't an ORF to work with at that location on that strand (so SNAP can't call it). This is possibly the result of an assembly error or a pseudogene.

--Carson

From michel.moser at ips.unibe.ch Mon Jun 10 07:03:06 2013
From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch)
Date: Mon, 10 Jun 2013 13:03:06 +0000
Subject: [maker-devel] maker 2.28 blastx error
Message-ID:

Hello MAKER developers and users,

I am using maker for the first time to annotate some BAC sequences. I successfully ran both of the test data sets provided in the maker tarball, but when I run maker on my sequences and provide some EST evidence from cufflinks, I get errors at the repeat database blasting step (see error below). As the te_protein data set I just use the file provided in maker/data/.
I sent the data to a colleague who could run it without problems using maker2.10. Or is the problem that I don't have wublast and RepBase installed?

Any hint is highly appreciated!

Thanks,
Michel

std.error

STATUS: Parsing control files...
WARNING: blast_type is set to 'wublast' but executables cannot be located
The blast_type 'ncbi+' will be used instead.
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore
To access files for individual sequences use the datastore index:
/home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_master_datastore_index.log
STATUS: Now running MAKER...
examining contents of the fasta file and run log
--Next Contig--
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: bac2:383-131865
Length: 131482
#---------------------------------------------------------------------
setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater: /usr/bin/makeblastdb -dbtype prot -in /tmp/maker_rcBcxr/0/blastprep/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx: /usr/bin/blastx -db /tmp/maker_rcBcxr/te_proteins%2Efasta.mpi.10.0 -query /tmp/maker_rcBcxr/0/bac2%3A383-131865.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore/1D/F1/bac2%3A383-131865//theVoid.bac2%3A383-131865/0/bac2%3A383-131865.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
BLAST engine error: Warning: Sequence contains no data
BLAST engine error: Warning: Sequence contains no data
ERROR: BLASTX failed
--> rank=NA, hostname=ipsktube
ERROR: Failed while doing blastx repeats
ERROR: Chunk failed at level:1, tier_type:1
FAILED CONTIG:bac2:383-131865
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:bac2:383-131865
examining contents of the fasta file and run log

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1.fasta
Type: application/octet-stream
Size: 14791 bytes
Desc: test1.fasta
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1413 bytes
Desc: maker_bopts.ctl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1201 bytes
Desc: maker_exe.ctl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4457 bytes
Desc: maker_opts.ctl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: protein.fasta
Type: application/octet-stream
Size: 452 bytes
Desc: protein.fasta
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insert-bac2.fasta
Type: application/octet-stream
Size: 131500 bytes
Desc: insert-bac2.fasta

From anthony.bretaudeau at rennes.inra.fr Mon Jun 10 09:48:13 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Mon, 10 Jun 2013 17:48:13 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
Message-ID: <51B5F53D.90505@rennes.inra.fr>

Hello,
I am running Maker 2.27b on an insect genome, and I use a gff file containing some repeat positions (rm_gff option in maker_opts.ctl).

I encountered an error on 10 scaffolds (the genome contains ~40000 scaffolds): "substr outside of string" (similar to this post: http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html).

After a lot of debugging, it turns out the problem comes from the "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a SQL query that fetches features that overlap with the border of the sequence chunk.
The problem is that it also fetches features that are completely outside of the chunk in the same region. This produces an error when maker tries to mask the sequence, as it does a substr outside the string.

I fixed it by patching lib/repeat_mask_seq.pm, near line 138:
I replaced:
    substr($$seq, $b -1 , $l, "$replace"x$l);
By:
    if ($b < length($$seq)) {
        substr($$seq, $b -1 , $l, "$replace"x$l);
    }

I don't know if there is a more elegant solution, but this seems to solve the problem.

Cheers
Anthony

From carsonhh at gmail.com Mon Jun 10 10:13:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 12:13:50 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B5F53D.90505@rennes.inra.fr>
Message-ID:

Could you use MAKER version 2.28 instead (launch with maker -a if it still fails)?

Thanks,
Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From barry.moore at genetics.utah.edu Mon Jun 10 11:13:49 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 10 Jun 2013 11:13:49 -0600
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To:
References:
Message-ID: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>

Hi Michel,

Yes, wublast is the problem. In current versions of maker the opts file defaults to ncbi+, but in older versions the opts file defaults to wublast. Just edit your maker_bopts.ctl file to have the line:

blast_type=ncbi+

It seems like this option may have been in maker_opts.ctl in older files, so if you don't find it in bopts then look in opts.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Mon Jun 10 11:32:55 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 13:32:55 -0400
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>
Message-ID:

It's actually a little more complicated than that. You are already using BLAST+. The sequence you are running on is apparently entirely masked, so there is nothing there to align.
The error thrown by NCBI BLAST+ when this happens (currently "Sequence contains no data") has changed slightly over time. As a result, MAKER fails with BLAST+ where it would not with wublast, because the error wublast throws is still recognized, captured by MAKER, and ignored.

You can probably ignore that contig, run with a different version of BLAST, or put the attached files in the .../maker/lib/Widget/ directory. I fixed the check for the current message, so it will ignore the error (as long as the error is still going to STDERR and not STDOUT).

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastn.pm
Type: text/x-perl-script
Size: 7442 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastx.pm
Type: text/x-perl-script
Size: 7502 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tblastx.pm
Type: text/x-perl-script
Size: 8364 bytes
Desc: not available

From carsonhh at gmail.com Mon Jun 10 11:53:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 13:53:53 -0400
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>
Message-ID:

Never mind. It's even a little weirder than what I just explained.
The contig name (bac2:383-131865) is triggering a behavior in the BioPerl indexer where it interprets the name as a region and not a contig. As a result it can't find the sequence, but it also doesn't throw an error (the result is an empty fasta).

Solution: Just change the name of the contig. Try using 'bac2_383-131865' instead.

--Carson

From anthony.bretaudeau at rennes.inra.fr Tue Jun 11 09:03:42 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Tue, 11 Jun 2013 17:03:42 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B73C4E.6030204@rennes.inra.fr>

Hello,
I have just tested with 2.28b: the problem is still there, and my fix works on this version too.
Cheers
Anthony

From carsonhh at gmail.com Tue Jun 11 09:06:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Jun 2013 11:06:10 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B73C4E.6030204@rennes.inra.fr>
Message-ID:

Could you send me your repeat_gff and genome fasta, so I can take a look?

Thanks,
Carson

From anthony.bretaudeau at rennes.inra.fr Wed Jun 12 07:29:14 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Wed, 12 Jun 2013 15:29:14 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B877AA.8060803@rennes.inra.fr>

Hi,
Here is a minimal gff file that reproduces the bug. It should work with any fasta (my real data is not yet published, I can't share it publicly yet).
Tell me if you need more info
Anthony

-------------- next part --------------
scaffold_20 TEs match 199889 203598 0.0 + . ID=some_id_1
scaffold_20 TEs match_part 199889 200163 2.6e-12 + . ID=part_1;Parent=some_id_1
scaffold_20 TEs match_part 203256 203598 2.6e-12 + . ID=part_2;Parent=some_id_1

From sickler.alex at gmail.com Wed Jun 12 12:22:17 2013
From: sickler.alex at gmail.com (Alex Sickler)
Date: Wed, 12 Jun 2013 14:22:17 -0400
Subject: [maker-devel] Problem Installing with opencc
Message-ID:

Hi all,

I am trying to install Maker 2.28. When I go to install Maker, it gives the following error message:

/usr/bin/perl /usr/local/share/perl5/ExtUtils/xsubpp -typemap "/usr/share/perl5/ExtUtils/typemap" MPI.xs $
/share/apps/openmpi/OpenMPI-1.6.3/bin/mpicc -c -I"/share/apps/maker/src" -I/share/apps/openmpi/OpenMPI-1.6.3/include -D_REENTRANT -D_GNU_SOUR$
opencc WARNING: unknown flag: -fstack-protector
opencc WARNING: unknown flag: -fstack-protector
opencc ERROR: -- not allowed in non XPG4 environment
opencc ERROR parsing --param=ssp-buffer-size=4: unknown flag
make: *** [MPI.o] Error 2

The to everything is correct. I tried looking in the Makefile.PL but could not find the "param=" option.

Any help is greatly appreciated,
Alex

From carsonhh at gmail.com Thu Jun 13 13:38:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Jun 2013 15:38:52 -0400
Subject: [maker-devel] Problem Installing with opencc
In-Reply-To:
Message-ID:

MAKER installation doesn't have a Makefile.PL. The parameters for compilation of the MPI bindings are being set by mpicc itself, Perl, or environmental variables on your system.
In general you want both Perl and OpenMPI to be compiled by the same compiler, or you can get cross-library problems (Perl is using the shared libraries in OpenMPI, so all communication is really at the C level). Problems do not always occur, but they can (I have been fine for the most part mixing pgi, intel, and gcc compiled OpenMPI, but have never tried open64 compilers).

Alternatively you can try manually setting the values of the following environmental variables before installing MAKER, which should affect the parameter settings (this means before even running the 'perl Build.PL' step):
LDFLAGS=
LDDLFLAGS=
CCCDLFLAGS=
CCDLFLAGS=

Also, you need to export the following variable for OpenMPI to work with shared libraries before trying to install MAKER or run MAKER (this means before even running the 'perl Build.PL' step). It's best just to add it to your ~/.bashrc or ~/.bash_profile.

export LD_PRELOAD=/share/apps/openmpi/OpenMPI-1.6.3/lib/libmpi.so

You will need to run 'source ~/.bashrc' or 'source ~/.bash_profile' after adding it to apply the change to the current terminal session.

Thanks,
Carson

From carsonhh at gmail.com Sun Jun 16 13:46:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 16 Jun 2013 15:46:51 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B877AA.8060803@rennes.inra.fr>
Message-ID:

Thanks for the detailed report and test files.

The problem starts with your GFF3 giving a repeat structure that is a spliced repeat. I don't know if such a thing can really occur, but regardless maker doesn't expect them to occur, and as a result when assembled some of the spliced exons run off the edge of the sequence. The script currently checks for repeats where the end of a repeat runs off the edge and adjusts accordingly, but does not check for a start that runs off the edge (because it's not expecting spliced repeats). The result is the substring outside of string error. I added 'next if($l <=0)' to both the _soft_mask_seq and _hard_mask_seq functions. Hopefully having spliced repeats won't cause other hidden errors elsewhere downstream, but you may need to be aware of the possibility.

Thanks,
Carson

From jmdoyle at purdue.edu Mon Jun 17 11:20:42 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Mon, 17 Jun 2013 13:20:42 -0400 (EDT)
Subject: [maker-devel] altest without MPI?
Message-ID: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I am beginning my first MAKER annotation and had a quick question. I am currently planning on following the "Training ab initio Gene Predictors" section of the MAKER 2012 tutorial. For my species of interest, I have 784290 scaffolds, of which 80% are greater than 100 kb. I have EST data from a closely related species and was also going to use the core cegma protein sequences. With this in mind, I made the following changes to my maker_opts file:

genome=scaffolds.fasta
altest=Trinity.fasta
protein=cegma.fa
est2genome=1
cpus=48

My primary concern is that this is going to take a long time to run with altest, even with the extra cpus for BLAST. The software was not originally installed on our computer cluster with MPICH2, but I may be able to talk our computer guys into reinstalling if the situation is going to be completely untenable without MPI. I guess my question is, is there any point in trying to run the above without MPI? Is there a good way to monitor the progress of such a run if I was to give it a shot?

Thanks for your help with this!

Jackie

From carsonhh at gmail.com Mon Jun 17 14:12:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Jun 2013 16:12:58 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

It's best to use the cegma results with the cegma2zff script to generate a training set for SNAP. Then don't use the cegma proteins. If you can get proteins from a related species with an annotated genome, that will be better than using the altest option with data from a different species. This is because altest is aligned via tblastx, which is 3-4 times slower than protein alignments. Also, altest alignments will rarely be good enough to produce many est2genome models (best to only use them if you have nothing else).
The cpus= option is a blast parameter for specifying how many cpus to give to each blast job. It is not an MPI parameter. The number of cpus for MPI is specified using the -n option from mpiexec and not in the maker control files.

You don't have to use MPI. You can also split your contigs up into separate jobs and run MAKER multiple times. Use the fasta_tool script that comes with MAKER to split your input file up.

Let us know if you come across anything you have more questions on.

Thanks,
Carson

On 13-06-17 1:20 PM, "Jacqueline R M Doyle" wrote:

>Hi!
>
>I am beginning my first MAKER annotation and had a quick question. I am
>currently planning on following the "Training ab initio Gene Predictors"
>section of the MAKER 2012 tutorial. For my species of interest, I have
>784290 scaffolds in which 80% are greater than 100 kb. I have EST data
>from a closely related species and was also going to use the core cegma
>protein sequences. With this in mind, I made the following changes to my
>maker_opts file:
>
>genome=scaffolds.fasta
>altest=Trinity.fasta
>protein=cegma.fa
>est2genome=1
>cpus=48
>
>My primary concern is that this is going to take a long time to run with
>altest, even with the extra cpus for BLAST. The software was not
>originally installed on our computer cluster with MPICH2, but I may be
>able to talk our computer guys into reinstalling if the situation is
>going to be completely untenable without MPI. I guess my question is, is
>there any point in trying to run the above without MPI? Is there a good
>way to monitor the progress of such a run if I was to give it a shot?
>
>Thanks for your help with this!
>
>Jackie
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Jun 19 19:05:49 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Jun 2013 21:05:49 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1997335285.43753.1371676376399.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

The throughput is based on contig length, so long contigs will take longer than short contigs. Any contig less than 10kb is mostly useless for annotation purposes (so you can filter those from your 800,000 right away). Take your contigs that finish, and sum up their length to get a better estimate of how long it will take to complete running. Most genomes can complete in a few days on a multi-core machine. Bigger genomes or bigger datasets take longer. (Note that altest evidence takes 3-4x longer to align than proteins.)

The advantage of proteins is that the species do not have to be closely related. Nucleotide sequence diverges quickly and proteins slowly (that's why proteins are used for phylogenetic trees).

A good strategy would be to get ~10Mb of sequence (use your longest contigs). Run with chicken, turkey, and pigeon proteins. Use the protein2genome option to generate annotations. Those annotations should now be sufficient to train SNAP and Augustus. Then you can finish by running all your contigs with the same dataset (protein2genome now turned off), using the newly trained SNAP and Augustus files along with any altest files you want to use. Note that the size of the dataset will determine the total run time.

To get things to run faster, you can also run on your university's computer cluster (then you will have hundreds of cpus available to you). The Purdue cluster supports MPI and with 30-50 cpus you could annotate even large genomes in a reasonable time.
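The filtering, subsetting and run-time estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something that ships with MAKER: the function names are made up for the example, the 10 kb cutoff and ~10 Mb target are taken from the advice in this message, and the estimate simply assumes run time scales with total contig length.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pick_training_contigs(records: Iterable[Record],
                          min_len: int = 10_000,
                          target_bp: int = 10_000_000) -> Tuple[List[Record], int]:
    """Drop contigs shorter than min_len, then take the longest contigs
    until their combined length reaches target_bp (hypothetical defaults)."""
    kept = sorted((r for r in records if len(r[1]) >= min_len),
                  key=lambda r: len(r[1]), reverse=True)
    subset, total = [], 0
    for header, seq in kept:
        if total >= target_bp:
            break
        subset.append((header, seq))
        total += len(seq)
    return subset, total


def estimate_hours(finished_bp: int, elapsed_hours: float, remaining_bp: int) -> float:
    """Scale the observed throughput (bp per hour) to the remaining sequence."""
    return remaining_bp * elapsed_hours / finished_bp
```

With a real assembly one would pass `read_fasta(open("scaffolds.fasta"))` to `pick_training_contigs` and write the returned subset to a new FASTA file for the training run; the file name here is the one from the original question.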
Alternatively you can request a startup account at XSEDE, an NSF-funded computer resource open to all US institutions. A startup allocation with 50,000 cpu hours only takes 2 weeks to approve. You should request an allocation on the Lonestar cluster if you go that route; it has 64,000 cpus. I was able to annotate the maize genome (which is a very large genome at over 2 gigabases). I used abnormally large EST and protein datasets (~4 gigabases of evidence, which is much more than a normal annotation job), and it completed in under 3 hours on 2,100 cpus.

--Carson

On 13-06-19 5:12 PM, "Jacqueline R M Doyle" wrote:

>Hi Carson (and whoever else might be reading this!)
>
>Thanks so much, I think splitting the files up using fasta_tool will
>definitely move things along. I did a trial version with altest this
>weekend, and seemed to be averaging about an hour a scaffold (with 1
>cpu). I'm a little concerned, as we have ~800,000 scaffolds. Does this
>seem like a reasonable estimate of the time it should take to annotate
>one sequence? Could I be missing something in my maker_opts file?
>
>Let me back up for just a minute and describe the project a little more
>generally. As I mentioned before, we have no protein sequences or ESTs
>for our species of interest, which is an avian species. I could
>potentially use proteins from chicken or turkey, but neither is closely
>related to our species. Time is a bit of an issue... do you have any
>thoughts on how much time per scaffold it should take to annotate using
>protein2genome? If chicken and turkey are not closely related, is it
>worth the time investment?
>
>Let me finish by saying I think MAKER is wonderful, and I really
>appreciate the discussions on this group.
>
>Best wishes, Jackie

From jjin01 at mail.rockefeller.edu Thu Jun 20 14:22:22 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Thu, 20 Jun 2013 20:22:22 +0000
Subject: [maker-devel] maker exon result
Message-ID:

Dear all,

I have used MAKER to predict the gene models in my draft genome.

However, when I check the sequence of each exon, I find that some of them have only a start codon, without a stop codon.

Is this reasonable?

Like in this example:

processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9
processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362
processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1

ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT
CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT
AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG
ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA
TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC
TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG
TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG
ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT
TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT
TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC
TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC
AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT
TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT
AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC
TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA
AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT
TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG
CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC
GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT
TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG
AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA
AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA
TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC
TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC
TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG
TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA
TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC
TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA
ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT
GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA
AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA
GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT
TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT
TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG
AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC
TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT
TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT
AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC
TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA
GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG
GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA
TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA
TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT
TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG
CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA
GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG
AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT
ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT
TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA
CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA
CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT
ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA
TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG
AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG

This is the sequence for this gene; the red color is for the first exon.

However, for this exon, I cannot find the stop codon.

I also find that for some exons, there are several stop codons in one exon.

Does anyone have the same problem? Or is there something wrong in how I configured the maker file?

Thanks!

Jingjing

From dence at genetics.utah.edu Thu Jun 20 17:06:29 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Jun 2013 23:06:29 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References:
Message-ID:

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like GBrowse and IGV.

When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow.

I hope that I understood your question. Feel free to reply back on the mailing list if I didn't.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From barry.moore at genetics.utah.edu Thu Jun 20 17:11:56 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 20 Jun 2013 17:11:56 -0600
Subject: [maker-devel] maker exon result
In-Reply-To:
References:
Message-ID: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>

To add to what Daniel suggested if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS.

B

On Jun 20, 2013, at 5:06 PM, Daniel Ence wrote:

> Hi Jingjing,
>
> It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv.
>
> When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red.
Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. > > I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] > Sent: Thursday, June 20, 2013 2:22 PM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] maker exon result > > Dear all, > > I have used maker to predict the gene model in my draft genome. > > However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. > > Is it reasonable for this? > > Like in this example: > > processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 > processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 > processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > > ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT > CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT > AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG > ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA > TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC > TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG > TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG > ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT > TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT > TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC > TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC > AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT > TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT > 
AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC > TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA > AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT > TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG > CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC > GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT > TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG > AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA > AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA > TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC > TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC > TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG > TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA > TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC > TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA > ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT > GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA > AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA > GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT > TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT > TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG > AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC > TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT > TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT > AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC > TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA > GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG > 
GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA > TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA > TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT > TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG > CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA > GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG > AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT > ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT > TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA > CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA > CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT > ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA > TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG > AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG > > > This is the sequence for this gene, the red color is for the first exon?? > > However, for this exon, I cannot found the stop codon??? > > I also find for some exon, there are several stop codon in one exon??? > > Does anyone have the same problem with me? > Or there is something wrong when I configure the maker file?? > > Thanks! > > Jingjing > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... 
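The question quoted above comes down to where a start or stop codon should be expected in a gene model. The following is a minimal Python sketch, not part of MAKER, under these assumptions: the function names (`parse_cds`, `coding_sequence`, `reverse_complement`) are hypothetical helpers, the scaffold and GFF3 rows are toy data rather than the tobacco sequence from this thread, and GFF3 columns are tab-separated with 1-based inclusive coordinates. It illustrates that a stop codon is looked for at the end of the concatenated CDS features of a transcript, not at the end of each exon.

```python
# Toy illustration only: helper names and data are assumptions, not MAKER code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_cds(gff_lines, transcript_id):
    """Collect (start, end) of CDS rows whose Parent is transcript_id."""
    strand = None
    segments = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # skip 'exon', 'mRNA', 'gene' and malformed rows
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if transcript_id not in attrs.get("Parent", "").split(","):
            continue
        strand = cols[6]
        segments.append((int(cols[3]), int(cols[4])))
    segments.sort()  # ascending genomic order
    return strand, segments


def coding_sequence(scaffold, strand, segments):
    """Concatenate CDS segments; reverse complement if on the minus strand."""
    # GFF3 coordinates are 1-based and inclusive, Python slices are 0-based.
    cds = "".join(scaffold[start - 1:end] for start, end in segments)
    return reverse_complement(cds) if strand == "-" else cds


# Toy scaffold: 2 nt flank, CDS 1 (3-8), intron (9-14), CDS 2 (15-20), 2 nt flank.
scaffold = "CC" + "ATGGCC" + "GTAAGT" + "AAATGA" + "GG"
gff = [
    "toy\tmaker\texon\t3\t8\t.\t+\t.\tID=t1:exon:1;Parent=t1",
    "toy\tmaker\texon\t15\t20\t.\t+\t.\tID=t1:exon:2;Parent=t1",
    "toy\tmaker\tCDS\t3\t8\t.\t+\t0\tID=t1:cds;Parent=t1",
    "toy\tmaker\tCDS\t15\t20\t.\t+\t0\tID=t1:cds;Parent=t1",
]

strand, segments = parse_cds(gff, "t1")
cds = coding_sequence(scaffold, strand, segments)
print(cds)                      # ATGGCCAAATGA
print(cds[:3] == "ATG")         # True: start codon at the beginning
print(cds[-3:] in STOP_CODONS)  # True: stop codon at the very end only
```

In this toy model the first CDS segment on its own ends in GCC, which is not a stop codon, and that is expected. On a real draft assembly the final check can still fail for a valid model, for example when the gene runs off the end of a scaffold.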
From jjin01 at mail.rockefeller.edu Thu Jun 20 18:18:18 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Fri, 21 Jun 2013 00:18:18 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References: ,
Message-ID:

As I understand it, a predicted gene model connects several exons together, and I thought that each exon of a gene should have its own start codon and stop codon. However, I may be wrong.

When I check some gene models from the MAKER predictions, I cannot find a stop codon in some of the exons. In the example I gave, the first exon is shown in red, and its last 3 nt are not a stop codon. Even the last 3 nt of the last exon are not a stop codon.

Is that reasonable?

Thanks!

Jingjing

________________________________
From: Daniel Ence [dence at genetics.utah.edu]
Sent: Thursday, June 20, 2013 7:06 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: RE: maker exon result

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow.

I hope that I understood your question. Feel free to reply back on the mailing list if I didn't.
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . 
+ 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG 
CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA 
GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG This is the sequence for this gene, the red color is for the first exon?? However, for this exon, I cannot found the stop codon??? I also find for some exon, there are several stop codon in one exon??? Does anyone have the same problem with me? Or there is something wrong when I configure the maker file?? Thanks! Jingjing -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjin01 at mail.rockefeller.edu Thu Jun 20 18:21:38 2013 From: jjin01 at mail.rockefeller.edu (Jingjing Jin) Date: Fri, 21 Jun 2013 00:21:38 +0000 Subject: [maker-devel] maker exon result In-Reply-To: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu> References: , <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu> Message-ID: For the last three nucleotides of this example, it is also not stop codon. Jingjing ________________________________ From: Barry Moore [barry.moore at genetics.utah.edu] Sent: Thursday, June 20, 2013 7:11 PM To: Daniel Ence Cc: Jingjing Jin; maker-devel at yandell-lab.org Subject: Re: [maker-devel] maker exon result To add to what Daniel suggested if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS. B On Jun 20, 2013, at 5:06 PM, Daniel Ence wrote: Hi Jingjing, It's really hard to find the stop codon in the nucleotide sequence that you sent. 
I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA 
TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG This is the sequence for this gene, the red 
color is for the first exon?? However, for this exon, I cannot found the stop codon??? I also find for some exon, there are several stop codon in one exon??? Does anyone have the same problem with me? Or there is something wrong when I configure the maker file??

Thanks!

Jingjing

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

-------------- next part --------------
An HTML attachment was scrubbed...

From myandell at genetics.utah.edu Thu Jun 20 19:11:40 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Fri, 21 Jun 2013 01:11:40 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References: , ,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E18B22@mxb2.hg.genetics.utah.edu>

Hi Jin,

Only the terminal coding exon (CDS) of a gene model will contain a stop codon. Sometimes, though, there is no stop codon because the gene actually runs off the end of the scaffold, or the stop codon is lost in a gap in the assembly...

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu]
Sent: Thursday, June 20, 2013 6:18 PM
To: Daniel Ence; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker exon result

For my understanding, the prediction gene model should be connect different exon together. For each exon of a gene, I think it should have a start codon and stop codon. However, it may be wrong.
However, when I check some gene model from maker prediction, some exon of one gene, I cannot find stop codon for it. Like the example I give, the red color is the first exon. However, the last 3 NT is not a stop codon. Even for last 3 NT for last exon, it is also not a stop codon. Is it reasonable? Thanks! Jingjing ________________________________ From: Daniel Ence [dence at genetics.utah.edu] Sent: Thursday, June 20, 2013 7:06 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: RE: maker exon result Hi Jingjing, It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA 
TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG This is the sequence for this gene, the red 
color is for the first exon. However, for this exon, I cannot find the stop codon. I also find that some exons contain several stop codons. Does anyone have the same problem? Or is there something wrong with how I configured the maker file?

Thanks!

Jingjing

From bmoore at genetics.utah.edu Thu Jun 20 19:29:41 2013
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Fri, 21 Jun 2013 01:29:41 +0000
Subject: [maker-devel] maker exon result
Message-ID: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu>

To clarify things a bit, Jin: not every exon will have a start and/or stop codon. Only the first coding exon will have a start, and only the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3), so look only at CDS features, not at 'exon' features. For CDSs you must then concatenate the sequence of each CDS line for a given transcript (and reverse complement the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end.

B

Barry Moore
Research Scientist
Dept. Human Genetics
University of Utah

On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote:

As I understand it, a predicted gene model connects different exons together.

I thought each exon of a gene should have a start codon and a stop codon. However, I may be wrong.

When I check some gene models from the maker prediction, I cannot find a stop codon for some exons of a gene. In the example I gave, the red color marks the first exon, but its last 3 nt are not a stop codon.

Even the last 3 nt of the last exon are not a stop codon.

Is that reasonable?

Thanks!
Jingjing

________________________________
From: Daniel Ence [dence at genetics.utah.edu]
Sent: Thursday, June 20, 2013 7:06 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: RE: maker exon result

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv.

When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow.

I hope that I understood your question. Feel free to reply back on the mailing list if I didn't.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu]
Sent: Thursday, June 20, 2013 2:22 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] maker exon result

Dear all,

I have used maker to predict the gene model in my draft genome.

However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon.

Is it reasonable for this?

Like in this example:

processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9
processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362
processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 .
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From kara.deleon at biofilm.montana.edu Thu Jun 20 16:25:31 2013
From: kara.deleon at biofilm.montana.edu (Bowen, Kara (De Leon))
Date: Thu, 20 Jun 2013 16:25:31 -0600
Subject: [maker-devel] augustus_species
Message-ID: <3E82665C-ECB7-4A07-B0FF-24E8395EDC4D@biofilm.montana.edu>

Hello,

I am trying to annotate a Chlamydomonas genome, and C. reinhardtii was used as a model organism in Augustus. I would like to add this model to augustus_species in the maker_opts.ctl file, but I'm not sure how this information should be entered on that line (i.e. as genus name, file location, etc.).

I am also having an issue with providing a protein file. When I put in the protein fasta file of C. reinhardtii from the Augustus website, I get a fatal error (below). I've looked through the fasta and I'm not seeing anything obvious that would cause this error to be thrown. Do you have any suggestions on where to start to look?

Can't open sequence index file /Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/augustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index: Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm line 527.

FATAL ERROR

Thanks for any help you can provide.
Kara

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kara De León
Postdoctoral Research Associate
Montana State University
Center for Biofilm Engineering
366 EPS Building
Bozeman, MT 59717
208-484-9078
kara.deleon at biofilm.montana.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From gowthaman.ramasamy at seattlebiomed.org Fri Jun 21 07:29:06 2013
From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy)
Date: Fri, 21 Jun 2013 06:29:06 -0700
Subject: [maker-devel] augustus_species

I believe the model file should go into the Augustus installation directory, in the 'genomes' subfolder there. Then use the exact name of the model file (minus extension) in the .ctl file.

From carsonhh at gmail.com Fri Jun 21 09:24:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Jun 2013 11:24:17 -0400
Subject: [maker-devel] augustus_species

The model files must go in .../augustus/config/species/ under the augustus installation directory (each model gets its own directory). The species names that augustus accepts are the same as the directory names under .../augustus/config/species/. The command 'augustus --species=help' will also provide a list of those names.

For the protein file, can you send it to me?

--Carson

From carsonhh at gmail.com Fri Jun 21 07:58:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Jun 2013 09:58:35 -0400
Subject: [maker-devel] maker exon result
In-Reply-To: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu>

To further illustrate this, I've highlighted the location of all CDS entries (in uppercase). You need to cut them out, string them together linearly, and only then can you translate. There is a start codon for the merged CDS and the reading frame stays open after that, but there is no stop codon, so this is a partial transcript. Sometimes the gene predictors do not find a likely stop and a partial model scores better. You can force MAKER to try to find a stop even when the gene predictor (snap, augustus, etc.) doesn't by setting always_complete=1 in the maker_opts.ctl file. Keep in mind that this is just a forced canonical completion.
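The cut-and-concatenate step described above can be sketched in a few lines of Python. This is only an illustration, not part of MAKER: it assumes the contig sequence is already in memory as a string and that the CDS coordinates were read from the GFF3 (1-based, inclusive). The function names and the toy contig are hypothetical, and the sketch assumes the first CDS starts in phase 0.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spliced_cds(contig_seq, cds_intervals, strand):
    """Join the CDS pieces of one transcript into a single coding sequence.

    cds_intervals holds (start, end) pairs, 1-based and inclusive as in GFF3.
    """
    pieces = [contig_seq[start - 1:end] for start, end in sorted(cds_intervals)]
    cds = "".join(pieces).upper()
    if strand == "-":
        # Minus-strand transcripts are read from the reverse complement.
        cds = cds.translate(COMPLEMENT)[::-1]
    return cds


def translate(cds):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def describe(cds):
    """Report whether the merged CDS has a start, a stop, and internal stops."""
    protein = translate(cds)
    return {
        "has_start": cds.startswith("ATG"),
        "has_stop": len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS,
        "internal_stops": protein[:-1].count("*"),
    }


if __name__ == "__main__":
    # Toy contig: two CDS pieces (uppercase) separated by an intron (lowercase).
    contig = "ccATGAAAgtaagtcagGGCTGAtt"
    cds = spliced_cds(contig, [(3, 8), (18, 23)], "+")
    print(cds, translate(cds), describe(cds))
```

A merged CDS that starts with ATG but reports has_stop as False corresponds to the partial model discussed in this thread; checking individual exons for stop codons is not meaningful because their reading frame is only defined after splicing.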
ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCA GAATTATCTCCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCG GGGAAGTTGATTGAGAATAATCCATCAGGGgtgatacagaagaattgtttcagTATCTTG TTGAAATATTGGCTTCTAGAGTGTATGATGTAGCAATTGATTCCCCCTTGCAAAATGCAA CTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGATCAAAAGAGAGGATATGCAGTCCg tatgtttctcctctcttctttttttgatgtagcatttgctttaacttagaatttgtggtt ttaaacataccattagaaaggtatggaggttgaggattagggtagtaaagtaggtagtct agagtgttcataacagtaatattgacaagcagtctcgctttccgttggtagtaggttttt atgactaaccgttattttctttcattgttgatcaacttacttttgttgtttttattctgc ttttatatggctttttggtactgtcccttcttgtctatattttcattaatgtggtgctta tgcttttctaagccgagagtttattggaaacaactttcatatcctcacaaggtaggggta aggtgtgcgtacacactaccctccccagactctacggtgtgggataatatttagtatgtt attgtcgttgttgttgtaaacgttttttttgttgctatcaaagcatgttattacgggtaa aatagaaacatttaaagtgaaagagtttccaaacgtaggaaagcttttttttctttcgga atacaccgaaaaaagaaagactatcatttaagatagaacaacaacagcgacggagctagc cttcgacttactggttcggcagaacccaataattttggcccaaactctgtacttgtacta aaaagctcacttaatatgtataaaaagcctagtaattaagttgcatttttttctttctaa aatctagagctcataaactcaaaattatgtctccgcctctgaacaatggggatattattc tacttttaactatcttagataagttaataattgttctctttttcaaacgtttctgccttg tattattgtgtaactatttatactgtgtggacgcttcaaaatgttgttgcgcccgcgtcg gatcctcaaaaaatatatattttgaggattcgacacgcacccgatgaccttttcggagaa ttcgagcaatataggtaactaatattgctagctcatcaactggtggtattttttagGTGC TCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTC AGAGACTTAAATGTACTGCTACGATTgtcatgcctgttaccacaccagagatcaaggtaa ttagttctctcctgttaatttatccttcatgttcgattcatgtgaatctagttgatcggg cactgagttttactaaaaaatgaagactttcggaacttgggagctttaacatgctgtaac atttgtgtagttataagacttttgaaacttatagtcttagtgggtgtttggacataagaa ttgtaaagttccaagaaaagtgaaaaaaaattcaagtgaaaatggtatttgaaaattaga gttgtgtttggacatgaatataattttaggttgtttttgaagttttgtgagtgatctgac acaaattttgaaaaaacaactttttggagtttttcaaattttcgaaaaattccaaaatgc atcttcaagtgaaaattggaaattatatgaccaaacgctgatttcgggaaaaaaattcga 
aaaaatgtgaaaattttcttatgtccaaacgggctcttaaatgcgtcataacgtttgtgt ggttataaaagtctctcatctgaatagggtcacacaactaaaacagagagaacaaaataa ttcactaaaaaaaaattggaactagctacaaacttcgtcgcaagtctcgctaaatcgctc gtagctaatagaatttctagataatttgtttagcttgtagcatgaaatttttctatttag caacagaagtagtctgtcgctaattcctatttttttagtagaaagtattgtgaaattatt tgtttttctaaaggaccattttctttacaaatgaacagattgaagcagttaagaacttgg atggtaatgtagttctacagGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGT TGGCTGAAGATGAAGgtctcacattcatcccgcctttcgatcacatcttaaagatataca tgcagtatttctgcctgtagGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAA AAGGGTTGCTCCTCATACAAAGATTATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGAC ACAGTCTTTGTACCACGGAATGAGAGTAAAGTTAGAACAAGTTGATAATTTTGCAGATGG CgtagctgttgcactagTTAGTTGGTGAAGAAACTTTCCGTCTTTGCAAAGATTTAATAG ACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGgttagcacgcacc atctcctaatggtttcagatatgatccgtccaaccagccaaaattggttagaataggacg ggttgaactatcaacccaatcaatcacagcccaaataacatttatgtgggtatatgactc gcccatttattaactcaaccaattttggtccattcaaattcaggctaacccgtccacgtt tgacattcatactttagatgtggattaaagtaactttcttaaatttccctctggttttga catgtactagtttgtgtttgtgtgtgttttgttctttttttcaatagGATGTGTACGACA AAGGAAGGAACATATTAGAGACATCAGGTGCACTCGCCATAGCTGGAGCTGAAGCATACT GCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTAGTGGAGCCAATATGG ACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGGAAGCTC TGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTgtgc gttacttagagcacttaacaagcattttagccagagtttaagttatatacatcgtcgtca gtgtaagaaacttttataccgtcttgatggagtaaaaatttgttacactgacgtgtacat aacttaaaacttttttagttactatatgatactttctgtctaagaaactgaaatattgac ttgaattactggtgggacctatgattattaccgaattcaagtacagatataactctggaa gaaaacaagctctagttctgtacaggtaattaaagttctattcatttttagaggggatgt tggcttctcattttagatttgctttattagttgttaggaaaaaagaaattacttattaca ttcaatttttagATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGGAAG TTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG Thanks, Carson From: Barry Moore Date: Thursday, 20 June, 2013 9:29 PM To: Jingjing Jin Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker exon result To 
clarify things a bit Jin. Not every exon will have a start and/or stop codon only the fist coding exon will have a start and the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3) so only look at CDS features not at 'exon' features. For CDSs you must then concatenate the sequence I'd each CDS line for a given transcript (and reverse compliment the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end. B Barry Moore Research Scientist Dept. Human Genetics University of Utah On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote: > For my understanding, the prediction gene model should be connect different > exon together. > > For each exon of a gene, I think it should have a start codon and stop codon. > However, it may be wrong. > > However, when I check some gene model from maker prediction, some exon of one > gene, I cannot find stop codon for it. Like the example I give, the red color > is the first exon. However, the last 3 NT is not a stop codon. > > Even for last 3 NT for last exon, it is also not a stop codon. > > Is it reasonable? > > Thanks! > > Jingjing > > > > From: Daniel Ence [dence at genetics.utah.edu] > Sent: Thursday, June 20, 2013 7:06 PM > To: Jingjing Jin; maker-devel at yandell-lab.org > Subject: RE: maker exon result > > Hi Jingjing, > > It's really hard to find the stop codon in the nucleotide sequence that you > sent. I think most people determine the presence of a stop codon in a gene by > viewing the annotations and sequence in some kind of viewer. The one that I > use the most is Apollo, but many people also like gbrowse and igv. > > When you view gene models in Apollo, the start codons are highlighted in green > and the stop codons are highlighted in red. 
Sometimes MAKER couldn't find the > stop or start codon for a gene, and in those cases, the end of the gene model > is marked with an orange arrow. > > I hope that I understood your question. Feel free to reply back on the mailing > list if I didn't. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing > Jin [jjin01 at mail.rockefeller.edu] > Sent: Thursday, June 20, 2013 2:22 PM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] maker exon result > > Dear all, > > I have used maker to predict the gene model in my draft genome. > > However, when I check the sequence for each exon, I find some of them just > have start codon, without stop codon. > > Is it reasonable for this? > > Like in this example: > > processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-proce > ssed_tobacco_genome_sequences_c33-snap-gene-0.9 > processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=ma > ker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_ > tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0 > |0|1|0.14|0.12|8|0|362 > processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . > + . 
> ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . 
> + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . > + 2 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
> + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > > > ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT > CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT > AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG > ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA > TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC > TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG > TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG > ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT > TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT > TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC > TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC > AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT > TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT > AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC > TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA > AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT > TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG > CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC > GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT > TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG > AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA > AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA > TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC > TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC > TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG > 
TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA > TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC > TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA > ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT > GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA > AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA > GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT > TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT > TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG > AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC > TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT > TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT > AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC > TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA > GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG > GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA > TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA > TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT > TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG > CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA > GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG > AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT > ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT > TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA > CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA > CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT > ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA > 
TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG
> AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG
>
> This is the sequence for this gene; is the red color the first exon?
>
> However, for this exon, I cannot find the stop codon.
>
> I also find that some exons contain several stop codons.
>
> Does anyone have the same problem?
> Or is something wrong in how I configured the MAKER file?
>
> Thanks!
>
> Jingjing

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amelia.ireland at gmod.org Sun Jun 23 20:15:37 2013
From: amelia.ireland at gmod.org (Amelia Ireland)
Date: Sun, 23 Jun 2013 19:15:37 -0700
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
References:
Message-ID:

From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu.

---------- Forwarded message ----------
From: Yunxi Lin
Date: Sun, Jun 23, 2013 at 4:14 PM
Subject: about running MAKER
To: "gmod-help at gmod.org"

Hi

I'm running a eukaryote project on our server. Because our server does not have a GUI, will MAKER still work? Also, our command has already run for more than one month trying to generate the models used for training SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server.

Thank you.

Sincerely,
Lin

--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject
From carsonhh at gmail.com Mon Jun 24 07:05:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Jun 2013 09:05:27 -0400
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
Message-ID:

Run time depends on the size of your evidence dataset, the genome size, and the number of processors you use. If you have a large genome (Gb size) and you are running on a single CPU, then that could take a long time. This is especially true if you use the alt_est option for evidence, as these are aligned via tblastx, which is 3-4 times slower than protein alignments and 10-20 times slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment, so your evidence dataset is the major factor.

Also, you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome, that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 CPU Linux servers, which can annotate most fungal genomes in a few hours and larger genomes in a few days.

--Carson

From Carson.Holt at oicr.on.ca Mon Jun 24 18:39:08 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 25 Jun 2013 00:39:08 +0000
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
Message-ID:

You are most likely only getting 1 CPU of performance. You should just install MPICH2. It's easy to let MAKER do it for you:

Go to the .../maker/src/ directory
Run './Build mpich2'

Once it finishes installing, it will be in the .../maker/exe/mpich2/bin/ directory.

Set up MAKER again to use MPICH2:

Go to the .../maker/src/ directory
Run 'perl Build.PL'
Say yes to the "use MPI" question
Run './Build install'

Now run MAKER via 'mpiexec'.

Example --> .../maker/exe/mpich2/bin/mpiexec -n 16 maker

The -n flag specifies how many CPUs to use. mpiexec handles process communication either on the same machine or across machines. You will get much better performance.

Thanks,
Carson

From lin11 at cougars.csusm.edu Mon Jun 24 17:11:23 2013
From: lin11 at cougars.csusm.edu (Yunxi Lin)
Date: Mon, 24 Jun 2013 16:11:23 -0700
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
References:
Message-ID:

Hi Carson,

Thank you for your help. My estimated genome size is 250M base pairs. I ran it on 16 CPUs, but we don't have MPI, so I cannot use it.

I don't think I'm using the alt_est option. I was following the tutorial. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq, and I used those ESTs to run MAKER.

I think I already have more than 10Mb of data. The information you mentioned is very helpful. I may use it to try to train SNAP and Augustus. Because this is my first time using MAKER and it has already run for a month, I was wondering whether I used the command in the wrong way.

Sincerely,
Yunxi

From dence at genetics.utah.edu Tue Jun 25 08:12:45 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 25 Jun 2013 14:12:45 +0000
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To:
References: ,
Message-ID:

Hi Yunxi,

During the MAKER installation, there is an option to automatically install MPICH2, which would let you run MAKER parallelized. Try rerunning the perl Build.PL script in the "maker/src" directory, and when the option to install MPICH2 comes up, tell it yes. This will start an automated download and install onto your server.

You can also start more than one MAKER process. They will work on annotating the genome together. You can start as many as ten or more processes like this, but MPI is a better parallelizing option.

Hope that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From Carson.Holt at oicr.on.ca Tue Jun 25 09:56:22 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 25 Jun 2013 15:56:22 +0000
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To: <9FC132E2-9E59-42E9-ADBA-FD91644E2124@cougars.csusm.edu>
Message-ID:

You can get BLAST to use more than 1 CPU via the cpus= option, but that still significantly limits MAKER's performance. When you let MAKER install MPICH2, it will be local to the MAKER installation (MAKER only). It will be in .../maker/exe/mpich2. This was done on purpose for people who have limited access and install MAKER themselves, so they can run via MPI without having to get upgraded privileges. I don't know whether you installed MAKER yourself, but if you did, then this is an option that will let you run.

--Carson

From: csusm
Date: Tuesday, 25 June, 2013 11:40 AM
To: Carson Holt
Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson,

Thank you for your suggestion. Do you mean that if I don't use MPI, I can only run it on one CPU? Because my school owns the server, I only have limited authorization.

Yunxi Lin
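For anyone scripting the launch step described in this thread ('mpiexec -n 16 maker' using the MPICH2 that MAKER installs under its own exe/mpich2/bin directory), the command line can be assembled as in the sketch below. This is only an illustration under stated assumptions: `build_mpiexec_command` is a hypothetical helper name, `maker_home` stands in for the MAKER install path that the thread leaves elided, and `maker` is assumed to be on the PATH and started from the directory holding the control files.

```python
import posixpath


def build_mpiexec_command(maker_home, n_cpus):
    """Return the argv list for running MAKER under its bundled MPICH2.

    maker_home is a placeholder for the MAKER install directory; the
    thread does not give the real path, so the caller must supply it.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    # Per the thread, './Build mpich2' places mpiexec in exe/mpich2/bin.
    mpiexec = posixpath.join(maker_home, "exe", "mpich2", "bin", "mpiexec")
    # -n sets the number of processes; 'maker' is assumed to be on PATH.
    return [mpiexec, "-n", str(n_cpus), "maker"]


# Hypothetical install location, used only to show the resulting command.
print(" ".join(build_mpiexec_command("/path/to/maker", 16)))
# prints: /path/to/maker/exe/mpich2/bin/mpiexec -n 16 maker
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the path has been confirmed to exist on the server.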
From jjin01 at mail.rockefeller.edu Tue Jun 25 15:13:53 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 21:13:53 +0000
Subject: [maker-devel] start position for some genes results
Message-ID:

Dear all,

I found something strange about the locations in my final result. For example, here is the start position of one final gene model:

c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2

Its start position is -1. Does anyone know why the start position is -1? Is something wrong?

Thanks!

Jingjing

From carsonhh at gmail.com Tue Jun 25 16:55:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 18:55:11 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

What MAKER version are you using? This should be fixed in the current 2.28. It only happened under a very specific set of circumstances, but I remember fixing it. So let me know if you are using 2.28.

--Carson
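To see how many gene models are affected by a start of -1, the GFF3 output can be scanned for impossible coordinates. GFF3 positions are 1-based with start <= end, so any feature with start < 1 is invalid. The sketch below is an illustration, not part of MAKER: `find_bad_coordinates` is a hypothetical helper, and the second feature line in the example is made up to show a valid record next to the one reported in this thread.

```python
def find_bad_coordinates(gff_lines):
    """Yield (line_number, seqid, start, end) for features whose
    coordinates violate GFF3 rules (1-based, start <= end)."""
    for line_number, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        try:
            start, end = int(columns[3]), int(columns[4])
        except ValueError:
            continue  # non-numeric coordinates: not handled in this sketch
        if start < 1 or end < start:
            yield line_number, columns[0], start, end


example = [
    "##gff-version 3",
    # The record reported in this thread, with tabs restored.
    "c124062\tmaker\tgene\t-1\t507\t.\t-\t.\t"
    "ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2",
    # Hypothetical valid record, for contrast only.
    "c124062\tmaker\tgene\t600\t900\t.\t+\t.\tID=hypothetical-gene",
]
for hit in find_bad_coordinates(example):
    print(hit)
# prints: (2, 'c124062', -1, 507)
```

On a real run the function can be given an open file handle for the merged GFF3 instead of a list.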
From jjin01 at mail.rockefeller.edu Tue Jun 25 17:00:37 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 23:00:37 +0000
Subject: [maker-devel] start position for some genes results
In-Reply-To:
References: ,
Message-ID:

Sorry, I have checked. I think it is the old version, 2.27. I will try the new one.

Thanks!

Jingjing

From jjin01 at mail.rockefeller.edu Tue Jun 25 18:53:01 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 00:53:01 +0000
Subject: [maker-devel] start position for some genes results
In-Reply-To:
References: ,
Message-ID:

Dear Carson,

When I use the new version of MAKER, I run into another problem:

jingjing at ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error

Do you know how to fix this problem in the new version?

Thanks!

Jingjing
From carsonhh at gmail.com Tue Jun 25 18:55:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 20:55:54 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network-mounted location.

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 19:30:09 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:30:09 +0000
Subject: [maker-devel] start position for some genes results
In-Reply-To:
References: ,
Message-ID:

Dear Carson,

I am so sorry. The problem is still here.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73
Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79
Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56
Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
--> rank=NA, hostname=ChuaServer1
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `seq_id` does not exist in MpiTier object
From carsonhh at gmail.com Tue Jun 25 19:47:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 21:47:10 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

Could you check your input genome file for the sequence "processed_tobacco_genome_sequences_c1"? Make sure that it has exactly that name and that there are no ':' characters in the name, because they can confuse the BioPerl FASTA indexer.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 9:30 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Dear Carson,

I am so sorry. The problem is still here.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error
 at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
 Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73
 Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
 eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
 Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79
 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56
 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
--> rank=NA, hostname=ChuaServer1
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `seq_id` does not exist in MpiTier object

From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 8:55 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network mounted location.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 8:53 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Dear Carson,

When I use the new version of maker, I have another problem like this:

jingjing at ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error

Do you know how to fix this problem with the new version?

Thanks!
Jingjing

From jjin01 at mail.rockefeller.edu Tue Jun 25 19:53:33 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:53:33 +0000
Subject: [maker-devel] start position for some genes results
In-Reply-To:
References: ,
Message-ID:

Yes, this is the real name. There is also no ":" in the name.
I used the same file with maker.2.27 and had no problem, so I am not sure what is wrong with the new version.

Jingjing
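
Editorial illustration: the header check requested in this exchange (is the sequence present under exactly that name, and does any name contain ':') can be done mechanically. The sketch below is not part of MAKER or BioPerl. It assumes the common convention that the sequence ID is the first whitespace-delimited token after '>', and the function name `check_fasta_headers` and the sample records are hypothetical.

```python
def check_fasta_headers(lines, wanted):
    """Return (found, suspicious) for a FASTA file given as an iterable of lines.

    found      -- True if some header has exactly the ID `wanted`
    suspicious -- list of IDs that contain a ':' character
    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    found = False
    suspicious = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        if seq_id == wanted:
            found = True
        if ":" in seq_id:
            suspicious.append(seq_id)
    return found, suspicious


# Hypothetical in-memory records; a real check would pass an open file handle.
sample = [
    ">processed_tobacco_genome_sequences_c1 some description",
    "ACGTACGT",
    ">scaffold:2",
    "GGCC",
]
found, suspicious = check_fasta_headers(
    sample, "processed_tobacco_genome_sequences_c1"
)
print(found, suspicious)
```

Matching on the whole ID rather than on a substring matters here, because names such as `c1` and `c10` would otherwise be confused.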
From carsonhh at gmail.com Tue Jun 25 20:02:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 22:02:51 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

The failure you are seeing occurs in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER. Does it fail as well?

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 20:15:46 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 02:15:46 +0000
Subject: [maker-devel] start position for some genes results
In-Reply-To:
References: ,
Message-ID:

Yes, it also fails on the test data.

jingjing at ChuaServer1:~/software/maker.2.28/maker/data/example$ ../../bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >contig-dpp-500-500, trying to re-index the fasta.
stop here: contig-dpp-500-500
ERROR: Fasta index error
 at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
From carsonhh at gmail.com Wed Jun 26 05:49:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Jun 2013 07:49:11 -0400
Subject: [maker-devel] start position for some genes results
In-Reply-To:
Message-ID:

I thought as much. There is something wrong with the installation itself. Could you run maker with the --debug flag and kill it after 30 seconds? Capture the STDERR and send it to me. This is just to check the prerequisites installed on your system for known incompatibilities.

--Carson
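
Editorial illustration: the request above (run with --debug, stop after 30 seconds, keep STDERR) can be scripted. The sketch below is one possible way to do it with the Python standard library, not a MAKER feature. The helper name `capture_stderr` and the log file name are hypothetical, and it assumes `maker` is on the PATH and is started in a directory that contains the control files.

```python
import subprocess


def capture_stderr(command, log_path, seconds=30):
    """Run `command`, write its STDERR to `log_path`, stop it after `seconds`.

    Returns True if the command was still running and had to be killed,
    False if it finished on its own before the time limit.
    """
    with open(log_path, "wb") as log:
        try:
            # subprocess.run kills the child when the timeout expires and
            # then raises TimeoutExpired.
            subprocess.run(command, stderr=log, timeout=seconds)
        except subprocess.TimeoutExpired:
            return True
    return False


# Assumed invocation (not executed here):
# capture_stderr(["maker", "--debug"], "maker_debug.stderr.log", seconds=30)
```

Only the process started directly is killed on timeout; any helper programs it launched may need to be stopped separately.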
From michel.moser at ips.unibe.ch Thu Jun 27 07:33:15 2013
From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch)
Date: Thu, 27 Jun 2013 13:33:15 +0000
Subject: [maker-devel] spliting genome for annotation
Message-ID:

Dear Maker developers,

If I understood correctly, one can split the genome into chunks and annotate each chunk separately in order to increase speed and reduce the resources needed. (I would really like to do that, as I am working with a 1.2 Gbp draft genome and can't use MPI on the computing cluster.)

I am a bit worried about how this might affect the annotation, as the gene predictor would get trained quite differently for each chunk, right? Or is there communication between the chunks using the -base function of maker?

Could you maybe name some pros and cons of splitting the genome for annotation with maker?

Thank you very much,
Michel
________________________________________
From: Moser, Michel (IPS)
Sent: Thursday, 27 June 2013 15:24
To: Carson Holt
Subject: AW: [maker-devel] start position for some genes results

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Wednesday, 26 June 2013 04:02
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

The failure you are seeing occurs in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER. Does it fail as well?

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 9:53 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Yes, this is the real name. There is also no ":" in the name. I used the same file with maker.2.27 and had no problem, so I am not sure what is wrong with the new version.
Jingjing

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 5:13 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] start position for some genes results

Dear all,

I found something strange about the locations in my final result. For example, this is the start position of one final gene model:

c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2

Its start position is -1. Does anyone know why the start position is -1? Is something wrong?

Thanks!
Jingjing _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From lawson at ebi.ac.uk Thu Jun 27 07:37:10 2013 From: lawson at ebi.ac.uk (Daniel Lawson) Date: Thu, 27 Jun 2013 14:37:10 +0100 Subject: [maker-devel] spliting genome for annotation In-Reply-To: References: Message-ID: Michel, It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold. regards Dan On 27 June 2013 14:33, wrote: > Dear Maker-developers > > If i understood correctly, in order to increase speed and reduce needed > resources one can split the genome into chunks and annotate each chunk > separately. > (i would really like to use that as i am working with a 1.2 Gbasepair > draftgenome and cant use MPI on the computing cluster) > I am a bit worried about how this might affect the annotation as the > gene-predictor would get trained quite differently for each chunk, right? > Or is there communication between the chunks using the -base function of > maker? > > Could you maybe name some pros and cons of splitting your genome for the > annotation with maker? > > Thank you very much, > Michel > > > > > ________________________________________ > Von: Moser, Michel (IPS) > Gesendet: Donnerstag, 27. Juni 2013 15:24 > An: Carson Holt > Betreff: AW: [maker-devel] start position for some genes results > > ________________________________________ > Von: maker-devel [maker-devel-bounces at yandell-lab.org]" im Auftrag > von "Carson Holt [carsonhh at gmail.com] > Gesendet: Mittwoch, 26. 
> > --Carson > > > > From: Jingjing Jin jjin01 at mail.rockefeller.edu>> > Date: Tuesday, 25 June, 2013 5:13 PM > To: "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: [maker-devel] start position for some genes results > > Dear all, > > I find some strange things about location for my final result. > > Like for some start position of final gene model: > > c124062 maker gene -1 507 . - . > ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2 > > > It start position is -1. > > Does someone know why the start position is -1? > > Is there something wrong? > > Thanks! > > Jingjing > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Ensembl Genomes | VectorBase | i5K insect genome initiative -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Jun 27 09:42:26 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 27 Jun 2013 11:42:26 -0400 Subject: [maker-devel] spliting genome for annotation In-Reply-To: Message-ID: Correct. The level of splitting is going to be limited by the largest contig. The largest contig will then be your slowest job, but the total runtime will be based off how much splitting you can achieve. Splitting into 10 jobs and running them all simultaneously will make total run time 1/10 as long. You can use the -base flag with MAKER to make all jobs write to the same directory. Use the -g flag to specify a different input fasta file for each job (then they can all share the same control files).
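As a rough illustration of the splitting step in this reply, the sketch below assigns contigs to a fixed number of jobs so that the total sequence per job is roughly balanced, then prints one maker command per chunk plus a final index-rebuild command. It uses only the flags named in this reply (-base, -g, -dsindex). The contig names, the chunk file names, the base name `myrun`, and the choice to pass -base on the final run are assumptions made for the example. It is not how fasta_tool does its splitting, and it does not write the chunk FASTA files itself.

```python
import heapq


def partition_contigs(lengths, n_jobs):
    """Assign contigs to n_jobs bins, longest first, always filling the lightest bin.

    lengths maps contig name -> length in bp. Returns a list of bins,
    each bin being a list of contig names.
    """
    # Min-heap of (total_bp, bin_index): the lightest bin is always on top.
    heap = [(0, i) for i in range(n_jobs)]
    heapq.heapify(heap)
    bins = [[] for _ in range(n_jobs)]
    # Sort by descending length; ties are broken by name so the result is stable.
    for name, size in sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        bins[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return bins


def maker_commands(chunk_files, base, genome):
    """Build one command per chunk, then the final datastore-index rebuild."""
    cmds = [f"maker -base {base} -g {chunk}" for chunk in chunk_files]
    # Assumption: the final run reuses the same -base so it finds the shared output.
    cmds.append(f"maker -base {base} -g {genome} -dsindex")
    return cmds


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    lengths = {"c1": 900, "c2": 500, "c3": 400, "c4": 300, "c5": 100}
    for i, names in enumerate(partition_contigs(lengths, 2)):
        print(i, names, sum(lengths[n] for n in names))
    for cmd in maker_commands(["chunk_0.fasta", "chunk_1.fasta"], "myrun", "assembly.fasta"):
        print(cmd)
```

Because the slowest job is the one holding the largest contig, no partition can finish faster than that single contig takes, which is the limit described in this reply.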
You will then need to run maker once using the original assembly fasta and the -dsindex flag when all jobs complete, to get MAKER to rebuild the datastore index log so that it covers all contigs. That only takes 2 minutes to run. You can use the fasta_tool utility that comes with MAKER to conveniently split the input assembly fasta. MAKER does not train the gene predictors for you, and the hints it gives are on a per gene basis, so splitting contigs has no effect on that. For initial training of gene predictors, run MAKER on about 10-30 Mb of your largest contigs and use either the protein2genome or est2genome prediction options to build gene models to train the predictors on. You will need to train Augustus or SNAP yourself using those models and their own documentation. If training SNAP, you can use maker2zff to convert the models to SNAP's training format. You can also use the tool CEGMA from Ian Korf's lab to train SNAP. Use the cegma2zff script that comes with MAKER to do the conversion for training input. If you have questions once you start training, just send them to the list. Thanks, Carson From: Daniel Lawson Date: Thursday, 27 June, 2013 9:37 AM To: Cc: Subject: Re: [maker-devel] spliting genome for annotation Michel, It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold. regards Dan On 27 June 2013 14:33, wrote: > Dear Maker-developers > > If i understood correctly, in order to increase speed and reduce needed > resources one can split the genome into chunks and annotate each chunk > separately. > (i would really like to use that as i am working with a 1.2 Gbasepair > draftgenome and cant use MPI on the computing cluster) > I am a bit worried about how this might affect the annotation as the > gene-predictor would get trained quite differently for each chunk, right?
> > Jingjing > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Ensembl Genomes | VectorBase | i5K insect genome initiative _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From benayoun at stanford.edu Thu Jun 27 18:47:29 2013 From: benayoun at stanford.edu (Bérénice Benayoun) Date: Thu, 27 Jun 2013 17:47:29 -0700 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: References: Message-ID: Hi maker devel team, just wanted to say that retraining SNAP apparently fixed the problem: I modified the defaults and added "-min-intron 0" to the training everywhere relevant (the default is 30 bp, which must have prevented single-exon genes from being predicted). Thanks for your insights/help ! Berenice 2013/6/10 Carson Holt > One more note. The ESTs appear to be from multiple overlapping HSPs > (based on red line pattern in image). I'd have to see the actual GFF3 to > be sure, but if that is the case, then there probably isn't an ORF to work > with at that location on that strand (so SNAP can't call it). Possibly the > result of assembly error or a pseudogene. > > --Carson > > > > From: Daniel Ence > Date: Friday, 7 June, 2013 5:32 PM > To: Bérénice Benayoun , " > maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Maker and mono-exonic genes ? > > Hi Berenice, Thank you for sending that screenshot and the maker_opts.log > file. Those are exactly what we need to understand how to expect MAKER to > perform.
> > In looking at the screenshot, it doesn't look like any of the gene > predictors gave a prediction in this region. MAKER uses the predictions from > ab-initio tools as a basis for models and considers models that are > supported by evidence. It won't by default create a model when there isn't > a prediction in the region. > > Can I ask which gene predictors you used and how they were trained? You > might consider training one or more of them on the specific evidence that > you expect to support these genes and then rerunning maker with the > retrained predictors. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > Bérénice Benayoun [benayoun at stanford.edu] > *Sent:* Friday, June 07, 2013 11:17 AM > *To:* maker-devel at yandell-lab.org > *Subject:* [maker-devel] Maker and mono-exonic genes ? > > Dear maker developers, > > I am currently annotating a de novo fish genome, and have started looking > for genes of interest in particular in Maker's output to verify that it's > outputting proper gene sets. > > While many of the genes I look for seem to be correctly annotated by the > pipeline, I have noticed that important genes that do have strong > evidentiary support but are monoexonic are NOT reported by maker. > > I am attaching a screenshot for the contig that I know should contain the > *Foxl2* gene (notoriously monoexonic across evolution), and highlighted > the corresponding evidence for it. > > Is there any setting I can give to maker to force it to output monoexonic > genes ? I already set "single_exon=1" with no success. I attached my config > file FYI. > > Thank you so much in advance for your answer !!! > > Best, > > Berenice. > -- > Bérénice A. BENAYOUN, Ph.D.
> Stanford University/Genetics Department > *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related > Diseases' > M312 Alway Building > 300, Pasteur Drive > MC 5120 > Stanford, CA 94305-5120 > USA > Email: benayoun at stanford.edu > Web: www.stanford.edu/group/brunet/ > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Bérénice A. BENAYOUN, Ph.D. Stanford University/Genetics Department *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Jun 28 19:01:47 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Jun 2013 21:01:47 -0400 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: Message-ID: I'm glad it's working for you. Let us know if you run into additional problems. Thanks, Carson From: Bérénice Benayoun Date: Thursday, June 27, 2013 8:47 PM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Maker and mono-exonic genes ? Hi maker devel team, just wanted to say that retraining SNAP apparently fixed the problem: I modified the defaults and added "-min-intron 0" to the training everywhere relevant (the default is 30 bp, which must have prevented single-exon genes from being predicted). Thanks for your insights/help ! Berenice 2013/6/10 Carson Holt > One more note. The ESTs appear to be from multiple overlapping HSPs (based on > red line pattern in image). I'd have to see the actual GFF3 to be sure, but > if that is the case, then there probably isn't an ORF to work with at that > location on that strand (so SNAP can't call it). Possibly the result of > assembly error or a pseudogene.
> > --Carson > > > > From: Daniel Ence > Date: Friday, 7 June, 2013 5:32 PM > To: Bérénice Benayoun , "maker-devel at yandell-lab.org" > > Subject: Re: [maker-devel] Maker and mono-exonic genes ? > > Hi Berenice, Thank you for sending that screenshot and the maker_opts.log > file. Those are exactly what we need to understand how to expect MAKER to > perform. > > In looking at the screenshot, it doesn't look like any of the gene predictors > gave a prediction in this region. MAKER uses the predictions from ab-initio tools as > a basis for models and considers models that are supported by evidence. It > won't by default create a model when there isn't a prediction in the region. > > Can I ask which gene predictors you used and how they were trained? You might > consider training one or more of them on the specific evidence that you expect > to support these genes and then rerunning maker with the retrained predictors. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice > Benayoun [benayoun at stanford.edu] > Sent: Friday, June 07, 2013 11:17 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] Maker and mono-exonic genes ? > > Dear maker developers, > > I am currently annotating a de novo fish genome, and have started looking for > genes of interest in particular in Maker's output to verify that it's > outputting proper gene sets. > > While many of the genes I look for seem to be correctly annotated by the > pipeline, I have noticed that important genes that do have strong evidentiary > support but are monoexonic are NOT reported by maker. > > I am attaching a screenshot for the contig that I know should contain the > Foxl2 gene (notoriously monoexonic across evolution), and highlighted the > corresponding evidence for it.
> > Is there any setting I can give to maker to force it to output monoexonic > genes ? I already set "single_exon=1" with no success. I attached my config > file FYI. > > Thank you so much in advance for your answer !!! > > Best, > > Berenice. > -- > Bérénice A. BENAYOUN, Ph.D. > Stanford University/Genetics Department > BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' > M312 Alway Building > 300, Pasteur Drive > MC 5120 > Stanford, CA 94305-5120 > USA > Email: benayoun at stanford.edu > Web: www.stanford.edu/group/brunet/ > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Bérénice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.stajich at gmail.com Sun Jun 2 12:28:50 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Sun, 2 Jun 2013 11:28:50 -0700 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Message-ID: seems like in your case you want to do more of a liftover-based annotation. generate that and feed it as a gff file to maker if your intention is also gene discovery in your population? On May 23, 2013, at 9:48 AM, Daniel Hughes wrote: > would gene annotation by projection using synteny/WGA not be more appropriate? either way what's wrong with running one of the standard orthology predictions tools or just basic best reciprocal blast? > > dan.
> > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2013/5/23 Barry Moore > Hi Luciano, > > If I understand correctly you are including translations of SNAP and Augustus predictions as well as the predictions. If so, you don't want to do that. Overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you will guarantee that every prediction will become an annotation and that means you lose the benefit of evidence supervised annotation that MAKER provides. Include the proteins from the D mel reference and if you want to cast a broader net include proteins from other dipterans or even Uniprot - it just depends on how aggressive you want to try to be in capturing new annotations. > > B > > On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > >> Thanks for your reply! >> >> One more question, can you think of any tips to get the best possible predictions of protein sequences? >> >> I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with Methionine... So far I am including transcripts and translations from flybase, and snap and augustus with their available trainings for flies. Do you see any possible source of error in that? >> >> Thanks again, >> >> Luciano >> >> From: Barry Moore [barry.moore at genetics.utah.edu] >> Sent: Friday, May 17, 2013 09:02 p.m. >> To: Luciano Abriata >> Cc: maker-devel at yandell-lab.org >> Subject: Re: [maker-devel] getting protein sequences from genomes >> >> >> On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: >> >>> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).
>>>
>>> My ultimate goal is to get, for each gene, the amino acid sequences of the encoded proteins as they are expressed from each genome. My questions are:
>>>
>>> 1) How can I match proteins predicted for the same gene in two genomes?
>>
>> blastp, with parameters tuned to favor near-perfect matches
>>
>>>
>>> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)?
>>>
>>> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541
>>>
>>
>> AED = Annotation edit distance. It describes how closely the prediction matches the evidence. This is a distance measure, so 0 is a perfect match and 1 is no overlap.
>>
>> eAED = Exon-adjusted annotation edit distance. This metric is the same as AED with two exceptions. First, for a protein-coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-seq data is used as evidence and both ends of an exon are supported by splice-site-spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.
>>
>> QI values, in order, are as follows:
>>
>> 5' UTR length.
>> Fraction of splice sites confirmed by an EST alignment.
>> Fraction of exons that overlap an EST alignment.
>> Fraction of exons that overlap an EST or protein alignment.
>> Fraction of splice sites confirmed by an ab initio prediction.
>> Fraction of exons that overlap an ab initio prediction.
>> Number of exons in the transcript.
>> 3' UTR length.
>> Length of the encoded protein.
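
As a rough illustration of the field order described above, here is a minimal Python sketch that splits a header like the maker-2L-augustus-gene-0.19-mRNA-1 line into its ID, AED, eAED, and QI parts. The names parse_maker_header and QualityIndex are made up for this example and are not part of MAKER, and the sketch assumes the QI tag always carries exactly nine pipe-separated values.

```python
from typing import NamedTuple


class QualityIndex(NamedTuple):
    """Hypothetical container for the nine QI fields, in the order listed above."""
    five_prime_utr_length: int
    splice_sites_confirmed_by_est: float
    exons_overlapping_est: float
    exons_overlapping_est_or_protein: float
    splice_sites_confirmed_by_ab_initio: float
    exons_overlapping_ab_initio: float
    exon_count: int
    three_prime_utr_length: int
    protein_length: int


def parse_maker_header(header: str) -> dict:
    """Split a protein FASTA header into its ID, AED, eAED, and QI values."""
    tokens = header.lstrip(">").split()
    record = {"id": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            continue  # bare words such as "protein" carry no key:value pair
        if key in ("AED", "eAED"):
            record[key] = float(value)
        elif key == "QI":
            f = value.split("|")
            if len(f) != 9:
                raise ValueError(f"expected 9 QI fields, got {len(f)}")
            # Fields 1, 7, 8, and 9 are counts or lengths; the rest are fractions.
            record["QI"] = QualityIndex(
                int(f[0]), float(f[1]), float(f[2]), float(f[3]),
                float(f[4]), float(f[5]), int(f[6]), int(f[7]), int(f[8]),
            )
    return record


if __name__ == "__main__":
    line = ("maker-2L-augustus-gene-0.19-mRNA-1 protein "
            "AED:0.0322873164323667 eAED:0.0322873164323667 "
            "QI:2|1|0.66|1|1|1|3|208|541")
    rec = parse_maker_header(line)
    print(rec["id"], rec["AED"], rec["QI"].exon_count, rec["QI"].protein_length)
```

Read this way, the example header describes a three-exon transcript with a 2 bp 5' UTR, a 208 bp 3' UTR, and a 541-residue protein, with two thirds of its exons overlapping an EST alignment.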
>>
>>
>>> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta
>>>
>>> Which of these files contains the definitive set of predicted protein sequences?
>>
>> The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.
>>
>>>
>>> Thanks in advance!
>>>
>>> Luciano
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> Barry Moore
>> Research Scientist
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT 84112
>> --------------------------------------------
>> (801) 585-3543
>>
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From carsonhh at gmail.com Mon Jun 3 07:04:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Jun 2013 09:04:08 -0400
Subject: [maker-devel] Advice on params for ciliates
In-Reply-To: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu>
Message-ID:

I don't have any specific advice, but in general I always set the blast_depth parameters in the maker_bopts file to 20 or 30 (faster runtimes).
Also, max_dna_len can be doubled if you have sufficient memory (3-4 GB per CPU, as opposed to the 1-2 GB assumed with the default).

Other than that, split_hit, pred_flank, and single_exon are the only ones I might change around. You sort of have to run on a few large contigs before deciding what to do with these parameters.

split_hit --> sets the max intron size for alignments
pred_flank --> affects clustering for gene-dense organisms
single_exon --> leave off unless you expect a lot of single-exon genes

--Carson

From: "Freeman, Robert M."
Date: Thursday, 23 May, 2013 4:17 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Advice on params for ciliates

Dear MAKER community,

Am embarking on updating models for a ciliate (taxon Ciliophora) and was wondering if folks had recommendations for MAKER parameters.

Thanks,
Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA 02115
617/432.2294, vox

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace

From Bob_Freeman at hms.harvard.edu Wed Jun 5 07:28:36 2013
From: Bob_Freeman at hms.harvard.edu (Bob Freeman)
Date: Wed, 5 Jun 2013 09:28:36 -0400
Subject: [maker-devel] Advice on params for ciliates
In-Reply-To:
References:
Message-ID:

Thanks, Carson, for these helpful hints. (Separately, the other code did not work again on our cluster. Have been so swamped -- I'll get to the write-up next week. Have been using the 2.25beta binary and that works OK).
Best, Bob On Jun 3, 2013, at 9:04 AM, Carson Holt wrote: > I don't have any specific advice, but In general I always set blast_depth parameters in the maker_bopts file to 20 or 30 (faster runtimes). Also max_dna_len can be set to 2x higher if you have sufficient memory (3-4 Gb per cpu as opposed to 1-2 Gb that are assumed with the default). > > Other than that split_hit, pred_flank, and single_exon are the only ones I might change around. You sort of have to run on a few large contigs before deciding what to do with these parameters. > > split_hit --> set max intron size for alignments > pred_flank --> affects clustering for gene dense organisms > single_exon --> leave off unless you expect a lot of singel exon genes. > > --Carson > > > From: "Freeman, Robert M." > Date: Thursday, 23 May, 2013 4:17 PM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] Advice on params for ciliates > > Dear MAKER community, > > Am embarking on updating models for a ciliate (taxa Ciliophora) and was wondering if folks had recommendations for MAKER parameters. > > Thanks, > Bob > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:28:46 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:28:46 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: I upgraded to 2.28 and Maker is not running. Thanks! 
On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > Are you using MAKER version 2.10? I ask because there is in issue with > other_gff in that version that has since been fixed. So if you don't get > other_gff to pass-through, you will need to upgrade to 2.28 (release date > is later today coincidentally). > > For the Augustus GFF3 file, the format is a little weird which is > causing the problem. They are mRNA features not attached to genes. Rather > than build the expected 3 level gene/mRNA/exon structure for these, it is > simpler just to convert it to the 2 level match/match_part structure. Just > convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. > Rename the GFF3 when your done so that it will force rebuild of the GFF3 > database when you run again. > > Thanks, > Carson > > > > From: Innocent Onsongo > Date: Wednesday, 22 May, 2013 8:47 AM > To: Barry Moore > Cc: > Subject: Re: [maker-devel] Maker: Re-annotation > > No. The MAKER produced GFF3 file does not contain any annotations. I > even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if > it will pass annotations from the Augustus produced GFF file into the final > annotation but that didn't work. I have attached the maker_opts.ctl file > I used together with the first 100 lines of the GFF files it's using. I > also include the GFF file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can >> you send the first ~100 lines each of the MAKER produced GFF3 file and of >> the GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >> Maker Development Team, >> >> I am trying to use Maker for re-annotation using gene predictions from >> Augustus. We had previously used Augustus for gene prediction but now want >> to combine these annotations with some EST data. 
I updated >> fields maker_opts.ctl as below >> >> genome=CGS01058.fasta #genome sequence file in fasta format >> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >> pred_gff=Augustus.gff3 #ab-initio predictions from >> other_gff=Promoters.gff3 #promoter annotations >> other_gff=CpG_Islands.gff3 # CpG island annotations >> >> Maker runs to completion and according to the log file annotation was >> successful. However, it also gives a "Segmentation fault (core dumped)" >> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >> and look it does not contain any of the exon definitions in Augustus.gff3. >> Am I missing something? >> >> Regards, >> Getiria >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> Barry Moore >> Research Scientist >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Wed Jun 5 08:30:20 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:30:20 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: What does it do? --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:28 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: Re-annotation I upgraded to 2.28 and Maker is not running. Thanks! On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > Are you using MAKER version 2.10? I ask because there is in issue with > other_gff in that version that has since been fixed. So if you don't get > other_gff to pass-through, you will need to upgrade to 2.28 (release date is > later today coincidentally). > > For the Augustus GFF3 file, the format is a little weird which is causing the > problem. They are mRNA features not attached to genes. Rather than build the > expected 3 level gene/mRNA/exon structure for these, it is simpler just to > convert it to the 2 level match/match_part structure. Just convert the 'mRNA' > tag to 'match' and all 'exon' tags to 'match_part'. Rename the GFF3 when your > done so that it will force rebuild of the GFF3 database when you run again. > > Thanks, > Carson > > > > From: Innocent Onsongo > Date: Wednesday, 22 May, 2013 8:47 AM > To: Barry Moore > Cc: > Subject: Re: [maker-devel] Maker: Re-annotation > > No. The MAKER produced GFF3 file does not contain any annotations. I even > tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will > pass annotations from the Augustus produced GFF file into the final annotation > but that didn't work. I have attached the maker_opts.ctl file I used together > with the first 100 lines of the GFF files it's using. 
I also include the GFF > file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can you >> send the first ~100 lines each of the MAKER produced GFF3 file and of the >> GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >>> Maker Development Team, >>> >>> I am trying to use Maker for re-annotation using gene predictions from >>> Augustus. We had previously used Augustus for gene prediction but now want >>> to combine these annotations with some EST data. I updated fields >>> maker_opts.ctl as below >>> >>> genome=CGS01058.fasta #genome sequence file in fasta format >>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>> pred_gff=Augustus.gff3 #ab-initio predictions from >>> other_gff=Promoters.gff3 #promoter annotations >>> other_gff=CpG_Islands.gff3 # CpG island annotations >>> >>> Maker runs to completion and according to the log file annotation was >>> successful. However, it also gives a "Segmentation fault (core dumped)" >>> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >>> and look it does not contain any of the exon definitions in Augustus.gff3. >>> Am I missing something? >>> >>> Regards, >>> Getiria >>> >>> -- >>> Getiria Onsongo, Ph.D. >>> Informatics Analyst, Research Informatics Support System >>> Minnesota Supercomputing Institute for Advanced Computational Research >>> University of Minnesota >>> Minneapolis, MN 55455 >>> Phone: 612-624-0532 >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> Barry Moore >> Research Scientist >> Dept. 
of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:35:43 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:35:43 -0500 Subject: [maker-devel] Maker: accessory scripts Message-ID: I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? Thanks, Getiria -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 10:37:01 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 11:37:01 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: Oops! I meant to type Maker is NOW running. On Wed, Jun 5, 2013 at 9:30 AM, Carson Holt wrote: > What does it do? 
> > --Carson > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:28 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > > Subject: Re: [maker-devel] Maker: Re-annotation > > I upgraded to 2.28 and Maker is not running. Thanks! > > > On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: > >> Are you using MAKER version 2.10? I ask because there is in issue with >> other_gff in that version that has since been fixed. So if you don't get >> other_gff to pass-through, you will need to upgrade to 2.28 (release date >> is later today coincidentally). >> >> For the Augustus GFF3 file, the format is a little weird which is causing >> the problem. They are mRNA features not attached to genes. Rather than >> build the expected 3 level gene/mRNA/exon structure for these, it is >> simpler just to convert it to the 2 level match/match_part structure. Just >> convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. >> Rename the GFF3 when your done so that it will force rebuild of the GFF3 >> database when you run again. >> >> Thanks, >> Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 22 May, 2013 8:47 AM >> To: Barry Moore >> Cc: >> Subject: Re: [maker-devel] Maker: Re-annotation >> >> No. The MAKER produced GFF3 file does not contain any annotations. I even >> tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it >> will pass annotations from the Augustus produced GFF file into the final >> annotation but that didn't work. I have attached the maker_opts.ctl file >> I used together with the first 100 lines of the GFF files it's using. I >> also include the GFF file produced by MAKER (CGS01058First100.gff) >> >> >> >> >> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >> >>> Hi Getiria, >>> >>> Does the MAKER produced GFF3 file contain any annotations at all? Can >>> you send the first ~100 lines each of the MAKER produced GFF3 file and of >>> the GFF3 files that you passed via maker_opts.ctl? 
>>> >>> B >>> >>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >>> >>> Maker Development Team, >>> >>> I am trying to use Maker for re-annotation using gene predictions from >>> Augustus. We had previously used Augustus for gene prediction but now want >>> to combine these annotations with some EST data. I updated >>> fields maker_opts.ctl as below >>> >>> genome=CGS01058.fasta #genome sequence file in fasta format >>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>> pred_gff=Augustus.gff3 #ab-initio predictions from >>> other_gff=Promoters.gff3 #promoter annotations >>> other_gff=CpG_Islands.gff3 # CpG island annotations >>> >>> Maker runs to completion and according to the log file annotation was >>> successful. However, it also gives a "Segmentation fault (core dumped)" >>> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >>> and look it does not contain any of the exon definitions in Augustus.gff3. >>> Am I missing something? >>> >>> Regards, >>> Getiria >>> >>> -- >>> Getiria Onsongo, Ph.D. >>> Informatics Analyst, Research Informatics Support System >>> Minnesota Supercomputing Institute for Advanced Computational Research >>> University of Minnesota >>> Minneapolis, MN 55455 >>> Phone: 612-624-0532 >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> Barry Moore >>> Research Scientist >>> Dept. of Human Genetics >>> University of Utah >>> Salt Lake City, UT 84112 >>> -------------------------------------------- >>> (801) 585-3543 >>> >>> >>> >>> >>> >> >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> > > > > -- > Getiria Onsongo, Ph.D. 
> Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed Jun 5 10:38:59 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 5 Jun 2013 16:38:59 +0000 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: Hi Innocent, I'm just jumping in this conversation kind of late in the game, but if you look in the gff3 file that maker gave you, do you see any gene, exon, or CDS features in the output? When you give evidence (protein or EST) and ab-initio predictors to maker the default behavior is to create gene models. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Innocent Onsongo [onson001 at umn.edu] Sent: Wednesday, June 05, 2013 10:35 AM To: Carson Holt Cc: maker-devel at yandell-lab.org; Barry Moore Subject: [maker-devel] Maker: accessory scripts I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. 
It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? Thanks, Getiria -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Jun 5 08:44:36 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:44:36 -0400 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: Message-ID: All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to a full annotation, create a list of IDs (one per line) then give them to the attached script. gff3_preds2models was previously deprecated and no longer part of the maker distribution, but the attached script is an updated version with the same functionality. --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:35 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" , Barry Moore Subject: [maker-devel] Maker: accessory scripts I was able to successfully ran Maker and now want to converts the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script. It is not in /maker/2.28-beta/bin/ Also, in running gff3_preds2models is the file I used for pred_gff=? Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron? 
Thanks, Getiria _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_preds2models Type: application/octet-stream Size: 4778 bytes Desc: not available URL: From carsonhh at gmail.com Wed Jun 5 08:45:10 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Jun 2013 10:45:10 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: Gotcha :-) --Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 12:37 PM To: Carson Holt Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: Re-annotation Oops! I meant to type Maker is NOW running. On Wed, Jun 5, 2013 at 9:30 AM, Carson Holt wrote: > What does it do? > > --Carson > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:28 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > > Subject: Re: [maker-devel] Maker: Re-annotation > > I upgraded to 2.28 and Maker is not running. Thanks! > > > On Wed, May 22, 2013 at 9:03 AM, Carson Holt wrote: >> Are you using MAKER version 2.10? I ask because there is in issue with >> other_gff in that version that has since been fixed. So if you don't get >> other_gff to pass-through, you will need to upgrade to 2.28 (release date is >> later today coincidentally). >> >> For the Augustus GFF3 file, the format is a little weird which is causing the >> problem. They are mRNA features not attached to genes. Rather than build >> the expected 3 level gene/mRNA/exon structure for these, it is simpler just >> to convert it to the 2 level match/match_part structure. Just convert the >> 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. 
Rename the GFF3 >> when your done so that it will force rebuild of the GFF3 database when you >> run again. >> >> Thanks, >> Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 22 May, 2013 8:47 AM >> To: Barry Moore >> Cc: >> Subject: Re: [maker-devel] Maker: Re-annotation >> >> No. The MAKER produced GFF3 file does not contain any annotations. I even >> tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will >> pass annotations from the Augustus produced GFF file into the final >> annotation but that didn't work. I have attached the maker_opts.ctl file I >> used together with the first 100 lines of the GFF files it's using. I also >> include the GFF file produced by MAKER (CGS01058First100.gff) >> >> >> >> >> On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: >>> Hi Getiria, >>> >>> Does the MAKER produced GFF3 file contain any annotations at all? Can you >>> send the first ~100 lines each of the MAKER produced GFF3 file and of the >>> GFF3 files that you passed via maker_opts.ctl? >>> >>> B >>> >>> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >>> >>>> Maker Development Team, >>>> >>>> I am trying to use Maker for re-annotation using gene predictions from >>>> Augustus. We had previously used Augustus for gene prediction but now want >>>> to combine these annotations with some EST data. I updated fields >>>> maker_opts.ctl as below >>>> >>>> genome=CGS01058.fasta #genome sequence file in fasta format >>>> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >>>> pred_gff=Augustus.gff3 #ab-initio predictions from >>>> other_gff=Promoters.gff3 #promoter annotations >>>> other_gff=CpG_Islands.gff3 # CpG island annotations >>>> >>>> Maker runs to completion and according to the log file annotation was >>>> successful. However, it also gives a "Segmentation fault (core dumped)" >>>> message. 
It does produce a GFF3 file but when I load the GFF3 file into IGV >>>> and look it does not contain any of the exon definitions in Augustus.gff3. >>>> Am I missing something? >>>> >>>> Regards, >>>> Getiria >>>> >>>> -- >>>> Getiria Onsongo, Ph.D. >>>> Informatics Analyst, Research Informatics Support System >>>> Minnesota Supercomputing Institute for Advanced Computational Research >>>> University of Minnesota >>>> Minneapolis, MN 55455 >>>> Phone: 612-624-0532 >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> Barry Moore >>> Research Scientist >>> Dept. of Human Genetics >>> University of Utah >>> Salt Lake City, UT 84112 >>> -------------------------------------------- >>> (801) 585-3543 >>> >>> >>> >>> >> >> >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Wed Jun 5 08:47:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Jun 2013 10:47:51 -0400
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

Also, just a note: models are rejected if they have no protein or EST support. This is because ab initio predictors overpredict (in some genomes, for example, you may have 10 false positives for every true positive).

--Carson

From: Carson Holt
Date: Wednesday, 5 June, 2013 10:44 AM
To: Innocent Onsongo, Carson Holt
Cc: "maker-devel at yandell-lab.org", Barry Moore
Subject: Re: [maker-devel] Maker: accessory scripts

All MAKER gene annotations will be in the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or a rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to full annotations, create a list of IDs (one per line), then give them to the attached script. gff3_preds2models was previously deprecated and is no longer part of the MAKER distribution, but the attached script is an updated version with the same functionality.

--Carson

From: Innocent Onsongo
Date: Wednesday, 5 June, 2013 12:35 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org", Barry Moore
Subject: [maker-devel] Maker: accessory scripts

I was able to successfully run Maker and now want to convert the gene prediction match/match_part format to the annotation gene/mRNA/exon/CDS format. I looked at the tutorial, and the script gff3_preds2models is supposed to do this conversion. How do I access this script? It is not in /maker/2.28-beta/bin/.

Also, when running gff3_preds2models, is the input the file I used for pred_gff=?

Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron?
Thanks, Getiria _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/m aker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From amelia.ireland at gmod.org Wed Jun 5 11:14:05 2013 From: amelia.ireland at gmod.org (Amelia Ireland) Date: Wed, 5 Jun 2013 10:14:05 -0700 Subject: [maker-devel] Apply now for the GMOD Summer School! Message-ID: Closing date for applications: 10 June July 19-23, 2013; NESCent, Durham, North Carolina http://gmod.org/wiki/2013_GMOD_Summer_School The 2013 GMOD Summer School is the best way to get to grips with GMOD in the Cloud, GMOD's suite of genomic and genetic software. Over five days, attendees will learn how to install, configure, and run popular GMOD software for visualization, storage, and dissemination of genetic and genomic data. The following software is covered: - Chado, a species-independent database schema covering many areas of genetic and genomic data; - GBrowse, the ubiquitous genome browser; - GBrowse syn, a synteny browser built on GBrowse; - Galaxy, analysis and computation pipeline; - JBrowse, genome browsing evolved; - MAKER, automated annotation pipeline; - Tripal, a slick web interface for displaying and editing data from Chado; and - WebApollo, distributed community genome annotation tool (built on JBrowse). There are additional sessions on setting up a GMOD in the Cloud virtual machine in the Amazon cloud, and common file formats. Courses are taught by members of the software development teams, and there are work sessions in the evenings for participants to talk to the developers or apply what they have been taught to their own data. For more information and to apply, visit http://gmod.org/wiki/2013_GMOD_Summer_School. There are some scholarship funds available for those from underrepresented minorities. All applications should be in by June 10th. 
If you have any questions, please contact the GMOD help desk at help at gmod.org. Hope to see you there! Thanks, Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject -- Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnuhn at ebi.ac.uk Thu Jun 6 02:44:10 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Thu, 06 Jun 2013 09:44:10 +0100 Subject: [maker-devel] Effect of the unmask option Message-ID: <51B04BDA.7050307@ebi.ac.uk> Hello Carson! When running maker with the unmask option, how does maker use the predictions generated from running the gene predictors on the unmasked sequence? The tutorial says: "You do have the option to run ab initio gene predictors on both the masked and unmasked sequence if repeat masking worries you though. You do this by setting unmask:1 in the maker_opt.ctl configuration file." http://gmod.org/wiki/MAKER_Tutorial_2012 But in the sub get_non_overlaping_abinits in maker::auto_annotator (maker version 2.27) they are skipped: #only accept masked predictions unless I'm not masking or the predictor is genemark my $src = $g->{algorithm}; unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} || $CTL_OPT->{predictor} eq 'genemark') { next; } Cheers, Michael. From carsonhh at gmail.com Thu Jun 6 07:55:08 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Jun 2013 09:55:08 -0400 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: Message-ID: One thing to keep in mind is the strandedness of the evidence and the model (they must be on the same strand). Further protein evidence is only valid support if it is in the same reading frame as the model. Could you send the full GFF3 for the contig (I need features and GFF3 internal fasta) and the coordinates of the region in question, and I can take a look? 
Also if you can, it would be good to let maker run Augustus as well with the species file rather than just passing in the GFF3. This is because MAKER can only talk to Augustus to generate competing hint based models if you provide the species. Thanks, Carson From: Innocent Onsongo Date: Wednesday, 5 June, 2013 1:10 PM To: Carson Holt Cc: Carson Holt , "maker-devel at yandell-lab.org" , Barry Moore Subject: Re: [maker-devel] Maker: accessory scripts I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction? I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful. Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST support. > This is because ab inito predictors over predict (you may have 10 false > positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt > > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or rejected > model and is there for reference purposes. If you want to upgrade all of the > rejected loci to gene annotations, set keep_preds=1 in the control files. 
If > you want to upgrade a subset of rejected models to a full annotation, create a > list of IDs (one per line) then give them to the attached script. > gff3_preds2models was previously deprecated and no longer part of the maker > distribution, but the attached script is an updated version with the same > functionality. > > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry Moore > > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I > looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not in > /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is the > file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed Jun 5 11:10:01 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 5 Jun 2013 12:10:01 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. 
Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction? I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful. Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST > support. This is because ab inito predictors over predict (you may have 10 > false positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt < > carson.holt at oicr.on.ca> > > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or > rejected model and is there for reference purposes. If you want to upgrade > all of the rejected loci to gene annotations, set keep_preds=1 in the > control files. If you want to upgrade a subset of rejected models to a > full annotation, create a list of IDs (one per line) then give them to the > attached script. gff3_preds2models was previously deprecated and no longer > part of the maker distribution, but the attached script is an updated > version with the same functionality. > > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. 
> I looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not > in /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is list> the file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4526 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MakerFirst400.gff Type: application/octet-stream Size: 74871 bytes Desc: not available URL: From onson001 at umn.edu Thu Jun 6 12:58:21 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Thu, 6 Jun 2013 13:58:21 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: Thanks for the timely feedback Carson. I made a change to my pred_gff and est_gff GFF3 files and now I am getting results but I am not sure if the changes I made are valid. I want to make sure the changes I made did not lead Maker to behave in an unexpected way and lead to results that might be incorrect. In my pred_gff file, I replaced "mRNA" with "protein_match" and "exon" with "match" below are the first three lines of the old and new pred_gff files respectively ---------------old pred_gff ##gff-version 3 CGS00003 AUGUSTUS mRNA 1 10865 1 + . 
CGS00003 AUGUSTUS exon 2013 2050 . + 1 ---------------new pred_gff ##gff-version 3 CGS00003 AUGUSTUS protein_match 1 10865 1 + . CGS00003 AUGUSTUS match_part 2013 2050 . + 1 In my est_gff file, I replaced "mRNA" with "protein_match" and "exon" with "match" below are the first three lines of the old and new pred_gff files respectively ----------------old est_gff ##gff-version 3 CGS00003 EST_BLAT mRNA 4641336 4758501 6072 - . CGS00003 EST_BLAT exon 4641336 4641979 644 - . ----------------new est_gff CGS00003 EST_BLAT expressed_sequence_match 4641336 4758501 6072 - . CGS00003 EST_BLAT match_part 4641336 4641979 644 - . Are the changes I made valid? Thanks, Getiria On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > Also, just a note, models are rejected if they have no protein or EST > support. This is because ab inito predictors over predict (you may have 10 > false positives for every true positive in some genomes for example). > > --Carson > > > > From: Carson Holt > Date: Wednesday, 5 June, 2013 10:44 AM > To: Innocent Onsongo , Carson Holt < > carson.holt at oicr.on.ca> > > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: Re: [maker-devel] Maker: accessory scripts > > All maker gene annotations will be of the format gene/mRNA/exon/CDS. > Anything in the format match/match_part is an evidence alignment or > rejected model and is there for reference purposes. If you want to upgrade > all of the rejected loci to gene annotations, set keep_preds=1 in the > control files. If you want to upgrade a subset of rejected models to a > full annotation, create a list of IDs (one per line) then give them to the > attached script. gff3_preds2models was previously deprecated and no longer > part of the maker distribution, but the attached script is an updated > version with the same functionality. 
> > --Carson > > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 12:35 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" , Barry > Moore > Subject: [maker-devel] Maker: accessory scripts > > I was able to successfully ran Maker and now want to converts the gene > prediction match/match_part format to annotation gene/mRNA/exon/CDS format. > I looked at the tutorial and the script gff3_preds2models > is supposed to do this conversion. How do I access this script. It is not > in /maker/2.28-beta/bin/ > > Also, in running gff3_preds2models is list> the file I used for pred_gff=? > > Long story short, how do I transform the GFF output from Maker to the more > traditional annotation of exon/intron? > > Thanks, > Getiria > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: From benayoun at stanford.edu Fri Jun 7 11:17:47 2013 From: benayoun at stanford.edu (=?ISO-8859-1?Q?B=E9r=E9nice_Benayoun?=) Date: Fri, 7 Jun 2013 10:17:47 -0700 Subject: [maker-devel] Maker and mono-exonic genes ? Message-ID: Dear maker developers, I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. 
I am attaching a screenshot for the contig that I know should contain the * Foxl2* gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI. Thank you so much in advance for your answer !!! Best, Berenice. -- B?r?nice A. BENAYOUN, Ph.D. Stanford University/Genetics Department *BRUNET Laboratory*, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Appolo_screenshot_missing_monoexonic_pred.pdf Type: application/pdf Size: 709436 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: application/octet-stream Size: 5155 bytes Desc: not available URL: From onson001 at umn.edu Fri Jun 7 14:08:43 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Fri, 7 Jun 2013 15:08:43 -0500 Subject: [maker-devel] Maker: accessory scripts In-Reply-To: References: Message-ID: Carson, I have attached the full gff3 for the contig together with a screen shot from IGV with regions I was expecting Maker to make a consensus call. The region on question is CGS00003:5264784-5273457. I will greatly appreciate any insights. Thanks, Getiria On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote: > One thing to keep in mind is the strandedness of the evidence and the > model (they must be on the same strand). Further protein evidence is only > valid support if it is in the same reading frame as the model. 
> > Could you send the full GFF3 for the contig (I need features and GFF3 > internal fasta) and the coordinates of the region in question, and I can > take a look? Also if you can, it would be good to let maker run Augustus > as well with the species file rather than just passing in the GFF3. This > is because MAKER can only talk to Augustus to generate competing hint based > models if you provide the species. > > Thanks, > Carson > > > From: Innocent Onsongo > Date: Wednesday, 5 June, 2013 1:10 PM > To: Carson Holt > Cc: Carson Holt , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org>, Barry Moore > > Subject: Re: [maker-devel] Maker: accessory scripts > > I checked visually in IGV and there are some exons in the predicted model > with protein and EST support but the maker output GFF only has match_part > and protein_match in column 3. Does that mean Maker doesn't deem any of the > evidence sufficient to make a gene model prediction? > > I guess I am somewhat surprised I am not getting any exons predicted by > Maker. Is there a parameter I can alter to reduce the threshold at which > Maker makes this call? I have attached the first 400 lines of one of my GFF > files together with the control file (maker_opts.ctl) just in case they > might be useful. > > Getiria > > > On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote: > >> Also, just a note, models are rejected if they have no protein or EST >> support. This is because ab inito predictors over predict (you may have 10 >> false positives for every true positive in some genomes for example). >> >> --Carson >> >> >> >> From: Carson Holt >> Date: Wednesday, 5 June, 2013 10:44 AM >> To: Innocent Onsongo , Carson Holt < >> carson.holt at oicr.on.ca> >> >> Cc: "maker-devel at yandell-lab.org" , Barry >> Moore >> Subject: Re: [maker-devel] Maker: accessory scripts >> >> All maker gene annotations will be of the format gene/mRNA/exon/CDS. 
>> Anything in the format match/match_part is an evidence alignment or >> rejected model and is there for reference purposes. If you want to upgrade >> all of the rejected loci to gene annotations, set keep_preds=1 in the >> control files. If you want to upgrade a subset of rejected models to a >> full annotation, create a list of IDs (one per line) then give them to the >> attached script. gff3_preds2models was previously deprecated and no longer >> part of the maker distribution, but the attached script is an updated >> version with the same functionality. >> >> --Carson >> >> >> >> From: Innocent Onsongo >> Date: Wednesday, 5 June, 2013 12:35 PM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" , Barry >> Moore >> Subject: [maker-devel] Maker: accessory scripts >> >> I was able to successfully ran Maker and now want to converts the gene >> prediction match/match_part format to annotation gene/mRNA/exon/CDS format. >> I looked at the tutorial and the script gff3_preds2models >> is supposed to do this conversion. How do I access this script. It is not >> in /maker/2.28-beta/bin/ >> >> Also, in running gff3_preds2models is > list> the file I used for pred_gff=? >> >> Long story short, how do I transform the GFF output from Maker to the >> more traditional annotation of exon/intron? >> >> Thanks, >> Getiria >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. 
Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS00003.gff Type: application/octet-stream Size: 11835536 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS00003_5264784-5273457.pdf Type: application/pdf Size: 124265 bytes Desc: not available URL: From dence at genetics.utah.edu Fri Jun 7 15:32:57 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 7 Jun 2013 21:32:57 +0000 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: References: Message-ID: Hi Berenice, Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform. In looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. Uses the predictions from ab-initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region. Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of B?r?nice Benayoun [benayoun at stanford.edu] Sent: Friday, June 07, 2013 11:17 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] Maker and mono-exonic genes ? 
Dear maker developers, I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI. Thank you so much in advance for your answer !!! Best, Berenice. -- B?r?nice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Fri Jun 7 15:58:16 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 7 Jun 2013 21:58:16 +0000 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: References: , Message-ID: Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: berenice.benayoun at gmail.com [berenice.benayoun at gmail.com] on behalf of B?r?nice Benayoun [benayoun at stanford.edu] Sent: Friday, June 07, 2013 3:50 PM To: Daniel Ence Subject: Re: [maker-devel] Maker and mono-exonic genes ? Hi Daniel, Thanks for the quick answer ! 
I used SNAP, and trained from a hmm model made with the CEGMA output on my genome (240 gene models) plus a first run of maker of 1/3 of the genome. I tried GenemarkES and Augustus, but for some reason they don't run, so I stopped indicating their existence to maker. Should I do something in particular to train it "better" ? Is there any other predictor that would be worth running ? Thanks so much for your help ! Berenice 2013/6/7 Daniel Ence > Hi Berenice, Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform. In looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. Uses the predictions from ab-initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region. Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of B?r?nice Benayoun [benayoun at stanford.edu] Sent: Friday, June 07, 2013 11:17 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] Maker and mono-exonic genes ? Dear maker developers, I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. 
I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI. Thank you so much in advance for your answer !!! Best, Berenice. -- B?r?nice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -- B?r?nice A. BENAYOUN, Ph.D. Stanford University/Genetics Department BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' M312 Alway Building 300, Pasteur Drive MC 5120 Stanford, CA 94305-5120 USA Email: benayoun at stanford.edu Web: www.stanford.edu/group/brunet/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Fri Jun 7 16:30:35 2013 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 7 Jun 2013 16:30:35 -0600 Subject: [maker-devel] Maker and mono-exonic genes ? In-Reply-To: References: , Message-ID: <11A6EF4C-B82E-4851-80FC-B8668531E2EC@genetics.utah.edu> Hi Berenice, SNAP is a good gene predictor, but for most genomes Augustus can be more accurate - of course it is also harder to train. Running a first round of MAKER annotation with SNAP as the predictor and then training SNAP on the output from that run followed by a second MAKER run (runs pretty fast second time because all the blast jobs are reused) is a good way to start. Ultimately running Augustus as well (along with custom training) is probably worth it for a final annotation effort. 
The good thing is you can run these iterative cycles of annotation with minimal effort because MAKER will reuse any computations that have already run. B On Jun 7, 2013, at 3:58 PM, Daniel Ence wrote: > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > From: berenice.benayoun at gmail.com [berenice.benayoun at gmail.com] on behalf of Bérénice Benayoun [benayoun at stanford.edu] > Sent: Friday, June 07, 2013 3:50 PM > To: Daniel Ence > Subject: Re: [maker-devel] Maker and mono-exonic genes ? > > Hi Daniel, > > Thanks for the quick answer! > > I used SNAP, trained from an HMM model made with the CEGMA output on my genome (240 gene models) plus a first run of maker on 1/3 of the genome. I tried GenemarkES and Augustus, but for some reason they don't run, so I stopped indicating their existence to maker. > > Should I do something in particular to train it "better"? Is there any other predictor that would be worth running? > > Thanks so much for your help! > > Berenice > > 2013/6/7 Daniel Ence > Hi Berenice, Thank you for sending that screenshot and the maker_opts.log file. Those are exactly what we need to understand how to expect MAKER to perform. > > In looking at the screenshot, it doesn't look like any of the gene predictors gave a prediction in this region. MAKER uses the predictions from ab initio tools as a basis for models and considers models that are supported by evidence. It won't by default create a model when there isn't a prediction in the region. > > Can I ask which gene predictors you used and how they were trained? You might consider training one or more of them on the specific evidence that you expect to support these genes and then rerunning maker with the retrained predictors.
> > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of B?r?nice Benayoun [benayoun at stanford.edu] > Sent: Friday, June 07, 2013 11:17 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] Maker and mono-exonic genes ? > > Dear maker developers, > > I am currently annotating a de novo fish genome, and have started looking for genes of interest in particular in Maker's output to verify that it's outputting proper gene sets. > > While many of the genes I look for seem to be correctly annotated by the pipeline, I have noticed that important genes that do have strong evidentiary support but are monoexonic are NOT reported by maker. > > I am attaching a screenshot for the contig that I know should contain the Foxl2 gene (notoriously monoexonic across evolution), and highlighted the corresponding evidence for it. > > Is there any setting I can give to maker to force it to output monoexonic genes ? I already set "single_exon=1" with no success. I attached my config file FYI. > > Thank you so much in advance for your answer !!! > > Best, > > Berenice. > -- > B?r?nice A. BENAYOUN, Ph.D. > Stanford University/Genetics Department > BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases' > M312 Alway Building > 300, Pasteur Drive > MC 5120 > Stanford, CA 94305-5120 > USA > Email: benayoun at stanford.edu > Web: www.stanford.edu/group/brunet/ > > > > -- > B?r?nice A. BENAYOUN, Ph.D. 
> Stanford University/Genetics Department
> BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
> M312 Alway Building
> 300, Pasteur Drive
> MC 5120
> Stanford, CA 94305-5120
> USA
> Email: benayoun at stanford.edu
> Web: www.stanford.edu/group/brunet/
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Jun 7 15:51:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 16:51:53 -0500
Subject: [maker-devel] Effect of the unmask option
In-Reply-To: <51B04BDA.7050307@ebi.ac.uk>
Message-ID:

The unmask option allows the ab initio predictions run on unmasked sequence to compete against other models, and only if they have a better AED score are they selected. They are not available for non-overlapping rejected models at the end of the run because that set is non-redundant and they tend to have a very high likelihood of being transposons themselves. So I don't let a repeat-containing model override a non-repeat-containing model unless there is evidence supporting it (there is never evidence supporting the non-overlapping models).

--Carson

On 13-06-06 4:44 AM, "Michael Nuhn" wrote:

>Hello Carson!
>
>When running maker with the unmask option, how does maker use the
>predictions generated from running the gene predictors on the unmasked
>sequence?
>
>The tutorial says:
>
>"You do have the option to run ab initio gene predictors on both the
>masked and unmasked sequence if repeat masking worries you though. You
>do this by setting unmask:1 in the maker_opt.ctl configuration file."
>
>http://gmod.org/wiki/MAKER_Tutorial_2012
>
>But in the sub get_non_overlaping_abinits in maker::auto_annotator
>(maker version 2.27) they are skipped:
>
>#only accept masked predictions unless I'm not masking or the predictor is genemark
>my $src = $g->{algorithm};
>unless($src =~ /_masked$|^pred_gff/ || $CTL_OPT->{_no_mask} ||
>       $CTL_OPT->{predictor} eq 'genemark') {
>    next;
>}
>
>Cheers,
>Michael.
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Jun 7 16:10:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 07 Jun 2013 17:10:09 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
Message-ID:

You seem to be running this in a very odd way. First, the GFF3 is not correctly formatted. There are lines containing score=1 (all by itself)? I believe this may be coming through because you are trying to pass in augustus predictions as GFF3 and that input is malformed. All of your Augustus models are also single exon genes, but they are very long and do not even correspond to proper ORFs. The EST evidence is spliced and is thus contradicting the augustus model (they don't support each other). If you want MAKER to be able to use the evidence as feedback for the model, you need to let MAKER run augustus. Otherwise it is only able to accept or reject the model from the GFF3 (nothing more - no attempt at consensus).

Perhaps if you supply your input dataset and control files we can help you get the best settings. You would need to provide the Augustus species set you are using as well (contained in a directory in .../augustus/config/species).
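
A minimal sketch of the maker_opts.ctl lines involved in letting MAKER run Augustus itself, rather than reading precomputed predictions through pred_gff. The option names are ordinary maker_opts.ctl keys; the file names and the species name are placeholders and are not taken from this thread.

```
#-----Sketch only: all file names and the species name below are placeholders
genome=contigs.fasta            #assumed name of the assembly being annotated
est=est_evidence.fasta          #assumed name of the EST/transcript evidence file
protein=protein_evidence.fasta  #assumed name of the protein evidence file
augustus_species=my_species     #should match a directory under .../augustus/config/species
pred_gff=                       #left empty so Augustus output is not passed in as fixed GFF3
```

With augustus_species set and pred_gff empty, the predictions come from MAKER's own Augustus runs, which is the setup the advice in this message points to.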

--Carson

From: Innocent Onsongo
Date: Friday, 7 June, 2013 2:08 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", Barry Moore
Subject: Re: [maker-devel] Maker: accessory scripts

Carson,

I have attached the full gff3 for the contig together with a screen shot from IGV with regions I was expecting Maker to make a consensus call. The region in question is CGS00003:5264784-5273457. I will greatly appreciate any insights.

Thanks,
Getiria

On Thu, Jun 6, 2013 at 8:55 AM, Carson Holt wrote:

> One thing to keep in mind is the strandedness of the evidence and the model (they must be on the same strand). Further, protein evidence is only valid support if it is in the same reading frame as the model.
>
> Could you send the full GFF3 for the contig (I need features and GFF3 internal fasta) and the coordinates of the region in question, and I can take a look? Also if you can, it would be good to let maker run Augustus as well with the species file rather than just passing in the GFF3. This is because MAKER can only talk to Augustus to generate competing hint-based models if you provide the species.
>
> Thanks,
> Carson
>
> From: Innocent Onsongo
> Date: Wednesday, 5 June, 2013 1:10 PM
> To: Carson Holt
> Cc: Carson Holt, "maker-devel at yandell-lab.org", Barry Moore
> Subject: Re: [maker-devel] Maker: accessory scripts
>
> I checked visually in IGV and there are some exons in the predicted model with protein and EST support but the maker output GFF only has match_part and protein_match in column 3. Does that mean Maker doesn't deem any of the evidence sufficient to make a gene model prediction?
>
> I guess I am somewhat surprised I am not getting any exons predicted by Maker. Is there a parameter I can alter to reduce the threshold at which Maker makes this call? I have attached the first 400 lines of one of my GFF files together with the control file (maker_opts.ctl) just in case they might be useful.
>
> Getiria
>
> On Wed, Jun 5, 2013 at 9:47 AM, Carson Holt wrote:
>> Also, just a note, models are rejected if they have no protein or EST support. This is because ab initio predictors overpredict (you may have 10 false positives for every true positive in some genomes for example).
>>
>> --Carson
>>
>> From: Carson Holt
>> Date: Wednesday, 5 June, 2013 10:44 AM
>> To: Innocent Onsongo, Carson Holt
>> Cc: "maker-devel at yandell-lab.org", Barry Moore
>> Subject: Re: [maker-devel] Maker: accessory scripts
>>
>> All maker gene annotations will be of the format gene/mRNA/exon/CDS. Anything in the format match/match_part is an evidence alignment or rejected model and is there for reference purposes. If you want to upgrade all of the rejected loci to gene annotations, set keep_preds=1 in the control files. If you want to upgrade a subset of rejected models to a full annotation, create a list of IDs (one per line) then give them to the attached script. gff3_preds2models was previously deprecated and no longer part of the maker distribution, but the attached script is an updated version with the same functionality.
>>
>> --Carson
>>
>> From: Innocent Onsongo
>> Date: Wednesday, 5 June, 2013 12:35 PM
>> To: Carson Holt
>> Cc: "maker-devel at yandell-lab.org", Barry Moore
>> Subject: [maker-devel] Maker: accessory scripts
>>
>> I was able to successfully run Maker and now want to convert the gene prediction match/match_part format to annotation gene/mRNA/exon/CDS format. I looked at the tutorial and the script gff3_preds2models is supposed to do this conversion. How do I access this script? It is not in /maker/2.28-beta/bin/
>>
>> Also, in running gff3_preds2models is the file I used for pred_gff=?
>>
>> Long story short, how do I transform the GFF output from Maker to the more traditional annotation of exon/intron?
>>
>> Thanks,
>> Getiria
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> --
> Getiria Onsongo, Ph.D.
> Informatics Analyst, Research Informatics Support System
> Minnesota Supercomputing Institute for Advanced Computational Research
> University of Minnesota
> Minneapolis, MN 55455
> Phone: 612-624-0532

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From onson001 at umn.edu Fri Jun 7 20:29:50 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Fri, 7 Jun 2013 21:29:50 -0500
Subject: [maker-devel] Maker: accessory scripts
In-Reply-To:
References:
Message-ID:

I appreciate the feedback. I will try letting MAKER run augustus instead of passing the Augustus predictions as GFF3.

Thanks for all your help!

Getiria

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Jun 10 06:40:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 08:40:35 -0400
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
Message-ID:

One more note. The ESTs appear to be from multiple overlapping HSPs (based on the red line pattern in the image). I'd have to see the actual GFF3 to be sure, but if that is the case, then there probably isn't an ORF to work with at that location on that strand (so SNAP can't call it). Possibly the result of assembly error or a pseudogene.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michel.moser at ips.unibe.ch Mon Jun 10 07:03:06 2013
From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch)
Date: Mon, 10 Jun 2013 13:03:06 +0000
Subject: [maker-devel] maker 2.28 blastx error
Message-ID:

Hello MAKER developers and users,

I am using maker for the first time to annotate some BAC sequences.
I successfully ran both of the test data sets provided in the maker tarball, but when I run maker on my sequences and provide some EST evidence from cufflinks, I get errors at the repeat database blasting step (see error below).
As te_protein data set I just use the file provided in maker/data/.

I sent the data to a colleague, who could run it without problems using maker2.10.
Or is the problem that I don't have wublast and RepBase installed?

Any hint is highly appreciated!

Thanks,
Michel

std.error

STATUS: Parsing control files...
WARNING: blast_type is set to 'wublast' but executables cannot be located
The blast_type 'ncbi+' will be used instead.

STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore

To access files for individual sequences use the datastore index:
/home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: bac2:383-131865
Length: 131482
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/usr/bin/makeblastdb -dbtype prot -in /tmp/maker_rcBcxr/0/blastprep/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_rcBcxr/te_proteins%2Efasta.mpi.10.0 -query /tmp/maker_rcBcxr/0/bac2%3A383-131865.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/moser/PHD/ANNOTATION/maker/BAC2/ginas-try/insert-bac2.maker.output/insert-bac2_datastore/1D/F1/bac2%3A383-131865//theVoid.bac2%3A383-131865/0/bac2%3A383-131865.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
BLAST engine error: Warning: Sequence contains no data
BLAST engine error: Warning: Sequence contains no data
ERROR: BLASTX failed
--> rank=NA, hostname=ipsktube
ERROR: Failed while doing blastx repeats
ERROR: Chunk failed at level:1, tier_type:1
FAILED CONTIG:bac2:383-131865

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:bac2:383-131865

examining contents of the fasta file and run log

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1.fasta
Type: application/octet-stream
Size: 14791 bytes
Desc: test1.fasta
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1413 bytes
Desc: maker_bopts.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1201 bytes
Desc: maker_exe.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4457 bytes
Desc: maker_opts.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: protein.fasta
Type: application/octet-stream
Size: 452 bytes
Desc: protein.fasta
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insert-bac2.fasta
Type: application/octet-stream
Size: 131500 bytes
Desc: insert-bac2.fasta
URL:

From anthony.bretaudeau at rennes.inra.fr Mon Jun 10 09:48:13 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Mon, 10 Jun 2013 17:48:13 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
Message-ID: <51B5F53D.90505@rennes.inra.fr>

Hello,
I am running Maker 2.27b on an insect genome, and I use a gff file containing some repeat positions (rm_gff option in maker_opts.ctl).

I encountered an error on 10 scaffolds (the genome contains ~40000 scaffolds): "substr outside of string" (similar to this post: http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html).

After a lot of debugging, it turns out the problem came from the code of the "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a SQL query that fetches features that overlap with the border of the sequence chunk.
The problem is that it also fetches features that are completely outside of the chunk in the same region. This produces an error when maker tries to mask the sequence, as it does a substr outside the string.

I fixed it by patching lib/repeat_mask_seq.pm, near line 138:
I replaced:
    substr($$seq, $b -1 , $l, "$replace"x$l);
By:
    if ($b < length($$seq)) {
        substr($$seq, $b -1 , $l, "$replace"x$l);
    }

I don't know if there is a more elegant solution, but this seems to solve the problem.

Cheers
Anthony

From carsonhh at gmail.com Mon Jun 10 10:13:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 12:13:50 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B5F53D.90505@rennes.inra.fr>
Message-ID:

Could you use MAKER version 2.28 instead (launch with maker -a if it still fails)?

Thanks,
Carson

From barry.moore at genetics.utah.edu Mon Jun 10 11:13:49 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 10 Jun 2013 11:13:49 -0600
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To:
References:
Message-ID: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>

Hi Michel,

Yes, wublast is the problem. On current versions of maker the opts file defaults to ncbi+, but on older versions the opts file defaults to wublast. Just edit your maker_bopts.ctl file to have the line:

blast_type=ncbi+

It seems like this option may have been in maker_opts.ctl in older files, so if you don't find it in bopts then look in opts.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Jun 10 11:32:55 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 13:32:55 -0400
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>
Message-ID:

It's actually a little more complicated than that. You are already using BLAST+. The sequence you are running on is apparently entirely masked, so there is nothing there to align.
The error thrown by NCBI BLAST+ when this happens (currently "Sequence contains no data") has changed slightly over time. As a result, it causes MAKER to fail, whereas the error wublast throws in the same situation is still recognized, captured by MAKER, and ignored. You can probably ignore that contig, run with a different version of BLAST, or put the attached files in the .../maker/lib/Widget/ directory. I fixed the check for the current message, so it will ignore the error (as long as the error is still going to STDERR and not STDOUT).

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastn.pm
Type: text/x-perl-script
Size: 7442 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastx.pm
Type: text/x-perl-script
Size: 7502 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tblastx.pm
Type: text/x-perl-script
Size: 8364 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon Jun 10 11:53:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Jun 2013 13:53:53 -0400
Subject: [maker-devel] maker 2.28 blastx error
In-Reply-To: <1618E393-D123-4D96-AD98-8DDFA9BCD9EF@genetics.utah.edu>
Message-ID:

Never mind. It's even a little weirder than what I just explained.
The contig name (bac2:383-131865) triggers a behavior in the BioPerl indexer where the name is interpreted as a region rather than as a contig. As a result the indexer can't find the sequence, but it also doesn't throw an error (the result is an empty FASTA).

Solution: just change the name of the contig. Try using 'bac2_383-131865' instead.

--Carson

From anthony.bretaudeau at rennes.inra.fr Tue Jun 11 09:03:42 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Tue, 11 Jun 2013 17:03:42 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B73C4E.6030204@rennes.inra.fr>

Hello,
I have just tested with 2.28b: the problem is still there, and my fix works on this version too.
Cheers
Anthony

On 10/06/2013 18:13, Carson Holt wrote:
> Could you use MAKER version 2.28 instead (launch with maker -a if it still
> fails).
>
> Thanks,
> Carson
>
>
>
> On 13-06-10 11:48 AM, "Anthony Bretaudeau"
> wrote:
>
>> Hello,
>> I am running Maker 2.27b on an insect genome, and I use a gff file
>> containing some repeat positions (rm_gff option in maker_opts.ctl).
>>
>> I encountered an error on 10 scaffolds (the genome contains ~40000
>> scaffolds): "substr outside of string" (similar to this post:
>> http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html).
>>
>> After a lot of debugging, it turns out the problem comes from the
>> "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is a
>> SQL query that fetches features that overlap the border of the
>> sequence chunk.
>> The problem is that it also fetches features that are completely outside
>> of the chunk in the same region. This produces an error when maker tries
>> to mask the sequence, as it does a substr outside the string.
>>
>> I fixed it by patching lib/repeat_mask_seq.pm, near line 138.
>> I replaced:
>>     substr($$seq, $b -1 , $l, "$replace"x$l);
>> with:
>>     if ($b < length($$seq)) {
>>         substr($$seq, $b -1 , $l, "$replace"x$l);
>>     }
>>
>> I don't know if there is a more elegant solution, but this seems to
>> solve the problem.
>>
>> Cheers
>> Anthony
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com Tue Jun 11 09:06:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Jun 2013 11:06:10 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B73C4E.6030204@rennes.inra.fr>
Message-ID:

Could you send me your repeat_gff and genome fasta so I can take a look?

Thanks,
Carson

From anthony.bretaudeau at rennes.inra.fr Wed Jun 12 07:29:14 2013
From: anthony.bretaudeau at rennes.inra.fr (Anthony Bretaudeau)
Date: Wed, 12 Jun 2013 15:29:14 +0200
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To:
References:
Message-ID: <51B877AA.8060803@rennes.inra.fr>

Hi,
Here is a minimal gff file that reproduces the bug. It should work with any fasta (my real data is not yet published, so I can't share it publicly yet).
Tell me if you need more info.
Anthony

-------------- next part --------------
scaffold_20 TEs match 199889 203598 0.0 + . ID=some_id_1
scaffold_20 TEs match_part 199889 200163 2.6e-12 + . ID=part_1;Parent=some_id_1
scaffold_20 TEs match_part 203256 203598 2.6e-12 + . ID=part_2;Parent=some_id_1

From sickler.alex at gmail.com Wed Jun 12 12:22:17 2013
From: sickler.alex at gmail.com (Alex Sickler)
Date: Wed, 12 Jun 2013 14:22:17 -0400
Subject: [maker-devel] Problem Installing with opencc
Message-ID:

Hi all,

I am trying to install Maker 2.28. When I go to install Maker, it gives the following error message:

/usr/bin/perl /usr/local/share/perl5/ExtUtils/xsubpp -typemap "/usr/share/perl5/ExtUtils/typemap" MPI.xs $
/share/apps/openmpi/OpenMPI-1.6.3/bin/mpicc -c -I"/share/apps/maker/src" -I/share/apps/openmpi/OpenMPI-1.6.3/include -D_REENTRANT -D_GNU_SOUR$
opencc WARNING: unknown flag: -fstack-protector
opencc WARNING: unknown flag: -fstack-protector
opencc ERROR: -- not allowed in non XPG4 environment
opencc ERROR parsing --param=ssp-buffer-size=4: unknown flag
make: *** [MPI.o] Error 2

The to everything is correct. I tried looking in the Makefile.PL but could not find the "param=" option.

Any help is greatly appreciated,
Alex

From carsonhh at gmail.com Thu Jun 13 13:38:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Jun 2013 15:38:52 -0400
Subject: [maker-devel] Problem Installing with opencc
In-Reply-To:
Message-ID:

The MAKER installation doesn't use a Makefile.PL. The parameters for compiling the MPI bindings are set by mpicc itself, by Perl, or by environment variables on your system.
In general you want both Perl and OpenMPI to be compiled with the same compiler, or you can get cross-library problems (Perl uses the shared libraries in OpenMPI, so all communication really happens at the C level). Problems do not always occur (I have been fine for the most part mixing PGI-, Intel-, and GCC-compiled OpenMPI, but I have never tried the Open64 compilers).

Alternatively, you can try manually setting the following environment variables before installing MAKER, which should affect the parameter settings (this means before even running the 'perl Build.PL' step):

LDFLAGS=
LDDLFLAGS=
CCCDLFLAGS=
CCDLFLAGS=

You also need to export the following variable for OpenMPI to work with shared libraries before trying to install or run MAKER (again, before even running the 'perl Build.PL' step). It's best just to add it to your ~/.bashrc or ~/.bash_profile:

export LD_PRELOAD=/share/apps/openmpi/OpenMPI-1.6.3/lib/libmpi.so

You will need to run 'source ~/.bashrc' or 'source ~/.bash_profile' after adding it to apply the change to the current terminal session.

Thanks,
Carson

From carsonhh at gmail.com Sun Jun 16 13:46:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 16 Jun 2013 15:46:51 -0400
Subject: [maker-devel] Patch for a bug with repeat gff
In-Reply-To: <51B877AA.8060803@rennes.inra.fr>
Message-ID:

Thanks for the detailed report and test files. The problem originates with your GFF3 describing a spliced repeat. I don't know if such a thing can really occur, but regardless, MAKER doesn't expect it, and as a result, when the repeat is assembled, some of the spliced exons run off the edge of the sequence. The script currently checks for repeats whose end runs off the edge and adjusts accordingly, but it does not check for a start that runs off the edge (because it's not expecting spliced repeats). The result is the "substr outside of string" error. I added 'next if($l <=0)' to both the _soft_mask_seq and _hard_mask_seq functions. Hopefully spliced repeats won't cause other hidden errors downstream, but you should be aware of the possibility.

Thanks,
Carson

From jmdoyle at purdue.edu Mon Jun 17 11:20:42 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Mon, 17 Jun 2013 13:20:42 -0400 (EDT)
Subject: [maker-devel] altest without MPI?
Message-ID: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I am beginning my first MAKER annotation and had a quick question. I am currently planning on following the "Training ab initio Gene Predictors" section of the MAKER 2012 tutorial. For my species of interest, I have 784290 scaffolds, of which 80% are greater than 100 kb. I have EST data from a closely related species and was also going to use the core CEGMA protein sequences. With this in mind, I made the following changes to my maker_opts file:

genome=scaffolds.fasta
altest=Trinity.fasta
protein=cegma.fa
est2genome=1
cpus=48

My primary concern is that this is going to take a long time to run with altest, even with the extra cpus for BLAST. The software was not originally installed on our computer cluster with MPICH2, but I may be able to talk our computer guys into reinstalling if the situation is going to be completely untenable without MPI. I guess my question is, is there any point in trying to run the above without MPI? Is there a good way to monitor the progress of such a run if I were to give it a shot?

Thanks for your help with this!

Jackie

From carsonhh at gmail.com Mon Jun 17 14:12:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Jun 2013 16:12:58 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1755059295.37969.1371489642806.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

It's best to use the CEGMA results with the cegma2zff script to generate a training set for SNAP. Then don't use the CEGMA proteins as evidence. If you can get proteins from a related species with an annotated genome, that will be better than the altest option with ESTs from a different species. This is because altest evidence is aligned via tblastx, which is 3-4 times slower than protein alignment. Also, the alignments will rarely be good enough to produce many est2genome models (best to use them only if you have nothing else).

The cpus= option is a BLAST parameter that specifies how many cpus to give to each BLAST job. It is not an MPI parameter. The number of cpus for MPI is specified using the -n option of mpiexec and not in the MAKER control files.

You don't have to use MPI. You can also split your contigs up into separate jobs and run MAKER multiple times. Use the fasta_tool script that comes with MAKER to split your input file up.

Let us know if you come across anything else you have questions on.

Thanks,
Carson

From carsonhh at gmail.com Wed Jun 19 19:05:49 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Jun 2013 21:05:49 -0400
Subject: [maker-devel] altest without MPI?
In-Reply-To: <1997335285.43753.1371676376399.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

The throughput is based on contig length, so long contigs will take longer than short contigs. Any contig shorter than 10kb is mostly useless for annotation purposes (so you can filter those from your 800,000 right away). Take the contigs that have finished and sum up their lengths to get a better estimate of how long the whole run will take. Most genomes can complete in a few days on a multi-core machine. Bigger genomes or bigger datasets take longer (note that altest evidence takes 3-4x longer to align than proteins).

The advantage of proteins is that the species do not have to be closely related. Nucleotide sequence diverges quickly and protein sequence slowly (that's why proteins are used for phylogenetic trees).

A good strategy would be to take ~10Mb of sequence (use your longest contigs). Run with chicken, turkey, and pigeon proteins. Use the protein2genome option to generate annotations. Those annotations should be sufficient to train SNAP and Augustus. Then you can finish by running all your contigs with the same dataset (protein2genome now turned off), using the newly trained SNAP and Augustus files along with any altest files you want to use. Note that the size of the dataset will determine the total run time.

To get things to run faster, you can also run on your university's computer cluster (then you will have hundreds of cpus available to you). The Purdue cluster supports MPI, and with 30-50 cpus you could annotate even large genomes in a reasonable time.
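
The filtering and run-time estimate described above could be sketched roughly as follows. This is only an illustration in Python, not part of MAKER: the 10 kb cutoff and the idea of summing the lengths of finished contigs come from the message, while the function names, the demo data, and the assumption that run time scales linearly with the number of bases are hypothetical simplifications.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH = 10_000  # 10 kb cutoff suggested in the message above


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def keep_long_contigs(records: Iterable[Tuple[str, str]],
                      min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Keep only contigs that are at least min_length bases long."""
    return {name: seq for name, seq in records if len(seq) >= min_length}


def estimate_remaining_hours(finished_bp: int, elapsed_hours: float,
                             remaining_bp: int) -> float:
    """Scale the observed throughput (bases per hour) to the remaining bases.

    Assumes run time grows linearly with sequence length, which is only
    an approximation.
    """
    if finished_bp <= 0 or elapsed_hours <= 0:
        raise ValueError("need at least one finished contig and elapsed time")
    return remaining_bp / (finished_bp / elapsed_hours)


if __name__ == "__main__":
    demo = [">short", "ACGT" * 10, ">long", "ACGT" * 3000]
    kept = keep_long_contigs(read_fasta(demo))
    print(sorted(kept))  # only the 12 kb contig survives the filter
    print(estimate_remaining_hours(2_000_000, 4.0, 10_000_000))
```

For a real run, the lines would come from an open FASTA file and the finished base count from the contigs that have already completed.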
Alternatively, you can request a startup account at XSEDE, an NSF-funded computing resource open to all US institutions. A startup allocation with 50,000 cpu hours only takes 2 weeks to approve. You should request an allocation on the Lonestar cluster if you go that route; it has 64,000 cpus. I was able to annotate the maize genome (a very large genome at over 2 gigabases) using abnormally large EST and protein datasets (~4 gigabases of evidence, which is much more than a normal annotation job), and it completed in under 3 hours on 2,100 cpus.

--Carson

On 13-06-19 5:12 PM, "Jacqueline R M Doyle" wrote:

>Hi Carson (and whoever else might be reading this!)
>
>Thanks so much, I think splitting the files up using fasta_tool will
>definitely move things along. I did a trial version with altest this
>weekend, and seemed to be averaging about an hour a scaffold (with 1
>cpu). I'm a little concerned, as we have ~800,000 scaffolds. Does this
>seem like a reasonable estimate of the time it should take to annotate
>one sequence? Could I be missing something in my maker_opts file?
>
>Let me back up for just a minute and describe the project a little more
>generally. As I mentioned before, we have no protein sequences or ESTs
>for our species of interest, which is an avian species. I could
>potentially use proteins from chicken or turkey, but neither is closely
>related to our species. Time is a bit of an issue... do you have any
>thoughts on how much time per scaffold it should take to annotate using
>protein2genome? If chicken and turkey are not closely related, is it
>worth the time investment?
>
>Let me finish by saying I think MAKER is wonderful, and I really
>appreciate the discussions on this group.
>
>Best wishes, Jackie

From jjin01 at mail.rockefeller.edu Thu Jun 20 14:22:22 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Thu, 20 Jun 2013 20:22:22 +0000
Subject: [maker-devel] maker exon result
Message-ID:

Dear all,

I have used MAKER to predict gene models in my draft genome. However, when I check the sequence of each exon, I find that some of them have a start codon but no stop codon. Is this reasonable? For example:

processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9
processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362
processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1
processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + .
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . + 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . 
+ 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG 
AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT 
TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG

This is the sequence for this gene; the part in red is the first exon.

However, I cannot find a stop codon for this exon.

I also find that some exons contain several stop codons.

Has anyone else had this problem? Or is something wrong with how I configured the MAKER files?

Thanks!

Jingjing

From dence at genetics.utah.edu Thu Jun 20 17:06:29 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Jun 2013 23:06:29 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References:
Message-ID:

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like GBrowse and IGV.

When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER cannot find the start or stop codon for a gene, and in those cases the end of the gene model is marked with an orange arrow.

I hope that I understood your question. Feel free to reply back on the mailing list if I didn't.
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From barry.moore at genetics.utah.edu Thu Jun 20 17:11:56 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 20 Jun 2013 17:11:56 -0600
Subject: [maker-devel] maker exon result
In-Reply-To:
References:
Message-ID: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>

To add to what Daniel suggested: if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
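Barry's suggestion (check the last three nucleotides of the last CDS, not of each exon) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CDS coordinates are copied from the GFF3 lines quoted in this thread, the function names (`splice_cds`, `has_terminal_stop`, `internal_stops`) are hypothetical helpers and not part of MAKER, and the toy region is invented so the example stays short. It also assumes a plus-strand gene, as in the example.

```python
# CDS intervals (1-based, inclusive) copied from the GFF3 lines in this thread.
CDS = [(8916, 9065), (9089, 9214), (10232, 10381), (11216, 11270),
       (11336, 11496), (11513, 11602), (11903, 12151), (12528, 12632)]

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_cds(region_seq, region_start, intervals):
    """Concatenate plus-strand CDS segments cut out of a genomic region.

    region_start is the 1-based genomic coordinate of region_seq[0].
    """
    parts = []
    for start, end in sorted(intervals):
        parts.append(region_seq[start - region_start:end - region_start + 1])
    return "".join(parts).upper()


def has_terminal_stop(cds_seq):
    """True if the spliced CDS is a whole number of codons ending in a stop."""
    return len(cds_seq) % 3 == 0 and cds_seq[-3:] in STOP_CODONS


def internal_stops(cds_seq):
    """0-based codon indices of in-frame stops before the final codon."""
    codons = [cds_seq[i:i + 3] for i in range(0, len(cds_seq) - 3, 3)]
    return [i for i, codon in enumerate(codons) if codon in STOP_CODONS]


# Total coding length of the model from the thread.
cds_length = sum(end - start + 1 for start, end in CDS)

# Invented toy region: exon "ATGAAA", intron "GTAAGTTTCAG", exon "GGCTAA".
# The intron contains a TAA, which is irrelevant once the CDS is spliced.
toy_region = "ATGAAAGTAAGTTTCAGGGCTAA"
toy_cds = splice_cds(toy_region, 101, [(101, 106), (118, 123)])

print(cds_length, cds_length // 3)
print(toy_cds, has_terminal_stop(toy_cds), internal_stops(toy_cds))
```

For the model in this thread the eight CDS segments add up to 1086 nt, i.e. 362 codons, which matches the last `_QI` field (362) if that field is the protein length. Only the first and last codon of the spliced CDS are expected to be a start and a stop; individual exons are not. The posted sequence ends in `...GTGAACAG`, so if it spans 8916-12632 the final codon would be `CAG`, which is not a stop codon.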
URL: From jjin01 at mail.rockefeller.edu Thu Jun 20 18:18:18 2013 From: jjin01 at mail.rockefeller.edu (Jingjing Jin) Date: Fri, 21 Jun 2013 00:18:18 +0000 Subject: [maker-devel] maker exon result In-Reply-To: References: , Message-ID: For my understanding, the prediction gene model should be connect different exon together. For each exon of a gene, I think it should have a start codon and stop codon. However, it may be wrong. However, when I check some gene model from maker prediction, some exon of one gene, I cannot find stop codon for it. Like the example I give, the red color is the first exon. However, the last 3 NT is not a stop codon. Even for last 3 NT for last exon, it is also not a stop codon. Is it reasonable? Thanks! Jingjing ________________________________ From: Daniel Ence [dence at genetics.utah.edu] Sent: Thursday, June 20, 2013 7:06 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: RE: maker exon result Hi Jingjing, It's really hard to find the stop codon in the nucleotide sequence that you sent. I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like gbrowse and igv. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER couldn't find the stop or start codon for a gene, and in those cases, the end of the gene model is marked with an orange arrow. I hope that I understood your question. Feel free to reply back on the mailing list if I didn't. 
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu] Sent: Thursday, June 20, 2013 2:22 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker exon result Dear all, I have used maker to predict the gene model in my draft genome. However, when I check the sequence for each exon, I find some of them just have start codon, without stop codon. Is it reasonable for this? Like in this example: processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9 processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0|0|1|0.14|0.12|8|0|362 processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . + . 
ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . + . ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . 
+ 2 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . + 0 ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG 
CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA 
GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG

This is the sequence for this gene; the part in red is the first exon. However, I cannot find the stop codon for this exon. I also find that some exons contain several stop codons. Does anyone else have the same problem? Or is something wrong in how I configured the MAKER control files?

Thanks!
Jingjing

From jjin01 at mail.rockefeller.edu Thu Jun 20 18:21:38 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Fri, 21 Jun 2013 00:21:38 +0000
Subject: [maker-devel] maker exon result
In-Reply-To: <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>
References: , <6312A919-6E3A-43F5-A553-5947204FC6DB@genetics.utah.edu>
Message-ID:

The last three nucleotides of this example are also not a stop codon.

Jingjing
________________________________
From: Barry Moore [barry.moore at genetics.utah.edu]
Sent: Thursday, June 20, 2013 7:11 PM
To: Daniel Ence
Cc: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker exon result

To add to what Daniel suggested: if you want to find the stop codon for this gene, look at the last three nucleotides of the last CDS.

B

On Jun 20, 2013, at 5:06 PM, Daniel Ence wrote:

Hi Jingjing,

It's really hard to find the stop codon in the nucleotide sequence that you sent.
I think most people determine the presence of a stop codon in a gene by viewing the annotations and sequence in some kind of viewer. The one that I use the most is Apollo, but many people also like GBrowse and IGV. When you view gene models in Apollo, the start codons are highlighted in green and the stop codons are highlighted in red. Sometimes MAKER can't find the stop or start codon for a gene, and in those cases the end of the gene model is marked with an orange arrow.

I hope that I understood your question. Feel free to reply back on the mailing list if I didn't.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From myandell at genetics.utah.edu Thu Jun 20 19:11:40 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Fri, 21 Jun 2013 01:11:40 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References: , ,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E18B22@mxb2.hg.genetics.utah.edu>

Hi Jin,

Only the terminal coding exon (CDS) of a gene model will contain a stop codon. Sometimes, though, there is no stop codon because the gene actually runs off the end of the scaffold, or is lost in a gap in the assembly...

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing Jin [jjin01 at mail.rockefeller.edu]
Sent: Thursday, June 20, 2013 6:18 PM
To: Daniel Ence; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker exon result

As I understand it, a predicted gene model connects different exons together. I thought each exon of a gene should have a start codon and a stop codon. However, I may be wrong.
However, when I check some gene models from the MAKER prediction, I cannot find a stop codon for some exons of a gene. In the example I gave, the part in red is the first exon, but its last 3 nt are not a stop codon. Even the last 3 nt of the last exon are not a stop codon. Is that reasonable?

Thanks!
Jingjing

From bmoore at genetics.utah.edu Thu Jun 20 19:29:41 2013
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Fri, 21 Jun 2013 01:29:41 +0000
Subject: [maker-devel] maker exon result
In-Reply-To:
References: , ,
Message-ID: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu>

To clarify things a bit, Jin: not every exon will have a start and/or stop codon. Only the first coding exon will have a start, and only the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3), so look only at CDS features, not at 'exon' features. For CDSs you must then concatenate the sequence of each CDS line for a given transcript (and reverse complement the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end.

B

Barry Moore
Research Scientist
Dept. Human Genetics
University of Utah

On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote:

As I understand it, a predicted gene model connects different exons together. I thought each exon of a gene should have a start codon and a stop codon. However, I may be wrong. However, when I check some gene models from the MAKER prediction, I cannot find a stop codon for some exons of a gene. In the example I gave, the part in red is the first exon, but its last 3 nt are not a stop codon. Even the last 3 nt of the last exon are not a stop codon. Is that reasonable?

Thanks!
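
The CDS-joining procedure described in the reply above can be sketched in Python. This is only a minimal illustration, not MAKER's own code: the function names (`spliced_cds`, `codon_report`) and the toy scaffold are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF3, and the phase column is ignored, so a partial model whose first CDS has a non-zero phase would need extra handling.

```python
# Hypothetical helper names; a sketch of "concatenate the CDS lines, reverse
# complement on the minus strand, then look at the first and last codon".

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_cds(scaffold_seq, cds_intervals, strand):
    """Join CDS segments of one transcript into a single coding sequence.

    cds_intervals: (start, end) pairs from the CDS rows of the GFF3 file,
    1-based and inclusive. strand: "+" or "-" (GFF3 column 7).
    """
    # Take the segments in genomic order, then flip once for the minus strand.
    pieces = [scaffold_seq[start - 1:end] for start, end in sorted(cds_intervals)]
    cds = "".join(pieces)
    return reverse_complement(cds) if strand == "-" else cds


def codon_report(cds):
    """Check the joined CDS for a start codon, a stop codon, and frame."""
    cds = cds.upper()
    return {
        "has_start": cds.startswith("ATG"),
        "has_stop": cds[-3:] in STOP_CODONS,
        "length_multiple_of_3": len(cds) % 3 == 0,
    }


if __name__ == "__main__":
    # Toy scaffold: two CDS segments separated by a short "intron".
    toy_scaffold = "CCATGAAAGTCCTTTTAAGG"
    cds = spliced_cds(toy_scaffold, [(3, 8), (13, 18)], "+")
    print(cds)                # ATGAAATTTTAA
    print(codon_report(cds))
```

For the tobacco example, the intervals would be the eight CDS rows (8916-9065 through 12528-12632) applied to the full scaffold sequence, not to the gene sequence pasted in the message. A report with `has_stop` false would be consistent with the cases mentioned on the list, such as a model that runs off the end of the scaffold or into an assembly gap.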
From kara.deleon at biofilm.montana.edu Thu Jun 20 16:25:31 2013
From: kara.deleon at biofilm.montana.edu (Bowen, Kara (De Leon))
Date: Thu, 20 Jun 2013 16:25:31 -0600
Subject: [maker-devel] augustus_species
Message-ID: <3E82665C-ECB7-4A07-B0FF-24E8395EDC4D@biofilm.montana.edu>

Hello,

I am trying to annotate a Chlamydomonas genome, and C. reinhardtii was used as a model organism in Augustus. I would like to add this model to augustus_species in the maker_opts.ctl file, but I'm not sure how this information should be entered on this line (i.e., as a genus name, file location, etc.).

I am also having an issue with providing a protein file. When I supply the protein FASTA file of C. reinhardtii from the Augustus website, I get a fatal error (below). I've looked through the FASTA file and I'm not seeing anything obvious that would cause this error to be thrown. Do you have any suggestions on where to start looking?

Can't open sequence index file /Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/augustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index: Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm line 527.
FATAL ERROR

Thanks for any help you can provide.
Kara ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Kara De Le?n Postdoctoral Research Associate Montana State University Center for Biofilm Engineering 366 EPS Building Bozeman, MT 59717 208-484-9078 kara.deleon at biofilm.montana.edu ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gowthaman.ramasamy at seattlebiomed.org Fri Jun 21 07:29:06 2013 From: gowthaman.ramasamy at seattlebiomed.org (Gowthaman Ramasamy) Date: Fri, 21 Jun 2013 06:29:06 -0700 Subject: [maker-devel] augustus_species Message-ID: I believe the model file should go to Augustus installation directory. Actually in to the 'genomes' sub folder there. Then use the exact name of the model file ( minus extension) in .CTL file....... "Bowen, Kara (De Leon)" wrote: Hello, I am trying to annotation a Chlamydomonas genome and C. reinhartii was used as a model organism in Augustus. I would like to add this model to augustus_species in the maker_opts.ctl file, but I'm not sure how this information should be inserted on this line (ie. as genus name, file location, etc). I am also having an issue with providing a protein file. When I put in the protein fasta file of C. reinhartti from the Augustus website, I get a fatal error (below). I've looked through the fasta and I'm not seeing anything obvious that would cause this error to be thrown. Do you have any suggestions on where to start to look? Can't open sequence index file /Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/augustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index: Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm line 527. FATAL ERROR Thanks for any help you can provide. 
Kara ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Kara De Le?n Postdoctoral Research Associate Montana State University Center for Biofilm Engineering 366 EPS Building Bozeman, MT 59717 208-484-9078 kara.deleon at biofilm.montana.edu ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From carsonhh at gmail.com Fri Jun 21 09:24:17 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Jun 2013 11:24:17 -0400 Subject: [maker-devel] augustus_species In-Reply-To: Message-ID: The model files must go in .../augustus/config/species/ under the augustus installation directory (Each model gets a different directory). The species that augustus can accept will be the same as the directory names under .../augustus/config/species/. The command 'augustus --species=help' will also provide a list of those names. For the protein file can you send it to me? --Carson On 13-06-21 9:29 AM, "Gowthaman Ramasamy" wrote: >I believe the model file should go to Augustus installation directory. >Actually in to the 'genomes' sub folder there. Then use the exact name of >the model file ( minus extension) in .CTL file....... > >"Bowen, Kara (De Leon)" wrote: > > > >Hello, >I am trying to annotation a Chlamydomonas genome and C. reinhartii was >used as a model organism in Augustus. I would like to add this model to >augustus_species in the maker_opts.ctl file, but I'm not sure how this >information should be inserted on this line (ie. as genus name, file >location, etc). > >I am also having an issue with providing a protein file. When I put in >the protein fasta file of C. reinhartti from the Augustus website, I get >a fatal error (below). I've looked through the fasta and I'm not seeing >anything obvious that would cause this error to be thrown. Do you have >any suggestions on where to start to look? 
> > >Can't open sequence index file >/Users/kara/Desktop/CBMW_maker_protein/contigs.maker.output/mpi_blastdb/au >gustus%2Eu9_aa%2Efasta.mpi.10/augustus%2Eu9_aa%2Efasta.mpi.10.1.index: >Inappropriate file type or format at /sw/lib/perl5/5.12.4/Bio/DB/Fasta.pm >line 527. > >FATAL ERROR > > >Thanks for any help you can provide. > >Kara > > >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >Kara De Le?n >Postdoctoral Research Associate >Montana State University >Center for Biofilm Engineering >366 EPS Building >Bozeman, MT 59717 >208-484-9078 >kara.deleon at biofilm.montana.edu >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Jun 21 07:58:35 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Jun 2013 09:58:35 -0400 Subject: [maker-devel] maker exon result In-Reply-To: <8BA467BB-5549-4385-A398-65951A19B86C@genetics.utah.edu> Message-ID: To further illustrate this I've highlighted the location of all CDS entries. You need to cut them out, string them together linearly, and only then can you translate. There is a start codon for the merged CDS then all open reading frame following that, but no stop codon so this is a partial transcript. Sometimes the gene predictors do not find a likely stop and a partial model scores better. You can force MAKER to try and find a stop even when the gene predictor (snap, augustus, etc.) doesn't by setting always_complete=1 in the maker_opts.ctl file. Keep in mind that this is just a forced canonical completion. 
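The cut, join, and translate steps described above can be sketched in Python roughly as follows. This is only an illustration and not part of MAKER: the helper names (cds_intervals, spliced_cds, translate, describe) are made up for this example, coordinates are assumed to be 1-based positions on the full contig sequence (as in the GFF3), and the first CDS piece is assumed to start in phase 0.

```python
# Illustrative sketch only; these helpers are hypothetical, not MAKER code.
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def cds_intervals(gff3_lines, transcript_id):
    """Collect (start, end, strand) for CDS rows whose Parent is transcript_id."""
    rows = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only 'CDS' features (column 3) matter here; 'exon' rows are skipped.
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if transcript_id in attrs.get("Parent", "").split(","):
            rows.append((int(cols[3]), int(cols[4]), cols[6]))
    return sorted(rows)


def spliced_cds(contig_seq, intervals):
    """Cut out each CDS piece, join them in genomic order, fix the strand."""
    joined = "".join(contig_seq[start - 1:end] for start, end, _ in intervals)
    if intervals and intervals[0][2] == "-":
        joined = reverse_complement(joined)
    return joined.upper()


def translate(cds):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def describe(cds):
    """Report whether the joined CDS has a start, a stop, and internal stops."""
    protein = translate(cds)
    return {
        "protein": protein,
        "has_start": cds.startswith("ATG"),
        "has_stop": protein.endswith("*"),
        "internal_stops": protein[:-1].count("*"),
    }


if __name__ == "__main__":
    # Toy contig: two CDS pieces (3-8 and 19-24) separated by an intron.
    toy_contig = "CC" + "ATGAAA" + "GTAAGTTTAG" + "GGGTGA" + "CC"
    toy_gff = [
        "\t".join(["c1", "maker", "CDS", "3", "8", ".", "+", "0", "ID=t1:cds;Parent=t1"]),
        "\t".join(["c1", "maker", "CDS", "19", "24", ".", "+", "0", "ID=t1:cds;Parent=t1"]),
    ]
    print(describe(spliced_cds(toy_contig, cds_intervals(toy_gff, "t1"))))
```

A model whose joined CDS reports has_stop as False and internal_stops as 0 is a partial model of the kind discussed here, rather than a broken one. Stop codons seen inside a single exon before joining usually just mean that exon was read in the wrong frame.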
ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCA GAATTATCTCCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCG GGGAAGTTGATTGAGAATAATCCATCAGGGgtgatacagaagaattgtttcagTATCTTG TTGAAATATTGGCTTCTAGAGTGTATGATGTAGCAATTGATTCCCCCTTGCAAAATGCAA CTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGATCAAAAGAGAGGATATGCAGTCCg tatgtttctcctctcttctttttttgatgtagcatttgctttaacttagaatttgtggtt ttaaacataccattagaaaggtatggaggttgaggattagggtagtaaagtaggtagtct agagtgttcataacagtaatattgacaagcagtctcgctttccgttggtagtaggttttt atgactaaccgttattttctttcattgttgatcaacttacttttgttgtttttattctgc ttttatatggctttttggtactgtcccttcttgtctatattttcattaatgtggtgctta tgcttttctaagccgagagtttattggaaacaactttcatatcctcacaaggtaggggta aggtgtgcgtacacactaccctccccagactctacggtgtgggataatatttagtatgtt attgtcgttgttgttgtaaacgttttttttgttgctatcaaagcatgttattacgggtaa aatagaaacatttaaagtgaaagagtttccaaacgtaggaaagcttttttttctttcgga atacaccgaaaaaagaaagactatcatttaagatagaacaacaacagcgacggagctagc cttcgacttactggttcggcagaacccaataattttggcccaaactctgtacttgtacta aaaagctcacttaatatgtataaaaagcctagtaattaagttgcatttttttctttctaa aatctagagctcataaactcaaaattatgtctccgcctctgaacaatggggatattattc tacttttaactatcttagataagttaataattgttctctttttcaaacgtttctgccttg tattattgtgtaactatttatactgtgtggacgcttcaaaatgttgttgcgcccgcgtcg gatcctcaaaaaatatatattttgaggattcgacacgcacccgatgaccttttcggagaa ttcgagcaatataggtaactaatattgctagctcatcaactggtggtattttttagGTGC TCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTC AGAGACTTAAATGTACTGCTACGATTgtcatgcctgttaccacaccagagatcaaggtaa ttagttctctcctgttaatttatccttcatgttcgattcatgtgaatctagttgatcggg cactgagttttactaaaaaatgaagactttcggaacttgggagctttaacatgctgtaac atttgtgtagttataagacttttgaaacttatagtcttagtgggtgtttggacataagaa ttgtaaagttccaagaaaagtgaaaaaaaattcaagtgaaaatggtatttgaaaattaga gttgtgtttggacatgaatataattttaggttgtttttgaagttttgtgagtgatctgac acaaattttgaaaaaacaactttttggagtttttcaaattttcgaaaaattccaaaatgc atcttcaagtgaaaattggaaattatatgaccaaacgctgatttcgggaaaaaaattcga 
aaaaatgtgaaaattttcttatgtccaaacgggctcttaaatgcgtcataacgtttgtgt ggttataaaagtctctcatctgaatagggtcacacaactaaaacagagagaacaaaataa ttcactaaaaaaaaattggaactagctacaaacttcgtcgcaagtctcgctaaatcgctc gtagctaatagaatttctagataatttgtttagcttgtagcatgaaatttttctatttag caacagaagtagtctgtcgctaattcctatttttttagtagaaagtattgtgaaattatt tgtttttctaaaggaccattttctttacaaatgaacagattgaagcagttaagaacttgg atggtaatgtagttctacagGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGT TGGCTGAAGATGAAGgtctcacattcatcccgcctttcgatcacatcttaaagatataca tgcagtatttctgcctgtagGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAA AAGGGTTGCTCCTCATACAAAGATTATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGAC ACAGTCTTTGTACCACGGAATGAGAGTAAAGTTAGAACAAGTTGATAATTTTGCAGATGG CgtagctgttgcactagTTAGTTGGTGAAGAAACTTTCCGTCTTTGCAAAGATTTAATAG ACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGgttagcacgcacc atctcctaatggtttcagatatgatccgtccaaccagccaaaattggttagaataggacg ggttgaactatcaacccaatcaatcacagcccaaataacatttatgtgggtatatgactc gcccatttattaactcaaccaattttggtccattcaaattcaggctaacccgtccacgtt tgacattcatactttagatgtggattaaagtaactttcttaaatttccctctggttttga catgtactagtttgtgtttgtgtgtgttttgttctttttttcaatagGATGTGTACGACA AAGGAAGGAACATATTAGAGACATCAGGTGCACTCGCCATAGCTGGAGCTGAAGCATACT GCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTAGTGGAGCCAATATGG ACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGGAAGCTC TGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTgtgc gttacttagagcacttaacaagcattttagccagagtttaagttatatacatcgtcgtca gtgtaagaaacttttataccgtcttgatggagtaaaaatttgttacactgacgtgtacat aacttaaaacttttttagttactatatgatactttctgtctaagaaactgaaatattgac ttgaattactggtgggacctatgattattaccgaattcaagtacagatataactctggaa gaaaacaagctctagttctgtacaggtaattaaagttctattcatttttagaggggatgt tggcttctcattttagatttgctttattagttgttaggaaaaaagaaattacttattaca ttcaatttttagATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGGAAG TTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG Thanks, Carson From: Barry Moore Date: Thursday, 20 June, 2013 9:29 PM To: Jingjing Jin Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker exon result To 
clarify things a bit Jin. Not every exon will have a start and/or stop codon; only the first coding exon will have a start and the last coding exon will have a stop. In the GFF3 format a coding exon is a feature of type 'CDS' (column 3), so only look at CDS features, not at 'exon' features. For CDSs you must then concatenate the sequence of each CDS line for a given transcript (and reverse complement the sequence if it is on the minus strand). The resulting sequence will usually (but not always) have start and stop codons at the beginning and end.

B

Barry Moore
Research Scientist
Dept. Human Genetics
University of Utah

On Jun 20, 2013, at 6:18 PM, "Jingjing Jin" wrote:

> For my understanding, the prediction gene model should be connect different
> exon together.
>
> For each exon of a gene, I think it should have a start codon and stop codon.
> However, it may be wrong.
>
> However, when I check some gene model from maker prediction, some exon of one
> gene, I cannot find stop codon for it. Like the example I give, the red color
> is the first exon. However, the last 3 NT is not a stop codon.
>
> Even for last 3 NT for last exon, it is also not a stop codon.
>
> Is it reasonable?
>
> Thanks!
>
> Jingjing
>
>
>
> From: Daniel Ence [dence at genetics.utah.edu]
> Sent: Thursday, June 20, 2013 7:06 PM
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: RE: maker exon result
>
> Hi Jingjing,
>
> It's really hard to find the stop codon in the nucleotide sequence that you
> sent. I think most people determine the presence of a stop codon in a gene by
> viewing the annotations and sequence in some kind of viewer. The one that I
> use the most is Apollo, but many people also like gbrowse and igv.
>
> When you view gene models in Apollo, the start codons are highlighted in green
> and the stop codons are highlighted in red.
Sometimes MAKER couldn't find the > stop or start codon for a gene, and in those cases, the end of the gene model > is marked with an orange arrow. > > I hope that I understood your question. Feel free to reply back on the mailing > list if I didn't. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jingjing > Jin [jjin01 at mail.rockefeller.edu] > Sent: Thursday, June 20, 2013 2:22 PM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] maker exon result > > Dear all, > > I have used maker to predict the gene model in my draft genome. > > However, when I check the sequence for each exon, I find some of them just > have start codon, without stop codon. > > Is it reasonable for this? > > Like in this example: > > processed_tobacco_genome_sequences_c33 maker gene 8916 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-proce > ssed_tobacco_genome_sequences_c33-snap-gene-0.9 > processed_tobacco_genome_sequences_c33 maker mRNA 8916 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;Parent=ma > ker-processed_tobacco_genome_sequences_c33-snap-gene-0.9;Name=maker-processed_ > tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1;_AED=0.13;_eAED=0.13;_QI=0|0 > |0|1|0.14|0.12|8|0|362 > processed_tobacco_genome_sequences_c33 maker exon 8916 9065 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:148; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 9089 9214 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:149; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 10232 10381 . > + . 
> ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:150; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11216 11270 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:151; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11336 11496 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:152; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11513 11602 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:153; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 11903 12151 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:154; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker exon 12528 12632 . > + . > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:exon:155; > Parent=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 8916 9065 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 9089 9214 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 10232 10381 . 
> + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11216 11270 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11336 11496 . > + 2 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11513 11602 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 11903 12151 . > + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > processed_tobacco_genome_sequences_c33 maker CDS 12528 12632 . 
> + 0 > ID=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1:cds;Paren > t=maker-processed_tobacco_genome_sequences_c33-snap-gene-0.9-mRNA-1 > > > ATGAAGGGCGCGATACGTACTACGATTCCAAAACCATCAGCATTGCCATTGAAGGTCTCAGAATTATCT > CCATCAGCTGATTCAGTACCCGTTCCAGCGTCTTTACAGGATGTCGAGGCGGGGAAGTTGATTGAGAAT > AATCCATCAGGGGTGATACAGAAGAATTGTTTCAGTATCTTGTTGAAATATTGGCTTCTAGAGTGTATG > ATGTAGCAATTGATTCCCCCTTGCAAAATGCAACTAAGCTTTCCAAGAAGCTTGGAGTTAACTTTTGGA > TCAAAAGAGAGGATATGCAGTCCGTATGTTTCTCCTCTCTTCTTTTTTTGATGTAGCATTTGCTTTAAC > TTAGAATTTGTGGTTTTAAACATACCATTAGAAAGGTATGGAGGTTGAGGATTAGGGTAGTAAAGTAGG > TAGTCTAGAGTGTTCATAACAGTAATATTGACAAGCAGTCTCGCTTTCCGTTGGTAGTAGGTTTTTATG > ACTAACCGTTATTTTCTTTCATTGTTGATCAACTTACTTTTGTTGTTTTTATTCTGCTTTTATATGGCT > TTTTGGTACTGTCCCTTCTTGTCTATATTTTCATTAATGTGGTGCTTATGCTTTTCTAAGCCGAGAGTT > TATTGGAAACAACTTTCATATCCTCACAAGGTAGGGGTAAGGTGTGCGTACACACTACCCTCCCCAGAC > TCTACGGTGTGGGATAATATTTAGTATGTTATTGTCGTTGTTGTTGTAAACGTTTTTTTTGTTGCTATC > AAAGCATGTTATTACGGGTAAAATAGAAACATTTAAAGTGAAAGAGTTTCCAAACGTAGGAAAGCTTTT > TTTTCTTTCGGAATACACCGAAAAAAGAAAGACTATCATTTAAGATAGAACAACAACAGCGACGGAGCT > AGCCTTCGACTTACTGGTTCGGCAGAACCCAATAATTTTGGCCCAAACTCTGTACTTGTACTAAAAAGC > TCACTTAATATGTATAAAAAGCCTAGTAATTAAGTTGCATTTTTTTCTTTCTAAAATCTAGAGCTCATA > AACTCAAAATTATGTCTCCGCCTCTGAACAATGGGGATATTATTCTACTTTTAACTATCTTAGATAAGT > TAATAATTGTTCTCTTTTTCAAACGTTTCTGCCTTGTATTATTGTGTAACTATTTATACTGTGTGGACG > CTTCAAAATGTTGTTGCGCCCGCGTCGGATCCTCAAAAAATATATATTTTGAGGATTCGACACGCACCC > GATGACCTTTTCGGAGAATTCGAGCAATATAGGTAACTAATATTGCTAGCTCATCAACTGGTGGTATTT > TTTAGGTGCTCTCATTCAAGCTTAGAGGAGCTTATAACATGATGACCAAACTCTCAAAGGAGCAATTAG > AAAGAGGGGTTATAACTGCTTCAGCTGGAAATCATGCACAAGGTGTTGCATTAGGTGCTCAGAGACTTA > AATGTACTGCTACGATTGTCATGCCTGTTACCACACCAGAGATCAAGGTAATTAGTTCTCTCCTGTTAA > TTTATCCTTCATGTTCGATTCATGTGAATCTAGTTGATCGGGCACTGAGTTTTACTAAAAAATGAAGAC > TTTCGGAACTTGGGAGCTTTAACATGCTGTAACATTTGTGTAGTTATAAGACTTTTGAAACTTATAGTC > TTAGTGGGTGTTTGGACATAAGAATTGTAAAGTTCCAAGAAAAGTGAAAAAAAATTCAAGTGAAAATGG > 
TATTTGAAAATTAGAGTTGTGTTTGGACATGAATATAATTTTAGGTTGTTTTTGAAGTTTTGTGAGTGA > TCTGACACAAATTTTGAAAAAACAACTTTTTGGAGTTTTTCAAATTTTCGAAAAATTCCAAAATGCATC > TTCAAGTGAAAATTGGAAATTATATGACCAAACGCTGATTTCGGGAAAAAAATTCGAAAAAATGTGAAA > ATTTTCTTATGTCCAAACGGGCTCTTAAATGCGTCATAACGTTTGTGTGGTTATAAAAGTCTCTCATCT > GAATAGGGTCACACAACTAAAACAGAGAGAACAAAATAATTCACTAAAAAAAAATTGGAACTAGCTACA > AACTTCGTCGCAAGTCTCGCTAAATCGCTCGTAGCTAATAGAATTTCTAGATAATTTGTTTAGCTTGTA > GCATGAAATTTTTCTATTTAGCAACAGAAGTAGTCTGTCGCTAATTCCTATTTTTTTAGTAGAAAGTAT > TGTGAAATTATTTGTTTTTCTAAAGGACCATTTTCTTTACAAATGAACAGATTGAAGCAGTTAAGAACT > TGGATGGTAATGTAGTTCTACAGGGTGACACATTTGATGAAGCTCAAGCACATGCTTTAAAGTTGGCTG > AAGATGAAGGTCTCACATTCATCCCGCCTTTCGATCACATCTTAAAGATATACATGCAGTATTTCTGCC > TGTAGGAGGAGGAGGTTTAATAGCTGGTGTTGCTGCATATTTCAAAAGGGTTGCTCCTCATACAAAGAT > TATAGGAGTTGAGCCATTTGGTGCAAGTTCAATGACACAGTCTTTGTACCACGGAATGAGAGTAAAGTT > AGAACAAGTTGATAATTTTGCAGATGGCGTAGCTGTTGCACTAGTTAGTTGGTGAAGAAACTTTCCGTC > TTTGCAAAGATTTAATAGACGGAATGGTCTTAGTCAGTAACGATGCTATTAGTGCAGCAGTAAAGGTTA > GCACGCACCATCTCCTAATGGTTTCAGATATGATCCGTCCAACCAGCCAAAATTGGTTAGAATAGGACG > GGTTGAACTATCAACCCAATCAATCACAGCCCAAATAACATTTATGTGGGTATATGACTCGCCCATTTA > TTAACTCAACCAATTTTGGTCCATTCAAATTCAGGCTAACCCGTCCACGTTTGACATTCATACTTTAGA > TGTGGATTAAAGTAACTTTCTTAAATTTCCCTCTGGTTTTGACATGTACTAGTTTGTGTTTGTGTGTGT > TTTGTTCTTTTTTTCAATAGGATGTGTACGACAAAGGAAGGAACATATTAGAGACATCAGGTGCACTCG > CCATAGCTGGAGCTGAAGCATACTGCAAATACTATGACATAAAGGGCGAAAACGTTGTAGCAATTGCTA > GTGGAGCCAATATGGACATCAGCAAACTAAAATTAGTCGTCGATTTAGCAGATATTGGTGGACAGAGGG > AAGCTCTGCTGGCTACTTTTATGCCAGAAGAACCAGGAAGCTTCAAAAAATTCTGCGAACTTGTGCGTT > ACTTAGAGCACTTAACAAGCATTTTAGCCAGAGTTTAAGTTATATACATCGTCGTCAGTGTAAGAAACT > TTTATACCGTCTTGATGGAGTAAAAATTTGTTACACTGACGTGTACATAACTTAAAACTTTTTTAGTTA > CTATATGATACTTTCTGTCTAAGAAACTGAAATATTGACTTGAATTACTGGTGGGACCTATGATTATTA > CCGAATTCAAGTACAGATATAACTCTGGAAGAAAACAAGCTCTAGTTCTGTACAGGTAATTAAAGTTCT > ATTCATTTTTAGAGGGGATGTTGGCTTCTCATTTTAGATTTGCTTTATTAGTTGTTAGGAAAAAAGAAA > 
TTACTTATTACATTCAATTTTTAGATTTTCTGTCAATTCATATTTCCTGAGAAGCCTGGAGCTTTAAGG > AAGTTCTTAGATGCTTTCAGCCCTCGATGGAATATAAGTTTGTTCCATTATCGTGAACAG > > > > > This is the sequence for this gene, the red color is for the first exon?? > > > However, for this exon, I cannot found the stop codon??? > > > I also find for some exon, there are several stop codon in one exon??? > > > Does anyone have the same problem with me? > > Or there is something wrong when I configure the maker file?? > > > Thanks! > > > Jingjing > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From amelia.ireland at gmod.org Sun Jun 23 20:15:37 2013 From: amelia.ireland at gmod.org (Amelia Ireland) Date: Sun, 23 Jun 2013 19:15:37 -0700 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: References: Message-ID: >From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. ---------- Forwarded message ---------- From: Yunxi Lin Date: Sun, Jun 23, 2013 at 4:14 PM Subject: about running MAKER To: "gmod-help at gmod.org" Hi I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server. Thank you. Sincerely, Lin -- Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Mon Jun 24 07:05:27 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Jun 2013 09:05:27 -0400 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: Message-ID:

Run time depends on the size of your evidence dataset, the genome size, and the number of processors you use. If you have a large genome (Gb size) and you are running on a single cpu then that could take a long time. This is especially true if you use the alt_est option for evidence, as these are aligned via tblastx, which is 3-4 times slower than protein alignments and 10-20 times slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment, so your evidence dataset is the major factor.

Also, you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 cpu Linux servers which can annotate most fungal genomes in a few hours and larger genomes in a few days.

--Carson

From: Amelia Ireland Date: Sunday, 23 June, 2013 10:15 PM To: Cc: Subject: [maker-devel] Fwd: about running MAKER

From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu.

---------- Forwarded message ----------
From: Yunxi Lin Date: Sun, Jun 23, 2013 at 4:14 PM Subject: about running MAKER To: "gmod-help at gmod.org"

Hi I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server. Thank you.
Sincerely, Lin -- Amelia Ireland GMOD Community Support http://gmod.org || @gmodproject

From Carson.Holt at oicr.on.ca Mon Jun 24 18:39:08 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Tue, 25 Jun 2013 00:39:08 +0000 Subject: [maker-devel] Fwd: about running MAKER In-Reply-To: Message-ID:

You are most likely only getting 1 cpu of performance. You should just install MPICH2. It's easy; just let MAKER do it for you:

Go to the .../maker/src/ directory
Run './Build mpich2'

Once it finishes installing, it will be in the .../maker/exe/mpich2/bin/ directory.

Set up MAKER again to use MPICH2:

Go to the .../maker/src/ directory
Run 'perl Build.PL'
Say yes to the "use MPI" question
Run './Build install'

Now run MAKER via 'mpiexec'.
Example --> .../maker/exe/mpich2/bin/mpiexec -n 16 maker

The -n flag specifies how many CPUs to use. mpiexec handles process communication either on the same machine or across machines. You will get much better performance.

Thanks, Carson

From: Yunxi Lin > Date: Monday, 24 June, 2013 7:11 PM To: Carson Holt > Cc: Amelia Ireland >, > Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson Thank your for your help. My genome estimated size is 250M base pairs. I ran it in 16cpu, but we don't have the MPI so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial to do that. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq. I used that ESTs to run the MAKER. I think I already got more than 10Mb data. The information you mentioned is very helpful. I may go to use them to try to train the SNAP and Augustus.
Because this is my first time using the MAKER, I ran already a month, I was wondering maybe the command I used in a wrong way. Sincerely, Yunxi 2013/6/24 Carson Holt > Run time is dependent on the size of your evidence dataset, genome size, and number of processors you use. If you have a large genome (Gb size) and you are running on a single cpu then that could take a long time. This is especially true if you use the alt_est option for evidence as these are aligned via tblastx which is 3-4 times slower than protein alignments, and 10-20 time slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment so your evidence dataset is the major factor. Also you do not need results from the entire genome to train SNAP. If you get results from ~10Mb of the genome that is usually sufficient. Also make sure you are taking advantage of parallelization. Launch via MPI to get maximum performance. I commonly launch on 16 and 32 cpu Linux servers which can annotate most fungal genomes in a few hours and larger genomes in a few days. --Carson From: Amelia Ireland > Date: Sunday, 23 June, 2013 10:15 PM To: > Cc: > Subject: [maker-devel] Fwd: about running MAKER >From the GMOD helpdesk; please cc Lin, lin11 at cougars.csusm.edu. ---------- Forwarded message ---------- From: Yunxi Lin > Date: Sun, Jun 23, 2013 at 4:14 PM Subject: about running MAKER To: "gmod-help at gmod.org" > Hi I'm running a eukaryote project on our server. Because our server do not have the GUI, is that still work for MAKER? And our command already ran more than one month to try to generate the model use for the training of SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server. Thank you. 
Sincerely,
Lin

--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From lin11 at cougars.csusm.edu Mon Jun 24 17:11:23 2013
From: lin11 at cougars.csusm.edu (Yunxi Lin)
Date: Mon, 24 Jun 2013 16:11:23 -0700
Subject: [maker-devel] Fwd: about running MAKER

Hi Carson,

Thank you for your help. My genome's estimated size is 250M base pairs. I ran it on 16 CPUs, but we don't have MPI, so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence, based on RNA-seq, and I used those ESTs to run MAKER.

I think I already have more than 10 Mb of data. The information you mentioned is very helpful. I may go on to use the results to train SNAP and Augustus. Because this is my first time using MAKER and it has already been running for a month, I was wondering whether I used the command the wrong way.

Sincerely,
Yunxi

2013/6/24 Carson Holt
> Run time is dependent on the size of your evidence dataset, genome size,
> and number of processors you use. If you have a large genome (Gb size) and
> you are running on a single cpu then that could take a long time. This is
> especially true if you use the alt_est option for evidence as these are
> aligned via tblastx which is 3-4 times slower than protein alignments, and
> 10-20 times slower than standard EST alignments. 95% of MAKER's runtime is
> BLAST alignment so your evidence dataset is the major factor.
>
> Also you do not need results from the entire genome to train SNAP. If you
> get results from ~10Mb of the genome that is usually sufficient. Also make
> sure you are taking advantage of parallelization. Launch via MPI to get
> maximum performance. I commonly launch on 16 and 32 cpu Linux servers
> which can annotate most fungal genomes in a few hours and larger genomes in
> a few days.
>
> --Carson

From dence at genetics.utah.edu Tue Jun 25 08:12:45 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 25 Jun 2013 14:12:45 +0000
Subject: [maker-devel] Fwd: about running MAKER

Hi Yunxi,

During the MAKER installation, there is an option to automatically install MPICH2, which would let you run MAKER in parallel. Try rerunning the perl Build.PL script in the "maker/src" directory, and when the option to install MPICH2 comes up, tell it yes. This will start an automated download and install onto your server.

You can also start more than one maker process. They will work on annotating the genome together. You can start as many as ten or more processes like this, but MPI is a better parallelizing option.

Hope that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From Carson.Holt at oicr.on.ca Tue Jun 25 09:56:22 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 25 Jun 2013 15:56:22 +0000
Subject: [maker-devel] Fwd: about running MAKER
In-Reply-To: <9FC132E2-9E59-42E9-ADBA-FD91644E2124@cougars.csusm.edu>

You can get BLAST to use more than 1 cpu via the cpus= option, but that is still significantly limiting MAKER's performance. When you let MAKER install MPICH2, it will be local to the MAKER installation (MAKER only). It will be in .../maker/exe/mpich2. This was purposely done for people who have limited access and install MAKER themselves, so they can run via MPI without having to get upgraded privileges. So I don't know if you installed MAKER yourself, but if you did, then this is an option that will let you run.

--Carson

From: csusm
Date: Tuesday, 25 June, 2013 11:40 AM
To: Carson Holt
Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson,

Thank you for your suggestion. Do you mean that if I don't use MPI, I can only run it on one CPU? Because my school owns the server, I only have limited permissions.

Yunxi Lin

On Jun 24, 2013, at 5:39 PM, Carson Holt wrote:

You are most likely only getting 1 cpu of performance. You should just install MPICH2. It's easy just to let MAKER do it for you:

Go to the .../maker/src/ directory
Run './Build mpich2'

Once it finishes installing, it will be in the .../maker/exe/mpich2/bin/ directory.

Setup MAKER again to use MPICH2:

Go to the .../maker/src/ directory
Run 'perl Build.PL'
Say yes to the "use MPI" question
Run './Build install'

Now run MAKER via 'mpiexec'.
Example --> .../maker/exe/mpich2/bin/mpiexec -n 16 maker

The -n flag specifies how many CPUs to use. Mpiexec handles process communication either on the same machine or across machines. You will get much better performance.

Thanks,
Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 15:13:53 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 21:13:53 +0000
Subject: [maker-devel] start position for some genes results

Dear all,

I found something strange about the locations in my final results. For example, this is one of the final gene models:

c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2

Its start position is -1. Does anyone know why the start position is -1? Is something wrong?

Thanks!
Jingjing

From carsonhh at gmail.com Tue Jun 25 16:55:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 18:55:11 -0400
Subject: [maker-devel] start position for some genes results

What MAKER version are you using? This should be fixed in the current 2.28. It only happened under a very specific set of circumstances, but I remember fixing it. So let me know if you are using 2.28.

--Carson
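
GFF3 coordinates are 1-based, so a start of -1 like the one reported in this thread can never be a valid position. To see how many features in an output file are affected, something like the following may help. It is only a minimal sketch and not part of MAKER: the function name `find_bad_coordinates` and the second sample record (`ID=ok-gene`) are made up for illustration, and it assumes tab-separated columns as the GFF3 format specifies.

```python
def find_bad_coordinates(gff_lines):
    """Return (line_number, seqid, start, end) for features with invalid coordinates.

    GFF3 uses 1-based, inclusive coordinates, so start must be >= 1 and <= end.
    """
    bad = []
    for number, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines follow
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a 9-column feature line
        try:
            start, end = int(columns[3]), int(columns[4])
        except ValueError:
            continue  # non-numeric coordinates are outside the scope of this check
        if start < 1 or start > end:
            bad.append((number, columns[0], start, end))
    return bad


# The record from the message above, rebuilt with tab separators,
# plus one hypothetical well-formed record for contrast.
example = [
    "##gff-version 3\n",
    "c124062\tmaker\tgene\t-1\t507\t.\t-\t.\t"
    "ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2\n",
    "c124062\tmaker\tgene\t600\t900\t.\t+\t.\tID=ok-gene\n",
]
for hit in find_bad_coordinates(example):
    print(hit)  # prints (2, 'c124062', -1, 507)
```

The function accepts any iterable of lines, so an open file handle for a MAKER GFF3 output file can be passed in directly.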

From jjin01 at mail.rockefeller.edu Tue Jun 25 17:00:37 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Tue, 25 Jun 2013 23:00:37 +0000
Subject: [maker-devel] start position for some genes results

Sorry, I have checked. I think it is the old version, 2.27. I will try the new one.

Thanks!
Jingjing

From jjin01 at mail.rockefeller.edu Tue Jun 25 18:53:01 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 00:53:01 +0000
Subject: [maker-devel] start position for some genes results

Dear Carson,

With the new version of MAKER, I have another problem:

jingjing at ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error

Do you know how to fix this problem with the new version?

Thanks!
Jingjing

From carsonhh at gmail.com Tue Jun 25 18:55:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 20:55:54 -0400
Subject: [maker-devel] start position for some genes results

Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network mounted location.

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 19:30:09 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:30:09 +0000
Subject: [maker-devel] start position for some genes results

Dear Carson,

I am so sorry. The problem is still here.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
    Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73
    Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
    eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
    Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79
    Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56
    Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
--> rank=NA, hostname=ChuaServer1
ERROR: Failed in tier preparation

WARNING: You must always set a rank before running MpiTiers
FATAL: argument `seq_id` does not exist in MpiTier object

From carsonhh at gmail.com Tue Jun 25 19:47:10 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 21:47:10 -0400
Subject: [maker-devel] start position for some genes results

Could you check your input genome file for the sequence "processed_tobacco_genome_sequences_c1"? Make sure that it has in fact that exact name and that there are no ':' characters in the name, because they can confuse the BioPerl FASTA indexer.

--Carson

From jjin01 at mail.rockefeller.edu Tue Jun 25 19:53:33 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 01:53:33 +0000
Subject: [maker-devel] start position for some genes results

Yes, this is the real name, and there is no ":" in the name. I used the same file with MAKER 2.27 and had no problem, so I am not sure what is wrong with the new version.

Jingjing
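
The header check suggested in this thread can be scripted when the genome file is too large to inspect by eye. The following is a minimal sketch, not MAKER's or BioPerl's own code: `check_fasta_ids` and the sample records (including the `scaffold:2` header) are invented for illustration. It takes the sequence ID to be the text between `>` and the first whitespace, which is the usual FASTA convention, and reports whether a wanted ID is present along with any IDs that contain a `:`.

```python
def check_fasta_ids(fasta_lines, wanted_id):
    """Return (found, ids_with_colon) for the header lines of a FASTA file.

    found          -- True if some header's ID equals wanted_id exactly
    ids_with_colon -- every ID that contains a ':' character
    """
    found = False
    ids_with_colon = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence data, not a header
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""  # ID ends at the first whitespace
        if seq_id == wanted_id:
            found = True
        if ":" in seq_id:
            ids_with_colon.append(seq_id)
    return found, ids_with_colon


# Hypothetical records; only the first ID comes from the thread.
sample = [
    ">processed_tobacco_genome_sequences_c1 length=8\n",
    "ACGTACGT\n",
    ">scaffold:2\n",
    "GGCC\n",
]
print(check_fasta_ids(sample, "processed_tobacco_genome_sequences_c1"))
# prints (True, ['scaffold:2'])
```

An exact comparison is used on purpose: a trailing space, a different case, or an extra prefix in the header would all make the lookup by name fail even though the file looks right at a glance.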
From carsonhh at gmail.com Tue Jun 25 20:02:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Jun 2013 22:02:51 -0400
Subject: [maker-devel] start position for some genes results

The point of the failure you are seeing is occurring in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER, does it fail as well?

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 9:53 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Yes, this is the real name.

There is also no ":" in the name.

Because I have use the same file for maker.2.27 and have no problem.

I am not sure what is wrong with the new version.

Jingjing

From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 9:47 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

Could you check for this sequence in your input genome file for "processed_tobacco_genome_sequences_c1", make sure that it is in fact that exact name, and there are no ':' characters in the name because they can confuse the bioperl fasta indexer.

--Carson

From: Jingjing Jin
Date: Tuesday, 25 June, 2013 9:30 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] start position for some genes results

Dear Carson,

I am so sorry. The problem is still here.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
stop here: processed_tobacco_genome_sequences_c1
ERROR: Fasta index error
 at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
    Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73
    Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415
    eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407
    Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79
    Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56
    Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650
--> rank=NA, hostname=ChuaServer1
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `seq_id` does not exist in MpiTier object

From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 8:55 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network mounted location.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...

From jjin01 at mail.rockefeller.edu Tue Jun 25 20:15:46 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 26 Jun 2013 02:15:46 +0000
Subject: [maker-devel] start position for some genes results

Yes, it also fails on test data.

jingjing at ChuaServer1:~/software/maker.2.28/maker/data/example$ ../../bin/./maker
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/jingjing/software/maker.2.28/maker/data/example/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
WARNING: Cannot find >contig-dpp-500-500, trying to re-index the fasta.
stop here: contig-dpp-500-500
ERROR: Fasta index error
 at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
________________________________
From: Carson Holt [carsonhh at gmail.com]
Sent: Tuesday, June 25, 2013 10:02 PM
To: Jingjing Jin; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] start position for some genes results

The point of the failure you are seeing is occurring in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER, does it fail as well?
--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
From carsonhh at gmail.com Wed Jun 26 05:49:11 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Jun 2013 07:49:11 -0400
Subject: [maker-devel] start position for some genes results

I thought as much. There is something wrong with the installation itself. Could you run maker with the --debug flag and kill it after 30 seconds? Capture the STDERR and send it to me. This is just to check the prerequisites that are installed on your system for known incompatibilities.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
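Carson's earlier suggestion in this thread, to confirm that the sequence name in the genome file matches exactly and contains no ':' characters, could be checked with a short script. This is only a sketch under assumptions: the function name `check_fasta_ids` and the demo records are made up for illustration, and the ID is assumed to be the first whitespace-delimited token on each header line. It does not address the installation problem diagnosed above; it covers the header check only.

```python
import io


def check_fasta_ids(handle, wanted_id):
    """Scan FASTA headers from a file-like object.

    Returns (found, ids_with_colon):
      found          -- True if some header ID equals wanted_id exactly
      ids_with_colon -- list of header IDs that contain a ':' character
    The ID is assumed to be the first whitespace-delimited token after '>'.
    """
    found = False
    ids_with_colon = []
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""
        if seq_id == wanted_id:
            found = True
        if ":" in seq_id:
            ids_with_colon.append(seq_id)
    return found, ids_with_colon


if __name__ == "__main__":
    # Hypothetical in-memory FASTA; a real run would pass open("genome.fasta").
    demo = io.StringIO(
        ">processed_tobacco_genome_sequences_c1\nACGT\n"
        ">bad:name some description\nGGCC\n"
    )
    found, with_colon = check_fasta_ids(demo, "processed_tobacco_genome_sequences_c1")
    print("exact ID present:", found)
    print("IDs containing ':':", with_colon)
```

For a real genome file, the same function can be called on an open file handle instead of the in-memory demo.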
URL: From michel.moser at ips.unibe.ch Thu Jun 27 07:33:15 2013 From: michel.moser at ips.unibe.ch (michel.moser at ips.unibe.ch) Date: Thu, 27 Jun 2013 13:33:15 +0000 Subject: [maker-devel] spliting genome for annotation Message-ID: Dear Maker-developers If i understood correctly, in order to increase speed and reduce needed resources one can split the genome into chunks and annotate each chunk separately. (i would really like to use that as i am working with a 1.2 Gbasepair draftgenome and cant use MPI on the computing cluster) I am a bit worried about how this might affect the annotation as the gene-predictor would get trained quite differently for each chunk, right? Or is there communication between the chunks using the -base function of maker? Could you maybe name some pros and cons of splitting your genome for the annotation with maker? Thank you very much, Michel ________________________________________ Von: Moser, Michel (IPS) Gesendet: Donnerstag, 27. Juni 2013 15:24 An: Carson Holt Betreff: AW: [maker-devel] start position for some genes results ________________________________________ Von: maker-devel [maker-devel-bounces at yandell-lab.org]" im Auftrag von "Carson Holt [carsonhh at gmail.com] Gesendet: Mittwoch, 26. Juni 2013 04:02 An: Jingjing Jin; maker-devel at yandell-lab.org Betreff: Re: [maker-devel] start position for some genes results The point of the failure you are seeing is occurring in the initialization stage, before reaching any of the changes that would have been introduced by 2.28. Try running the test data that comes with MAKER, does it fail as well? --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 9:53 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: RE: [maker-devel] start position for some genes results Yes, this is the real name. There is also no ":" in the name. Because I have use the same file for maker.2.27 and have no problem. I am not sure what is wrong with the new version. 
Jingjing ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Tuesday, June 25, 2013 9:47 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: Re: [maker-devel] start position for some genes results Could you check for this sequence in your input genome file for "processed_tobacco_genome_sequences_c1", make sure that it is in fact that exact name, and there are no ':' characters in the name because they can confuse the bioperl fasta indexer. --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 9:30 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: RE: [maker-devel] start position for some genes results Dear Carson, I am so sorry. The problem is still here. STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore To access files for individual sequences use the datastore index: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log STATUS: Now running MAKER... WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta. stop here: processed_tobacco_genome_sequences_c1 ERROR: Fasta index error at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239. 
Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', 'HASH(0x4e10810)', 0) called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 73 Process::MpiTiers::__ANON__() called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415 eval {...} called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') called at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, 'Process::MpiChunk') called at /home/jingjing/software/maker.2.28/maker/bin/./maker line 650 --> rank=NA, hostname=ChuaServer1 ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `seq_id` does not exist in MpiTier object ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Tuesday, June 25, 2013 8:55 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: Re: [maker-devel] start position for some genes results Delete the mpi_blastdb directory before starting, to make sure all indexes get rebuilt. Also make sure you are not setting TMP= to a network mounted location. --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 8:53 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: RE: [maker-devel] start position for some genes results Dear Carson, When I use the new version of maker, I have another problem like this: jingjing at ChuaServer1:~/project/$ /home/jingjing/software/maker.2.28/maker/bin/./maker STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... 
A data structure will be created for you at: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore To access files for individual sequences use the datastore index: /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log STATUS: Now running MAKER... WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta. stop here: processed_tobacco_genome_sequences_c1 ERROR: Fasta index error Do you know how to fix this problem about new version? Thanks! Jingjing ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Tuesday, June 25, 2013 6:55 PM To: Jingjing Jin; maker-devel at yandell-lab.org Subject: Re: [maker-devel] start position for some genes results What MAKER version are you using? This should be fixed in the current 2.28. It only happened under a very specific set of circumstances, but I remember fixing it. So let me know if you are using 2.28. --Carson From: Jingjing Jin > Date: Tuesday, 25 June, 2013 5:13 PM To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] start position for some genes results Dear all, I find some strange things about location for my final result. Like for some start position of final gene model: c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2 It start position is -1. Does someone know why the start position is -1? Is there something wrong? Thanks! 
Jingjing _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From lawson at ebi.ac.uk Thu Jun 27 07:37:10 2013 From: lawson at ebi.ac.uk (Daniel Lawson) Date: Thu, 27 Jun 2013 14:37:10 +0100 Subject: [maker-devel] spliting genome for annotation In-Reply-To: References: Message-ID: Michel, It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold. regards Dan On 27 June 2013 14:33, wrote: > Dear Maker-developers > > If i understood correctly, in order to increase speed and reduce needed > resources one can split the genome into chunks and annotate each chunk > separately. > (i would really like to use that as i am working with a 1.2 Gbasepair > draftgenome and cant use MPI on the computing cluster) > I am a bit worried about how this might affect the annotation as the > gene-predictor would get trained quite differently for each chunk, right? > Or is there communication between the chunks using the -base function of > maker? > > Could you maybe name some pros and cons of splitting your genome for the > annotation with maker? > > Thank you very much, > Michel > > > > > ________________________________________ > Von: Moser, Michel (IPS) > Gesendet: Donnerstag, 27. Juni 2013 15:24 > An: Carson Holt > Betreff: AW: [maker-devel] start position for some genes results > > ________________________________________ > Von: maker-devel [maker-devel-bounces at yandell-lab.org]" im Auftrag > von "Carson Holt [carsonhh at gmail.com] > Gesendet: Mittwoch, 26. 
Juni 2013 04:02 > An: Jingjing Jin; maker-devel at yandell-lab.org > Betreff: Re: [maker-devel] start position for some genes results > > The point of the failure you are seeing is occurring in the initialization > stage, before reaching any of the changes that would have been introduced > by 2.28. Try running the test data that comes with MAKER, does it fail as > well? > > --Carson > > > > From: Jingjing Jin jjin01 at mail.rockefeller.edu>> > Date: Tuesday, 25 June, 2013 9:53 PM > To: Carson Holt >, " > maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: RE: [maker-devel] start position for some genes results > > Yes, this is the real name. > > There is also no ":" in the name. > > Because I have use the same file for maker.2.27 and have no problem. > > I am not sure what is wrong with the new version. > > Jingjing > > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 9:47 PM > To: Jingjing Jin; maker-devel at yandell-lab.org maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] start position for some genes results > > Could you check for this sequence in your input genome file for > "processed_tobacco_genome_sequences_c1", make sure that it is in fact that > exact name, and there are no ':' characters in the name because they can > confuse the bioperl fasta indexer. > > --Carson > > > From: Jingjing Jin jjin01 at mail.rockefeller.edu>> > Date: Tuesday, 25 June, 2013 9:30 PM > To: Carson Holt >, " > maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: RE: [maker-devel] start position for some genes results > > Dear Carson, > > > I am so sorry. The problem is still here. > > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... 
> A data structure will be created for you at: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore > > To access files for individual sequences use the datastore index: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log > > STATUS: Now running MAKER... > WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to > re-index the fasta. > stop here: processed_tobacco_genome_sequences_c1 > ERROR: Fasta index error > at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm > line 239. > Process::MpiChunk::_prepare('Process::MpiChunk=HASH(0x4e16178)', > 'HASH(0x4e10810)', 0) called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm > line 73 > Process::MpiTiers::__ANON__() called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 415 > eval {...} called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Error.pm line 407 > Error::subs::try('CODE(0x4e19100)', 'HASH(0x4e1bd58)') called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm > line 79 > Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x4e16e68)') > called at > /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiTiers.pm > line 56 > Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x4e16ad8)', 0, > 'Process::MpiChunk') called at > /home/jingjing/software/maker.2.28/maker/bin/./maker line 650 > --> rank=NA, hostname=ChuaServer1 > ERROR: Failed in tier preparation > WARNING: You must always set a rank before running MpiTiers > FATAL: argument `seq_id` does not exist in MpiTier object > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 8:55 PM > To: Jingjing Jin; maker-devel at yandell-lab.org maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] start position for some genes 
results > > Delete the mpi_blastdb directory before starting, to make sure all indexes > get rebuilt. Also make sure you are not setting TMP= to a network mounted > location. > > --Carson > > > From: Jingjing Jin jjin01 at mail.rockefeller.edu>> > Date: Tuesday, 25 June, 2013 8:53 PM > To: Carson Holt >, " > maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: RE: [maker-devel] start position for some genes results > > Dear Carson, > > When I use the new version of maker, I have another problem like this: > > jingjing at ChuaServer1:~/project/$ > /home/jingjing/software/maker.2.28/maker/bin/./maker > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore > > To access files for individual sequences use the datastore index: > > /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log > > STATUS: Now running MAKER... > WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to > re-index the fasta. > stop here: processed_tobacco_genome_sequences_c1 > ERROR: Fasta index error > > > Do you know how to fix this problem about new version? > > Thanks! > > Jingjing > > > > ________________________________ > From: Carson Holt [carsonhh at gmail.com] > Sent: Tuesday, June 25, 2013 6:55 PM > To: Jingjing Jin; maker-devel at yandell-lab.org maker-devel at yandell-lab.org> > Subject: Re: [maker-devel] start position for some genes results > > What MAKER version are you using? This should be fixed in the current > 2.28. It only happened under a very specific set of circumstances, but I > remember fixing it. So let me know if you are using 2.28. 
>
> --Carson
>
>
> From: Jingjing Jin [jjin01 at mail.rockefeller.edu]
> Date: Tuesday, 25 June, 2013 5:13 PM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] start position for some genes results
>
> Dear all,
>
> I find some strange things about location for my final result.
>
> Like for some start position of final gene model:
>
> c124062 maker gene -1 507 . - . ID=maker-c124062-snap-gene-0.2;Name=maker-c124062-snap-gene-0.2
>
> Its start position is -1.
>
> Does someone know why the start position is -1?
>
> Is there something wrong?
>
> Thanks!
>
> Jingjing
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Ensembl Genomes | VectorBase | i5K insect genome initiative

From carsonhh at gmail.com Thu Jun 27 09:42:26 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Jun 2013 11:42:26 -0400
Subject: [maker-devel] spliting genome for annotation
In-Reply-To:
Message-ID:

Correct. The level of splitting is going to be limited by the largest contig. The largest contig will then be your slowest job, but the total runtime will be based off how much splitting you can achieve. Splitting into 10 jobs and running them all simultaneously will make total run time 1/10 as long.

You can use the -base flag with MAKER to make all jobs write to the same directory. Use the -g flag to specify a different input fasta file for each job (then they can all share the same control files).
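As a rough illustration of the job layout described above, here is a minimal Python sketch. It is not part of MAKER: the function names, the `chunk_N.fa` file names, and the greedy longest-first balancing are all assumptions made for the example. Only the `-base` and `-g` flags come from this thread, and the sketch just builds the command strings without running anything.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def balance_contigs(lengths: Dict[str, int], n_jobs: int) -> List[List[str]]:
    """Assign contigs to n_jobs bins, longest first, always into the lightest bin.

    The largest contig still bounds the slowest job; this only evens out the rest.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    # Heap entries are (total_length, bin_index, contig_names); the unique
    # bin_index breaks ties so the name lists are never compared.
    bins = [(0, i, []) for i in range(n_jobs)]
    heapq.heapify(bins)
    for name, length in sorted(lengths.items(), key=lambda kv: -kv[1]):
        total, i, names = heapq.heappop(bins)
        names.append(name)
        heapq.heappush(bins, (total + length, i, names))
    return [names for _, _, names in sorted(bins, key=lambda b: b[1])]


def maker_commands(base: str, chunk_files: List[str]) -> List[str]:
    """Build one MAKER command line per chunk, all sharing the same -base name."""
    return [f"maker -base {base} -g {path}" for path in chunk_files]


if __name__ == "__main__":
    # Tiny in-memory FASTA stands in for a real assembly (hypothetical data).
    demo = [">c1\n", "ACGTACGT\n", ">c2\n", "ACGT\n", ">c3\n", "ACG\n"]
    lengths = {name: len(seq) for name, seq in read_fasta(demo)}
    print(balance_contigs(lengths, 2))
    for cmd in maker_commands("my_genome", ["chunk_0.fa", "chunk_1.fa"]):
        print(cmd)
```

Each printed command would be submitted as its own cluster job from the directory holding the shared control files; writing the per-chunk FASTA files is left out here because the thread recommends MAKER's own fasta_tool for that step.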
You will then need to run maker once using the original assembly fasta and the -dsindex flag when all jobs complete to get MAKER to clean up the datastore log file (rebuilt to index all contigs). That only takes 2 minutes to run. You can use the fasta_tool utility that comes with MAKER to conveniently split the input assembly fasta.

MAKER does not train the gene predictors for you, and the hints it gives are on a per gene basis, so splitting contigs has no effect on that. For initial training of gene predictors, run MAKER on about 10-30 Mb of your largest contigs and use either the protein2genome or est2genome prediction options to build gene models to train the predictors on. You will need to train Augustus or SNAP yourself using those models and their own documentation. If training SNAP, you can use maker2zff to convert to SNAP's training format. You can also use the tool CEGMA from Ian Korf's lab to train SNAP. Use the cegma2zff script that comes with MAKER to do the conversion for training input. If you have questions once you start training, just send them to the list.

Thanks,
Carson

From: Daniel Lawson
Date: Thursday, 27 June, 2013 9:37 AM
To:
Cc:
Subject: Re: [maker-devel] spliting genome for annotation

Michel,

It is about the size of your scaffolds rather than the whole genome. Presumably you don't have 1.2 Gb of contiguous sequence. If you have long scaffolds then the compute time will be constrained by the time taken to process the largest scaffold.

regards
Dan

On 27 June 2013 14:33, wrote:

> Dear Maker-developers
>
> If I understood correctly, in order to increase speed and reduce needed
> resources one can split the genome into chunks and annotate each chunk
> separately.
> (I would really like to use that as I am working with a 1.2 Gbasepair
> draft genome and can't use MPI on the computing cluster)
> I am a bit worried about how this might affect the annotation as the
> gene-predictor would get trained quite differently for each chunk, right?
> Or is there communication between the chunks using the -base function of
> maker?
>
> Could you maybe name some pros and cons of splitting your genome for the
> annotation with maker?
>
> Thank you very much,
> Michel
>
>
> ________________________________________
> From: Moser, Michel (IPS)
> Sent: Thursday, 27 June 2013 15:24
> To: Carson Holt
> Subject: AW: [maker-devel] start position for some genes results
>
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, 26 June 2013 04:02
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] start position for some genes results
>
> The failure you are seeing occurs in the initialization stage, before
> reaching any of the changes that would have been introduced by 2.28.
> Try running the test data that comes with MAKER; does it fail as well?
>
> --Carson
>
>
> From: Jingjing Jin
> Date: Tuesday, 25 June, 2013 9:53 PM
> To: Carson Holt, "maker-devel at yandell-lab.org"
> Subject: RE: [maker-devel] start position for some genes results
>
> Yes, this is the real name.
>
> There is also no ":" in the name.
>
> I have used the same file with maker.2.27 and had no problem.
>
> I am not sure what is wrong with the new version.
>
> Jingjing
>
>
> ________________________________
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Tuesday, June 25, 2013 9:47 PM
> To: Jingjing Jin; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] start position for some genes results
>
> Could you check your input genome file for the sequence
> "processed_tobacco_genome_sequences_c1", make sure that it is in fact that
> exact name, and that there are no ':' characters in the name, because they
> can confuse the bioperl fasta indexer.
>
> --Carson
>
>
> From: Jingjing Jin
> Date: Tuesday, 25 June, 2013 9:30 PM
> To: Carson Holt, "maker-devel at yandell-lab.org"
> Subject: RE: [maker-devel] start position for some genes results
>
> Dear Carson,
>
> I am so sorry. The problem is still here.
>
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_datastore
>
> To access files for individual sequences use the datastore index:
> /home/jingjing/project/tobacco/Nicotiana_tabacum/maker.2.28/1/tobacco_seq_1.maker.output/tobacco_seq_1_master_datastore_index.log
>
> STATUS: Now running MAKER...
> WARNING: Cannot find >processed_tobacco_genome_sequences_c1, trying to re-index the fasta.
> stop here: processed_tobacco_genome_sequences_c1
> ERROR: Fasta index error
> at /home/jingjing/software/maker.2.28/maker/bin/../lib/Process/MpiChunk.pm line 239.
>
> Jingjing
>

--
Ensembl Genomes | VectorBase | i5K insect genome initiative

From benayoun at stanford.edu Thu Jun 27 18:47:29 2013
From: benayoun at stanford.edu (Bérénice Benayoun)
Date: Thu, 27 Jun 2013 17:47:29 -0700
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
References:
Message-ID:

Hi maker devel team,

just wanted to say that retraining SNAP apparently fixed the problem. I modified the defaults and added "-min-intron 0" to the training everywhere relevant (the default is 30 bp, which must prevent single-exon genes from being predicted).

Thanks for your insights/help!

Berenice

2013/6/10 Carson Holt

> One more note. The ESTs appear to be from multiple overlapping HSPs
> (based on red line pattern in image). I'd have to see the actual GFF3 to
> be sure, but if that is the case, then there probably isn't an ORF to work
> with at that location on that strand (so SNAP can't call it). Possibly the
> result of assembly error or a pseudogene.
>
> --Carson
>
>
> From: Daniel Ence
> Date: Friday, 7 June, 2013 5:32 PM
> To: Bérénice Benayoun, "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Maker and mono-exonic genes ?
>
> Hi Berenice, Thank you for sending that screenshot and the maker_opts.log
> file. Those are exactly what we need to understand how to expect MAKER to
> perform.
>
> In looking at the screenshot, it doesn't look like any of the gene
> predictors gave a prediction in this region. MAKER uses the predictions from
> ab-initio tools as a basis for models and considers models that are
> supported by evidence. It won't by default create a model when there isn't
> a prediction in the region.
>
> Can I ask which gene predictors you used and how they were trained? You
> might consider training one or more of them on the specific evidence that
> you expect to support these genes and then rerunning maker with the
> retrained predictors.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Bérénice Benayoun [benayoun at stanford.edu]
> Sent: Friday, June 07, 2013 11:17 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] Maker and mono-exonic genes ?
>
> Dear maker developers,
>
> I am currently annotating a de novo fish genome, and have started looking
> for particular genes of interest in Maker's output to verify that it's
> outputting proper gene sets.
>
> While many of the genes I look for seem to be correctly annotated by the
> pipeline, I have noticed that important genes that do have strong
> evidentiary support but are monoexonic are NOT reported by maker.
>
> I am attaching a screenshot for the contig that I know should contain the
> Foxl2 gene (notoriously monoexonic across evolution), and highlighted
> the corresponding evidence for it.
>
> Is there any setting I can give to maker to force it to output monoexonic
> genes? I already set "single_exon=1" with no success. I attached my config
> file FYI.
>
> Thank you so much in advance for your answer!
>
> Best,
>
> Berenice.
> --
> Bérénice A. BENAYOUN, Ph.D.
> Stanford University/Genetics Department
> BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
> M312 Alway Building
> 300, Pasteur Drive
> MC 5120
> Stanford, CA 94305-5120
> USA
> Email: benayoun at stanford.edu
> Web: www.stanford.edu/group/brunet/

--
Bérénice A. BENAYOUN, Ph.D.
Stanford University/Genetics Department
BRUNET Laboratory, 'Molecular Basis of Longevity and Age Related Diseases'
M312 Alway Building
300, Pasteur Drive
MC 5120
Stanford, CA 94305-5120
USA
Email: benayoun at stanford.edu
Web: www.stanford.edu/group/brunet/

From carsonhh at gmail.com Fri Jun 28 19:01:47 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Jun 2013 21:01:47 -0400
Subject: [maker-devel] Maker and mono-exonic genes ?
In-Reply-To:
Message-ID:

I'm glad it's working for you. Let us know if you run into additional problems.

Thanks,
Carson

From: Bérénice Benayoun
Date: Thursday, June 27, 2013 8:47 PM
To: Carson Holt
Cc: Daniel Ence, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Maker and mono-exonic genes ?

Hi maker devel team,

just wanted to say that retraining SNAP apparently fixed the problem. I modified the defaults and added "-min-intron 0" to the training everywhere relevant (the default is 30 bp, which must prevent single-exon genes from being predicted).

Thanks for your insights/help!

Berenice

2013/6/10 Carson Holt

> One more note. The ESTs appear to be from multiple overlapping HSPs (based on
> red line pattern in image). I'd have to see the actual GFF3 to be sure, but
> if that is the case, then there probably isn't an ORF to work with at that
> location on that strand (so SNAP can't call it). Possibly the result of
> assembly error or a pseudogene.