From dandence at gmail.com Mon Oct 2 08:17:57 2017
From: dandence at gmail.com (Daniel Ence)
Date: Mon, 2 Oct 2017 09:17:57 -0400
Subject: [maker-devel] Error with Maker_functional_gff
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,

I think this script is expecting the file "uniprot_sprot.fasta" downloaded from the UniProt download page at http://www.uniprot.org/downloads#uniprotkblink

The FASTA headers in this file differ from those in the file you used. A header from uniprot_sprot.fasta looks like this:

>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1

Let us know if that helps,
Daniel

> On Oct 2, 2017, at 1:03 AM, Emmanuel Nnadi wrote:
>
> Hello,
> I intend to rename genes for GenBank submission
>
> I downloaded swissprot.fa from NCBI and used BLAST to align the MAKER-generated file to swissprot.
>
> the output of BLAST RESULT looks like this
> snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6
>
> I attempted to run maker_functional_gff using the swissprot.fa downloaded and the blastp result
>
> I got the following result
>
> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906.
> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906.
> Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor
>
> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599.
> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599.
> Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase
>
> What can I do?
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Mon Oct 2 08:30:43 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 2 Oct 2017 09:30:43 -0400
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu>
Message-ID: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com>

Hi Chris,

This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1.

Thanks,
Mike

> On Sep 29, 2017, at 1:20 PM, Willett, Christopher S wrote:
>
> Hello-
>
> We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this?
>
> Here is what we are seeing if more details would be helpful. Our final MAKER run gives ~21500 predicted genes with keep_preds turned on, or ~15200 with it turned off. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes?
>
> The gene sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above.
>
> Thanks for your help,
>
> Best,
>
> Chris Willett
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Research Associate Professor
> Department of Biology
> CB#3280 Coker Hall
> University of North Carolina, Chapel Hill
> Chapel Hill, NC, 27599-3280
>
> Office: 2252 Genome Science Building
> phone: 919-843-8663
> fax: 919-962-1625
>
> http://labs.bio.unc.edu/Willett/
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
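
Mike's question about counting transcripts versus genes can be checked directly on the GFF3 files. What follows is only a minimal sketch and not part of MAKER or quality_filter.pl: it assumes a tab-delimited GFF3 in which transcripts are typed mRNA and carry an _AED attribute, and the function name count_features is hypothetical.

```python
import collections


def count_features(gff3_lines):
    """Count gene and mRNA features, plus mRNAs whose _AED is below 1.

    Assumes tab-delimited GFF3 lines with nine columns; anything else
    (comments, blank lines, an embedded FASTA section) is skipped.
    """
    counts = collections.Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        feature_type, attributes = cols[2], cols[8]
        if feature_type not in ("gene", "mRNA"):
            continue
        counts[feature_type] += 1
        if feature_type == "mRNA":
            # Column 9 holds key=value pairs separated by semicolons.
            fields = dict(
                pair.split("=", 1) for pair in attributes.split(";") if "=" in pair
            )
            aed = fields.get("_AED")
            if aed is not None and float(aed) < 1.0:
                counts["mRNA_AED_lt_1"] += 1
    return counts
```

Passing an open file handle (for example, count_features(open(path)) with a path of your choosing) gives the gene and mRNA totals side by side, so a mismatch between the two ways of counting shows up immediately.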
From michael.s.campbell1 at gmail.com Mon Oct 2 14:19:51 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 2 Oct 2017 15:19:51 -0400
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu>
Message-ID: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>

Hi Chris,

Yeah. By default MAKER shouldn't keep any annotation with an AED of 1. I've cc'd the dev list on this to see if anyone else has any idea why you might get AED 1 genes with keep_preds=0. Could you send me the maker_opts.ctl file for the run? There may be something informative in there.

Thanks,
Mike

> On Oct 2, 2017, at 2:32 PM, Willett, Christopher S wrote:
>
> Hi Mike-
>
> I was looking at the lists of mRNAs and I think what is happening is that there are still genes retained in our initial output from MAKER that have an AED=1 that are then getting trimmed out of the filtered file. If I am setting the AED threshold equal to 1 in the control file for the MAKER run is that less than one or less than or equal to one for retention? Should these AED=1 genes be making it into the gene and mRNA pools if we have the keep predictions parameter set to 0?
>
> Thanks for your help,
>
> Best,
>
> Chris

From michael.s.campbell1 at gmail.com Mon Oct 2 14:35:55 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 2 Oct 2017 15:35:55 -0400
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To:
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>
Message-ID: <4C4E3DE7-CE28-4DF7-B234-E88701CAD172@gmail.com>

Hi Chris,

It's this line here:

model_gff=/proj/willetlb/users/cwillett/MAKER_analyses/dovetail_ann/SDv1.0_est-forward-SDv2.1.gff

Anything passed to model_gff is treated as sacred by MAKER and will be kept regardless of AED. If you pass it in as pred_gff= then it will be subject to the AED filters.

I hope this helps,
Mike

From daren.card at gmail.com Wed Oct 4 10:53:42 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Wed, 4 Oct 2017 10:53:42 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
Message-ID:

Hi all,

I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem.
I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns, finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size?

I originally ran the whole genome with MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. It seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using the rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with the default library).

Any help would be greatly appreciated.

Daren Card
University of Texas Arlington

###################################################
doing blastx repeats
running blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
#-------------------------------#
deleted:0 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=3, hostname=moonunit0
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:scaffold-1

doing blastx repeats
running blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold-1

deleted:0 hits
deleted:0 hits
###################################################

From carsonhh at gmail.com Wed Oct 4 11:03:52 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 10:03:52 -0600
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To:
References:
Message-ID: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com>

It dies at that point because there is no start/end coordinate for one of the alignments. The issue is either with the GFF3 you gave it or with a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago.

--Carson

From qwzhang0601 at gmail.com Wed Oct 4 17:31:09 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 4 Oct 2017 18:31:09 -0400
Subject: [maker-devel] About eAED
Message-ID:

Hello:

I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with an eAED of 1.

Below is an example; the gene has AED 0.05 and eAED 1. I wonder what could explain the large difference between AED and eAED. This gene has a very low AED score; is it still a reliable gene model if its eAED equals 1?

>maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35

Thanks

Best
Quanwei
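
To see how many models show the pattern described above (low AED but an eAED of 1), the two scores can be pulled straight out of the FASTA headers. This is a minimal sketch, assuming every header follows the layout of the example in this thread (ID first, then AED: and eAED: fields); the names parse_aed and flag_discordant and the 0.5 cutoff are hypothetical choices for illustration, not MAKER defaults.

```python
import re

# ID right after ">", then "AED:<number>" followed by "eAED:<number>".
# The \b stops the first field from matching inside "eAED:".
HEADER_RE = re.compile(r"^>(\S+).*?\bAED:([0-9.]+)\s+eAED:([0-9.]+)")


def parse_aed(header):
    """Return (id, AED, eAED) from one header line, or None if it does not match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), float(match.group(3))


def flag_discordant(headers, aed_max=0.5):
    """Yield (id, AED, eAED) for models with AED below aed_max but eAED of 1.

    aed_max is an arbitrary illustrative cutoff, not a MAKER setting.
    """
    for header in headers:
        parsed = parse_aed(header)
        if parsed is not None and parsed[1] < aed_max and parsed[2] >= 1.0:
            yield parsed
```

Feeding it only the lines of a protein or transcript FASTA that start with ">" yields a list of models worth inspecting by eye in a genome browser.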
From xvazquezc at gmail.com Wed Oct 4 17:35:41 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 5 Oct 2017 09:35:41 +1100
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID:

Carson commented on this here
https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Wed Oct 4 17:38:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 16:38:00 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID: <77155DA5-6454-4B25-BCF6-DE6B077BA548@gmail.com>

eAED is an extended AED calculation that does some inference about the evidence (i.e. it checks reading frame and not just overlap, and may infer support for an exon if both splice sites are confirmed, etc.). If eAED is 1, that means that while there is evidence supporting the model, the evidence is more likely to be spurious, so it may be a false model.

--Carson

From carsonhh at gmail.com Wed Oct 4 17:39:52 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 16:39:52 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>

This one is an even better explanation than the answer I just gave. Thank you.

--Carson

> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote:
>
> Carson commented on this here
> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

From eennadi at gmail.com Mon Oct 2 00:03:01 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Mon, 2 Oct 2017 06:03:01 +0100
Subject: [maker-devel] Error with Maker_functional_gff
Message-ID:

Hello,
I intend to rename genes for GenBank submission

I downloaded swissprot.fa from NCBI and used BLAST to align the MAKER-generated file to swissprot.

the output of BLAST RESULT looks like this
snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6

I attempted to run maker_functional_gff using the swissprot.fa downloaded and the blastp result

I got the following result

Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906.
Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906.
Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor

Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599.
Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599.
Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase

What can I do?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From willett4 at email.unc.edu Mon Oct 2 10:04:38 2017
From: willett4 at email.unc.edu (Willett, Christopher S)
Date: Mon, 2 Oct 2017 15:04:38 +0000
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com>
Message-ID:

Hi Mike-

Thanks for getting back to me. I was using the grep -cP '\tgene\t' syntax to count the numbers and it seems to be giving me the same numbers I got before when I was counting either the transcripts or the genes in the fasta output files from our original run. I will have to look at the files a bit more to see if I can find some examples of genes that fit what you are suggesting.

Best,
Chris

From willett4 at email.unc.edu Mon Oct 2 14:28:19 2017
From: willett4 at email.unc.edu (Willett, Christopher S)
Date: Mon, 2 Oct 2017 19:28:19 +0000
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>
Message-ID:

Hi Mike-

Here is the control file for the last run of MAKER with keep_preds=0 and here is an example of one mRNA retained from the gff file:

Chromosome_6 maker mRNA 556000 557215 . + . ID=maker-Chromosome_6-exonerate_est2genome-gene-5.3-mRNA-1;Parent=maker-Chromosome_6-exonerate_est2genome-gene-5.3;Name=TCALIF_02833-PA;_AED=1.00;_eAED=1.00;_QI=15|0|0|0|1|1|2|75|338;score=100;Alias=TCALIF_02833-PA

Thanks,
Chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl_full8
Type: application/octet-stream
Size: 5617 bytes
Desc: maker_opts.ctl_full8
URL:

From qwzhang0601 at gmail.com Wed Oct 4 21:35:55 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 4 Oct 2017 22:35:55 -0400
Subject: [maker-devel] About eAED
In-Reply-To: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
Message-ID:

Thank you all.
Most time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED. I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED? The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED? Best Quanwei 2017-10-04 18:39 GMT-04:00 Carson Holt : > This one is an even better explanation than the answer I just gave. Thank > you. > > ?Carson > > On Oct 4, 2017, at 4:35 PM, Xabier V?zquez-Campos > wrote: > > Carson commented on this here > https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > On 5 October 2017 at 09:31, Quanwei Zhang wrote: > >> Hello: >> >> I ran the maker2 pipeline and got the default gene sets (with AED<1). But >> I found there are several hundred genes with eAED 1. >> >> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can >> be the reason of the great difference between AED and eAED. For this gene >> it has a very low AED score, is it still a reliable gene model if its eAED >> equals 1? >> >> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 >> QI:75|0|0|1|0|0|2|111|35 >> >> Thanks >> >> Best >> Quanwei >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From carsonhh at gmail.com Wed Oct 4 21:38:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 20:38:25 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
Message-ID: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>

The previously linked comment explains this in detail --> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

Basically, support for the middle of an exon is inferred from support at its edges even though no overlap exists there (so eAED infers support and AED does not).

--Carson

> On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote:
>
> Thank you all. Most time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED. I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED?
>
> The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED?
>
> Best
> Quanwei
>
> 2017-10-04 18:39 GMT-04:00 Carson Holt:
> This one is an even better explanation than the answer I just gave. Thank you.
>
> --Carson
>
>> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote:
>>
>> Carson commented on this here
>> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ
>>
>> On 5 October 2017 at 09:31, Quanwei Zhang wrote:
>> Hello:
>>
>> I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1.
>>
>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1?
>> >> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 >> >> Thanks >> >> Best >> Quanwei >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> -- >> Xabier V?zquez-Campos, PhD >> Research Associate >> NSW Systems Biology Initiative >> School of Biotechnology and Biomolecular Sciences >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Oct 4 21:43:28 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 20:43:28 -0600 Subject: [maker-devel] About eAED In-Reply-To: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> Message-ID: eAED can be better for edge cases, but neither is perfect. Low AED generally correlates with better models. But a high AED does not mean the model doesn?t exist, it just means you should spend a little more time deciding if you really believe it or not. ?Carson > On Oct 4, 2017, at 8:38 PM, Carson Holt wrote: > > The previous linked comment explains in detail ?> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > Basically the middle support of exon is inferred from edge support even though no overlap exists (so eAED infers support and AED does not). > > ?Carson > > >> On Oct 4, 2017, at 8:35 PM, Quanwei Zhang > wrote: >> >> Thank you all. Most time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED. 
I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED? >> >> The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED? >> >> Best >> Quanwei >> >> >> >> 2017-10-04 18:39 GMT-04:00 Carson Holt >: >> This one is an even better explanation than the answer I just gave. Thank you. >> >> ?Carson >> >>> On Oct 4, 2017, at 4:35 PM, Xabier V?zquez-Campos > wrote: >>> >>> Carson commented on this here >>> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ >>> >>> On 5 October 2017 at 09:31, Quanwei Zhang > wrote: >>> Hello: >>> >>> I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. >>> >>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? >>> >>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 >>> >>> Thanks >>> >>> Best >>> Quanwei >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> Xabier V?zquez-Campos, PhD >>> Research Associate >>> NSW Systems Biology Initiative >>> School of Biotechnology and Biomolecular Sciences >>> The University of New South Wales >>> Sydney NSW 2052 AUSTRALIA >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qwzhang0601 at gmail.com Wed Oct 4 22:25:24 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 4 Oct 2017 23:25:24 -0400 Subject: [maker-devel] About eAED In-Reply-To: References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> Message-ID: Thanks for your explanation. Best Quanwei 2017-10-04 22:43 GMT-04:00 Carson Holt : > eAED can be better for edge cases, but neither is perfect. Low AED > generally correlates with better models. But a high AED does not mean the > model doesn?t exist, it just means you should spend a little more time > deciding if you really believe it or not. > > ?Carson > > > > On Oct 4, 2017, at 8:38 PM, Carson Holt wrote: > > The previous linked comment explains in detail ?> > https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > Basically the middle support of exon is inferred from edge support even > though no overlap exists (so eAED infers support and AED does not). > > ?Carson > > > On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote: > > Thank you all. Most time, the AED is equal to or lower than eAED, but > there are some genes whose eAED is smaller than AED. I feel the eAED is > more stringent than AED. Would you give me an example, under what condition > eAED can be smaller than AED? > > The default maker2 gene set includes all genes with AED less than 1. Do > you think eAED is a better choice to filter gene models than AED? > > Best > Quanwei > > > > 2017-10-04 18:39 GMT-04:00 Carson Holt : > >> This one is an even better explanation than the answer I just gave. Thank >> you. >> >> ?Carson >> >> On Oct 4, 2017, at 4:35 PM, Xabier V?zquez-Campos >> wrote: >> >> Carson commented on this here >> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa- >> ko/iC4KTuIitGEJ >> >> On 5 October 2017 at 09:31, Quanwei Zhang wrote: >> >>> Hello: >>> >>> I ran the maker2 pipeline and got the default gene sets (with AED<1). 
>>> But I found there are several hundred genes with eAED 1.
>>>
>>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1?
>>>
>>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35
>>>
>>> Thanks
>>>
>>> Best
>>> Quanwei
>>
>> --
>> Xabier Vázquez-Campos, *PhD*
>> *Research Associate*
>> NSW Systems Biology Initiative
>> School of Biotechnology and Biomolecular Sciences
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA

From dandence at gmail.com Thu Oct 5 09:00:21 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 5 Oct 2017 10:00:21 -0400
Subject: [maker-devel] Error with Maker_functional_gff
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,

I can't tell whether it will work from the blast lines that you sent. It will depend on the full headers in the fasta file, which matter when you run the script after all the blasts are complete.

Assembly isn't really my expertise or the topic of this mailing list, but assembling your contigs into scaffolds would probably help your annotations by connecting some parts of genes that are broken across contigs, and will definitely help downstream analysis if you need to know which genes are located next to each other.
How much improvement you can get by scaffolding depends on the type of sequence data you have. Each scaffolder makes assumptions and has requirements, and some assemblers like Velvet and SOAPdenovo have scaffolding built into their algorithms. I'd recommend starting with a review like this one:
http://www.sciencedirect.com/science/article/pii/S1672022912000095

~Daniel

> On Oct 2, 2017, at 10:47 AM, Emmanuel Nnadi wrote:
>
> Hello Daniel,
>
> Thanks for the tip, I was able to download uniprot_swiss.fa I am currently running the blast now
>
> it looks like this
>
> MUCPR_041061-RA sp|P10978|POLX_TOBAC 49.315 73 37 0 43 115 874 946 2.95e-14 71.6
> MUCPR_026643-RA sp|Q00451|PRF1_SOLLC 86.207 87 11 1 243 328 257 343 3.65e-32 126
>
> Is it ok?
>
> I wish to ask, I did not assemble my contigs into scaffold before annotating would it affect the end result?
>
> I wish to assemble my sequence into scaffold can you advice on the best software to use?
>
> I attempted using SSPACE: a new stand-alone scaffolding tool for small and large genomes
> but am having problem with the library. Funny enough the software does not have support to solve problems
>
> Thanks
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>
> On Mon, Oct 2, 2017 at 2:17 PM, Daniel Ence wrote:
> Hi Emmanuel, I think this script is expecting the file "uniprot_sprot.fasta"
downloaded from the uniprot download page at http://www.uniprot.org/downloads#uniprotkblink > The fasta headers in this file are different from the fasta header that the file you used has: > >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1 > > Let us know if that helps, > Daniel > >> On Oct 2, 2017, at 1:03 AM, Emmanuel Nnadi > wrote: >> >> Hello, >> I intend to rename genes for Genebank submission >> >> I downloaded swissprot.fa from NCBI and used blast MAKER generated file to swissprot. >> >> the output of BLAST RESULT looks like this >> snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6 >> >> I attempted to run maker_funtional_gff using the swissprot.fa downloaded and the blastp result >> >> I got the following result >> >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906. >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906. >> Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor >> >> >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599. >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599. >> Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase >> >> What can I do? 
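For illustration, the difference between the two header styles discussed in this thread can be sketched in Python. This is only a sketch under stated assumptions: the regular expression and function name below are hypothetical and are not the pattern maker_functional_gff itself uses. It simply shows that a UniProt-style header (>sp|ACCESSION|ENTRY_NAME description) splits cleanly on the pipe characters, while the NCBI-style header from swissprot.fa has no such fields.

```python
import re

# Hypothetical pattern for UniProt-style headers: >db|ACCESSION|ENTRY_NAME description
# (an assumption for illustration, not the regex used inside maker_functional_gff)
UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|([^|\s]+)\|(\S+)\s*(.*)$")


def parse_uniprot_header(line):
    """Return (accession, entry_name, description), or None when the
    header is not in the sp|ACCESSION|ENTRY_NAME layout."""
    match = UNIPROT_HEADER.match(line.strip())
    if match is None:
        return None
    return match.group(2), match.group(3), match.group(4)


# Header quoted in the thread from uniprot_sprot.fasta
uniprot_style = (">sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R "
                 "OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1")
# Header quoted in the thread from the NCBI swissprot.fa (shortened here)
ncbi_style = ">P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein"

print(parse_uniprot_header(uniprot_style))
print(parse_uniprot_header(ncbi_style))
```

The first header yields an accession and entry name; the second yields nothing, which is consistent with the "Can't parse details from FASTA header" messages reported above.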
>> >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daren.card at gmail.com Fri Oct 6 07:23:36 2017 From: daren.card at gmail.com (Daren C. Card) Date: Fri, 6 Oct 2017 07:23:36 -0500 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> Message-ID: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> Dear Carson, Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. Unfortunately, I?m still getting the same error what appears to be at roughly the same spot (~child 226). I?ve copied the stderr below. I checked my GFF file and I don?t see any issues with coordinates. I?m going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. Thank you, Daren Card ################################################ doing repeat masking re reading repeat masker report. /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out doing blastx repeats re reading blast report. 
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner deleted:2 hits doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=NA, hostname=moonunit0 ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:scaffold-1 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold-1 examining contents of the fasta file and run log ################################################ > On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: > > The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. > > ?Carson > > >> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >> >> Hi all, >> >> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. 
The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >> >> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >> >> Any help would be greatly appreciated. >> Daren Card >> >> University of Texas Arlington >> >> ################################################### >> doing blastx repeats >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >> #-------------------------------# >> deleted:0 hits >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. 
>> --> rank=3, hostname=moonunit0 >> ERROR: Failed while processing all repeats >> ERROR: Chunk failed at level:3, tier_type:1 >> FAILED CONTIG:scaffold-1 >> >> doing blastx repeats >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >> #-------------------------------# >> ERROR: Chunk failed at level:2, tier_type:0 >> FAILED CONTIG:scaffold-1 >> >> deleted:0 hits >> deleted:0 hits >> ################################################### >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From eennadi at gmail.com Sat Oct 7 16:34:46 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Sat, 7 Oct 2017 22:34:46 +0100 Subject: [maker-devel] jbrowse not working Message-ID: Please, I ran the command line maker2jbrowse muc1_genome_snap2.all.gff The command created some folders. However, at the end it read No reference sequences defined in configuration, nothing to do. Please what does it mean? How can I view it in jbrowse. Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From carsonhh at gmail.com Sun Oct 8 19:37:12 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 8 Oct 2017 18:37:12 -0600
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com>
Message-ID: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com>

MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST).

The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/

--Carson

> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote:
>
> Dear Carson,
>
> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.
>
> Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
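The truncation Carson describes (hits listed in the summary at the top of the report but absent from the detail section below) can be checked for with a short script. The following Python sketch assumes the default pairwise text layout, where the summary follows a "Sequences producing significant alignments" line and each detailed hit starts with ">". The function name is hypothetical, and this is not how MAKER itself parses reports.

```python
def missing_detail_hits(report_text):
    """Return hit IDs that appear in the summary table of a BLAST text
    report but have no matching '>' entry in the detail section.
    IDs are pooled over the whole text, so a multi-query report is
    treated as one unit (a simplification for this sketch)."""
    summary_ids = set()
    detail_ids = set()
    in_summary = False
    seen_row = False
    for line in report_text.splitlines():
        if line.startswith(">"):
            # A detail entry; it also ends any open summary table.
            in_summary = False
            tokens = line[1:].split()
            if tokens:
                detail_ids.add(tokens[0])
            continue
        if line.startswith("Sequences producing significant alignments"):
            in_summary = True
            seen_row = False
            continue
        if in_summary:
            if not line.strip():
                # The blank line before the first row belongs to the header;
                # a blank line after at least one row closes the table.
                if seen_row:
                    in_summary = False
                continue
            summary_ids.add(line.split()[0])
            seen_row = True
    return sorted(summary_ids - detail_ids)


# Synthetic example: hitB is summarised but its detail block is missing.
example = "\n".join([
    "Query= scaffold-1.226",
    "",
    "Sequences producing significant alignments:   (Bits)  Value",
    "",
    "hitA  some protein   71.6  3e-14",
    "hitB  other protein  50.1  2e-08",
    "",
    "> hitA some protein",
    "Length=100",
    "",
])
print(missing_detail_hits(example))
```

Running it over the .repeatrunner files under theVoid for the failing chunk would be one way to find which report is incomplete; an empty list means every summarised hit has a detail entry.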
> > Thank you, > Daren Card > > > ################################################ > doing repeat masking > re reading repeat masker report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out > doing blastx repeats > re reading blast report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner > deleted:2 hits > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. > --> rank=NA, hostname=moonunit0 > ERROR: Failed while processing all repeats > ERROR: Chunk failed at level:3, tier_type:1 > FAILED CONTIG:scaffold-1 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:scaffold-1 > > examining contents of the fasta file and run log > ################################################ > > > >> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >> >> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >> >> ?Carson >> >> >>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>> >>> Hi all, >>> >>> I?ve been having an issue with MAKER (v. 
2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>> >>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>> >>> Any help would be greatly appreciated. >>> Daren Card >>> >>> University of Texas Arlington >>> >>> ################################################### >>> doing blastx repeats >>> running blast search. 
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>> #-------------------------------# >>> deleted:0 hits >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>> --> rank=3, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> doing blastx repeats >>> running blast search. 
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>> #-------------------------------# >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> deleted:0 hits >>> deleted:0 hits >>> ################################################### >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Oct 9 19:35:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 9 Oct 2017 18:35:49 -0600 Subject: [maker-devel] jbrowse not working In-Reply-To: References: Message-ID: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it?s possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you run it. Thanks, Carson > On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote: > > Please, > I ran the command line > > maker2jbrowse muc1_genome_snap2.all.gff > > The command created some folders. However, at the end it read > No reference sequences defined in configuration, nothing to do. 
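Whether a merged GFF3 carries embedded sequence, as Carson asks about above, can be checked by looking for the ##FASTA directive defined in the GFF3 specification, followed by at least one FASTA record. A minimal Python sketch, with a hypothetical function name:

```python
def gff3_has_embedded_fasta(lines):
    """Return True if the iterable of GFF3 lines contains a ##FASTA
    directive followed by at least one '>' sequence header."""
    seen_directive = False
    for line in lines:
        if not seen_directive:
            if line.strip() == "##FASTA":
                seen_directive = True
        elif line.startswith(">"):
            return True
    return False


with_fasta = [
    "##gff-version 3\n",
    "contig1\tmaker\tgene\t1\t10\t.\t+\t.\tID=g1\n",
    "##FASTA\n",
    ">contig1\n",
    "ACGTACGTAC\n",
]
without_fasta = with_fasta[:2]

print(gff3_has_embedded_fasta(with_fasta))
print(gff3_has_embedded_fasta(without_fasta))
```

Because the function only iterates over lines, an open file handle for muc1_genome_snap2.all.gff can be passed in directly.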
> > Please what does it mean? How can I view it in jbrowse. > > Thanks > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Mon Oct 9 23:42:35 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Tue, 10 Oct 2017 05:42:35 +0100 Subject: [maker-devel] jbrowse not working In-Reply-To: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> Message-ID: Hi Carson, Thanks for the reply. I generated the GFF with this command: gff3_merge -d dpp_contig.maker.output/dpp_contig_master_datastore_index.log I had to rerun jbrowse with the following command: maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb It is showing: WARNING: No matching features found for mRNA I don't understand what that means. I was able to set up JBrowse on my local host successfully; I had to move the jbrowse folder to my local host. JBrowse is up and running. However, I have about 18488 contigs and only 31 contigs are showing. How can I make all my contigs show in JBrowse? Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt wrote: > Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of > the file?
That can happen if you use the -n option with gff3_merge. > Alternatively it's possible one of the individual contig gff3 used to build > the merged gff3 is truncated. If that is the case then gff3_merge should > have thrown some sort of error or warning when you run it. > > Thanks, > Carson > > > > > On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote: > > Please, > I ran the command line > > maker2jbrowse muc1_genome_snap2.all.gff > > The command created some folders. However, at the end it read > No reference sequences defined in configuration, nothing to do. > > Please what does it mean? How can I view it in jbrowse. > > Thanks > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/ > publications > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacques.dainat at nbis.se Tue Oct 10 04:24:34 2017 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Tue, 10 Oct 2017 11:24:34 +0200 Subject: [maker-devel] MAKER annotation submission (EMBLmyGFF3) Message-ID: <967873FE-D61F-4233-A004-C877A60A2AC1@nbis.se> Hi MAKER users, I am taking advantage of this mailing list to share a tool that I hope will be useful for MAKER users. One of the steps, once we are happy with our wonderful annotation, is to submit it to the public archives through one of the three INSDC databases (EMBL-EBI / NCBI / DDBJ). We developed EMBLmyGFF3, which easily converts any kind of GFF3 annotation to the EMBL flat file format for submission to the European Nucleotide Archive (ENA) database, which is part of EMBL-EBI. It works well with the MAKER annotation output, among others. We hope the tool will ease the submission process of your annotations.
You will find it here: https://github.com/NBISweden/EMBLmyGFF3 A typical usage case will look like that (where ERSXXXXXX and PRJXXXXXX are the accession number and the project ID provided by EMBL-EBI prior to any submission): ./EMBLmyGFF3.py maker.gff3 maker.fa --data_class STD --topology linear --molecule_type 'genomic DNA' --table 1 --species 'Drosophila melanogaster (fly)' --taxonomy INV --accession ERSXXXXXXX --project_id PRJXXXXXXX --rg MYGROUP -o result.embl Best regards, Jacques Dainat, PhD --------------------------------------- NBIS (National Bioinformatics Infrastructure Sweden) Genome Annotation Service --------------------------------------- Uppsala University, Biomedicinska Centrum Department of Medical Biochemistry Microbiology, Genomics -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Wed Oct 11 09:53:36 2017 From: mcsimenc at gmail.com (Matt Simenc) Date: Wed, 11 Oct 2017 07:53:36 -0700 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? Message-ID: Hey MAKER people, I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. QI summary has: Fraction of exons that overlap an EST alignment Fraction of exons that overlap EST or Protein alignments Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? Thanks!!! Matt Simenc -------------- next part -------------- An HTML attachment was scrubbed... 
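A minimal sketch of what the two QI fields quoted above can and cannot tell you, in case it helps frame the question. It assumes the usual MAKER GFF3 layout where each mRNA line carries a pipe-delimited _QI attribute whose third field is the fraction of exons overlapping an EST alignment and whose fourth field is the fraction overlapping EST or protein alignments; the function names and category labels are made up for illustration, and the sample lines in the demo are invented. Note that it counts transcripts, not genes.

```python
from collections import Counter


def qi_from_attributes(column9):
    """Return the _QI value from a GFF3 attributes column, or None if absent."""
    for pair in column9.strip().split(";"):
        key, _, value = pair.partition("=")
        if key == "_QI":
            return value
    return None


def classify_qi(qi):
    """Classify one transcript from its QI string (assumed field order, see lead-in)."""
    fields = qi.split("|")
    est = float(fields[2])             # fraction of exons overlapping an EST
    est_or_protein = float(fields[3])  # fraction overlapping EST or protein
    if est_or_protein == 0:
        return "no_evidence_overlap"
    if est == 0:
        # Some exons overlap evidence, none overlap ESTs: protein support only.
        return "protein_only"
    if est_or_protein > est:
        # Extra exons are covered only once protein is included: both kinds present.
        return "est_and_protein"
    # Same fraction either way: QI cannot say whether protein also overlaps.
    return "est_ambiguous"


def count_classes(gff_lines):
    """Tally the categories over all mRNA features in an iterable of GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        qi = qi_from_attributes(cols[8])
        if qi is not None:
            counts[classify_qi(qi)] += 1
    return counts


if __name__ == "__main__":
    # Invented example lines, only to show the call pattern.
    demo = [
        "##gff-version 3\n",
        "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1;_QI=0|0|0|1|1|1|2|0|364\n",
        "ctg1\tmaker\tmRNA\t1500\t2400\t.\t-\t.\tID=t2;Parent=g2;_QI=12|1|1|1|0|0|3|40|210\n",
    ]
    for label, n in sorted(count_classes(demo).items()):
        print(label, n, sep="\t")
```

For a real file, pass an open file handle to count_classes. The "est_ambiguous" bucket is exactly the case described above: when both fractions are equal, the QI string alone cannot separate EST-only transcripts from those with EST and protein support.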
URL: From michael.s.campbell1 at gmail.com Wed Oct 11 10:18:54 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Wed, 11 Oct 2017 11:18:54 -0400 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: References: Message-ID: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> Hi Matt, I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST. Hope this helps, Mike > On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: > > Hey MAKER people, > > I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. > > QI summary has: > > Fraction of exons that overlap an EST alignment > Fraction of exons that overlap EST or Protein alignments > > Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment.
If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. > > Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? > > Thanks!!! > Matt Simenc > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Oct 11 10:22:54 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 11 Oct 2017 09:22:54 -0600 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> Message-ID: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> Also look at GAL for building GFF3 feature queries --> https://github.com/The-Sequence-Ontology/GAL --Carson > On Oct 11, 2017, at 9:18 AM, Michael Campbell wrote: > > Hi Matt, > > I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. > > To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.
> > Hope this helps, > Mike > >> On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: >> >> Hey MAKER people, >> >> I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. >> >> QI summary has: >> >> Fraction of exons that overlap an EST alignment >> Fraction of exons that overlap EST or Protein alignments >> >> Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. >> >> Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? >> >> Thanks!!! >> Matt Simenc >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Wed Oct 11 23:19:04 2017 From: mcsimenc at gmail.com (Matt Simenc) Date: Wed, 11 Oct 2017 21:19:04 -0700 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> Message-ID: Very good, thank you! 
Matt On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt wrote: > Also look at GAL for building GFF3 feature queries --> > https://github.com/The-Sequence-Ontology/GAL > > --Carson > > > > > On Oct 11, 2017, at 9:18 AM, Michael Campbell < > michael.s.campbell1 at gmail.com> wrote: > > Hi Matt, > > I have a hacky way that I've done it. It requires running MAKER two more > times but they are quicker runs. > > To identify the genes that have protein support I pass all of the > annotation back to MAKER using the model_gff option in the maker_opts.ctl > file. Then I pull out all of the protein2genome features from the big MAKER > GFF3 file and pass them in using the protein_gff option. I turn off all > repeat masking and run MAKER. It runs fast because it doesn't have to run > any gene finders, align evidence, or repeatmask. In the output any gene > with an AED less than 1 has protein support. Then I do the same thing with > est2genome lines from the big GFF3 file and put them in as est_gff. The > output of that one gives you genes with EST support. Then the genes with an > AED of less than one in both sets have support from protein and EST. > > Hope this helps, > Mike > > On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: > > Hey MAKER people, > > I would like to make a Venn diagram showing the kinds of evidence > supporting gene models in my MAKER annotation where the left side shows > number of genes with EST support only, the right side shows number of genes > with protein support only, and the intersection shows number of genes with > EST and protein support. > > QI summary has: > > Fraction of exons that overlap an EST alignment > Fraction of exons that overlap EST or Protein alignments > > Please correct me if I'm wrong, because I am interpreting the first to be > fraction of exons that overlap an EST alignment and possibly also a protein > alignment.
If that is the case then we can't calculate the number of genes > that overlap only EST or (EST and protein) from the QI information. > > Anyone have a way to do this or have a script to parse the MAKER GFF3 to > get this? > > Thanks!!! > Matt Simenc > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 18:33:05 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:33:05 -0400 Subject: [maker-devel] GMOD hackathon before PAG San Diego in January Message-ID: Hi all, This January before PAG on the Wednesday and Thursday before PAG (January 10-11) in San Diego we are planning a GMOD hackathon. We expect that participants will be interested in solving problems/creating solutions related to Tripal, JBrowse, Apollo, and Galaxy but if you're interested in another GMOD project, by all means, let us know! We expect this hackathon to overlap with the Tripal hackathon that is on January 11 (I'm pretty sure; right Stephen?) If you are interested in attending this hackathon, please let me know so I can be sure we have an appropriately sized space. And if you're coming for the pre-PAG hackathon, consider staying for PAG, since there is always a lot of GMOD-related content at the meeting! Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daren.card at gmail.com Thu Oct 12 21:22:54 2017 From: daren.card at gmail.com (Daren C. Card) Date: Thu, 12 Oct 2017 21:22:54 -0500 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> Message-ID: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> Hi Carson, Thanks for the help. Issue is still lingering. I've tried my full "ideal" run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn't seem to be a BLAST issue. Or is one that won't be easy to overcome. Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay. I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me. What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? Among the output files produced, I don't see any obvious info about how the sequence or the GFF is partitioned up for the annotation, so any help you can provide would be great. Hoping I can resolve this as maybe this is useful to others. Weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue.
They were less contiguous, but can't imagine that really mattering. Thanks, Daren > On Oct 8, 2017, at 7:37 PM, Carson Holt wrote: > > MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). > > The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ > > --Carson > > > > > >> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: >> >> Dear Carson, >> >> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. >> >> Unfortunately, I'm still getting the same error what appears to be at roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. >> >> Thank you, >> Daren Card >> >> >> ################################################ >> doing repeat masking >> re reading repeat masker report.
>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >> doing blastx repeats >> re reading blast report. >> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >> deleted:2 hits >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >> --> rank=NA, hostname=moonunit0 >> ERROR: Failed while processing all repeats >> ERROR: Chunk failed at level:3, tier_type:1 >> FAILED CONTIG:scaffold-1 >> >> ERROR: Chunk failed at level:2, tier_type:0 >> FAILED CONTIG:scaffold-1 >> >> examining contents of the fasta file and run log >> ################################################ >> >> >> >>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>> >>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>> >>> --Carson >>> >>> >>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>> >>>> Hi all, >>>> >>>> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem.
I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>> >>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>> >>>> Any help would be greatly appreciated. >>>> Daren Card >>>> >>>> University of Texas Arlington >>>> >>>> ################################################### >>>> doing blastx repeats >>>> running blast search.
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>> #-------------------------------# >>>> deleted:0 hits >>>> collecting blastx repeatmasking >>>> processing all repeats >>>> in cluster::shadow_cluster... >>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>> --> rank=3, hostname=moonunit0 >>>> ERROR: Failed while processing all repeats >>>> ERROR: Chunk failed at level:3, tier_type:1 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> doing blastx repeats >>>> running blast search. 
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>> #-------------------------------# >>>> ERROR: Chunk failed at level:2, tier_type:0 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> deleted:0 hits >>>> deleted:0 hits >>>> ################################################### >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > From robert.zimmermann at univie.ac.at Wed Oct 11 14:42:14 2017 From: robert.zimmermann at univie.ac.at (Bob Zimmermann) Date: Wed, 11 Oct 2017 21:42:14 +0200 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions Message-ID: Hello, I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. My gene prediction section of the maker_opts.ctl file looks like this: ... augustus_species=all_combined #Augustus gene prediction species model ... 
pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no ... It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long. Thanks for your help! Bob From xvazquezc at gmail.com Thu Oct 12 01:09:32 2017 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Thu, 12 Oct 2017 17:09:32 +1100 Subject: [maker-devel] choosing the right gene model Message-ID: Hi there, I was visualising the annotations and I realised that in some cases, what seems to be a single gene is split according to one of the gene models, even though the other two, est2genome and protein2genome, suggest that it isn't the case. The opposite also happens. For some reason, the "out of place" model is always (or almost always) the one from GeneMark. How much weight do the RNA-seq and protein data carry in this decision (if any)? How exactly is the final gene selected? Cheers, Xabi -- Xabier Vázquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: split-gene.png Type: image/png Size: 66389 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: merged-gene.png Type: image/png Size: 63815 bytes Desc: not available URL: From jan.nagel at fabi.up.ac.za Thu Oct 12 02:37:07 2017 From: jan.nagel at fabi.up.ac.za (Jan FABI) Date: Thu, 12 Oct 2017 09:37:07 +0200 Subject: [maker-devel] Maker problem Message-ID: Dear Maker team, I am experiencing a problem while running maker and cannot find a solution to it online. I am running maker on a new genome, using BRAKER-trained models for Augustus and GeneMark. This was successful and performed as expected, except for one contig where an error was encountered. This error occurs during Augustus and seems to have something to do with intron models. I have made sure that the input fasta does not contain characters other than ATCGN and does not contain Windows-style (non-UNIX) carriage returns. I include the relevant portion of the log below. Could you help me determine the cause of this error? setting up GFF3 output and fasta chunks preparing ab-inits running augustus. #--------- command -------------# Widget::augustus: /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus #-------------------------------# Sampling error in intron model. state=37 base=26570 /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list. Sampling error in intron model. state=37 base=26570 /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list.
ERROR: Augustus failed --> rank=NA, hostname=xxx-VirtualBox ERROR: Failed while preparing ab-inits ERROR: Chunk failed at level:0, tier_type:2 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 -- Regards Jan Nagel ---------------------------------------------------------------------- PhD Genetics student Department of Genetics Forestry and Agricultural Biotechnology Institute (FABI) FABI 1, Room 1-55 University of Pretoria 74 Lunnon Rd. Hillcrest 0002 Gauteng Province South Africa Email : jan.nagel at fabi.up.ac.za Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961 -- This message and attachments are subject to a disclaimer. Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 18:40:33 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:40:33 -0400 Subject: [maker-devel] Call for presentations at GMOD workshop at PAG Message-ID: Hi all, This January in San Diego is the annual Plant and Animal Genomes (PAG) meeting (http://www.intlpag.org). As in previous PAGs, there will be several opportunities to present content related to GMOD projects. If you are interested in attending PAG and giving a talk at the GMOD workshop on Wednesday, January 17, please let me know. Your talk can either be about new developments/functionality in existing GMOD software, about how your organization is using the suite of GMOD software to good effect, or about technologies that you think the GMOD community would be interested in hearing about. Please email me directly with a title, an abstract or a vague idea of what you'd like to talk about. 
Also, if you'd really like to come but are having a hard time coming up with travel funds, please let me know, I might be able to help you with that too (up to a limit of one person anyway). Cheers, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 10:37:25 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:37:25 -0600 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> Message-ID: So you have an input GFF3 file? Could you send it to me along with the problem contig? If you want you can upload the maker control files and evidence sets, and I can just recreate the run for the contig. Upload here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi --Carson > On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote: > > Hi Carson, > > Thanks for the help. Issue is still lingering. I've tried my full "ideal" run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn't seem to be a BLAST issue. Or is one that won't be easy to overcome. > > Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay.
> > I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me. > > What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don?t see any obvious way info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great. > > Hoping I can resolve this as maybe this is useful to others. Weird that I?m getting this error, as I?ve annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but can?t imagine that really mattering. > > Thanks, > Daren > > >> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote: >> >> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). >> >> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here ?> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ >> >> ?Carson >> >> >> >> >> >>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: >>> >>> Dear Carson, >>> >>> Thanks so much for the quick reply. 
I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. >>> >>> Unfortunately, I?m still getting the same error what appears to be at roughly the same spot (~child 226). I?ve copied the stderr below. I checked my GFF file and I don?t see any issues with coordinates. I?m going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. >>> >>> Thank you, >>> Daren Card >>> >>> >>> ################################################ >>> doing repeat masking >>> re reading repeat masker report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >>> doing blastx repeats >>> re reading blast report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >>> deleted:2 hits >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. 
>>> --> rank=NA, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> examining contents of the fasta file and run log >>> ################################################ >>> >>> >>> >>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>>> >>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>>> >>>> ?Carson >>>> >>>> >>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>>> >>>>> Hi all, >>>>> >>>>> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>>> >>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. 
MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>>> >>>>> Any help would be greatly appreciated. >>>>> Daren Card >>>>> >>>>> University of Texas Arlington >>>>> >>>>> ################################################### >>>>> doing blastx repeats >>>>> running blast search. >>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>>> #-------------------------------# >>>>> deleted:0 hits >>>>> collecting blastx repeatmasking >>>>> processing all repeats >>>>> in cluster::shadow_cluster... >>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>>> --> rank=3, hostname=moonunit0 >>>>> ERROR: Failed while processing all repeats >>>>> ERROR: Chunk failed at level:3, tier_type:1 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> doing blastx repeats >>>>> running blast search. 
>>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>>> #-------------------------------# >>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> deleted:0 hits >>>>> deleted:0 hits >>>>> ################################################### >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 10:42:41 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:42:41 -0600 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions In-Reply-To: References: Message-ID: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> Hi Bob, pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. 
If it is different in some way, then you can leave it in. If you run under MPI (it's ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time.

Thanks,
Carson

> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote:
>
> Hello,
>
> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments.
>
> My gene prediction section of the maker_opts.ctl file looks like this:
> ...
> augustus_species=all_combined #Augustus gene prediction species model
> ...
> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
> ...
>
> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long.
>
> Thanks for your help!
>
> Bob
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Oct 13 10:50:26 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 09:50:26 -0600
Subject: [maker-devel] Maker problem
In-Reply-To:
References:
Message-ID:

If you look in the folder of the failed contig under the .../theVoid directory there will be a file called query.masked.fasta. Copy that file somewhere.
Then, because maker gave you the command that failed, you can run it all by itself outside of MAKER.

Example -->
/home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off query.masked.fasta

If it still fails, you now have a test file and command you can send to Mario Stanke (mario.stanke at uni-greifswald.de). He made Augustus. It may be a bug he has already fixed (current Augustus version is 3.3) or there may be something in the species file causing the error that he can point out.

--Carson

> On Oct 12, 2017, at 1:37 AM, Jan FABI wrote:
>
> Dear Maker team
>
> I am experiencing a problem while running maker and cannot find a solution to it online.
>
> I am running maker on a new genome, using BRAKER trained models for Augustus and GeneMark. This was successful and performed as expected, except for one contig where an error was encountered.
>
> This error occurs during Augustus and seems to have something to do with intron models. I have made sure that the input fasta does not contain characters other than ATCGN or contains "windows"/non-UNIX carriage returns.
>
> I include the relevant portion of the log below. Could you help me determine the cause of this error.
>
>
>
> setting up GFF3 output and fasta chunks
> preparing ab-inits
> running augustus.
> #--------- command -------------#
> Widget::augustus:
> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus
> #-------------------------------#
> Sampling error in intron model. state=37 base=26570
>
> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR
> Tried to sample from empty list.
>
> Sampling error in intron model. state=37 base=26570
>
> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR
> Tried to sample from empty list.
>
> ERROR: Augustus failed
> --> rank=NA, hostname=xxx-VirtualBox
> ERROR: Failed while preparing ab-inits
> ERROR: Chunk failed at level:0, tier_type:2
> FAILED CONTIG:NODE_1040_length_26483_cov_27.125137
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:NODE_1040_length_26483_cov_27.125137
>
> --
> Regards
> Jan Nagel
> ----------------------------------------------------------------------
> PhD Genetics student
> Department of Genetics
> Forestry and Agricultural Biotechnology Institute (FABI)
> FABI 1, Room 1-55
> University of Pretoria
> 74 Lunnon Rd. Hillcrest
> 0002
> Gauteng Province
> South Africa
>
> Email : jan.nagel at fabi.up.ac.za
>
> Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961
> This message and attachments are subject to a disclaimer.
> Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details.
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Oct 13 10:56:43 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 09:56:43 -0600
Subject: [maker-devel] choosing the right gene model
In-Reply-To:
References:
Message-ID: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com>

Both transcript and protein evidence will go into the AED calculation for overlap support. So in both cases the chosen model had better overlap (protein evidence will not count toward the eAED overlap calculation if it is out of frame with the model it is supposed to be supporting). The larger merged model generates a clustering effect on its evidence, so its evidence set for the AED calculation is slightly different from the one the SNAP and Augustus models would generate. In both cases, I think GeneMark is hurting more than it is helping.
You may want to just drop it from the analysis (unless it's a fungus; I often find GeneMark can have that effect).

--Carson

> On Oct 12, 2017, at 12:09 AM, Xabier Vázquez-Campos wrote:
>
> Hi there,
>
> I was visualising the annotations and I realised that in some cases, what it seems to be a gene is splitted according to one of the gene models, despite that the other 2, est2genome and prot2genome suggest that it isn't the case.
>
>
>
> Although the opposite also happens.
>
>
>
> For some reason, the "out of place" model is always (or almost) the one from Genemark.
>
> How much weight does carry the RNAseq and protein data on this decision (if any)?
> How exactly is the final gene selected?
>
> Cheers,
> Xabi
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Oct 13 11:56:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 10:56:30 -0600
Subject: [maker-devel] jbrowse not working
In-Reply-To:
References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com>
Message-ID: <2D6E11BC-6853-458D-AEB1-12EF74D041A3@gmail.com>

The master_datastore_index.log file has a list of failed and finished contigs. You can grep the file contents for FAILED or DIED to see if any contigs are not finished. Finished contigs will be listed as FINISHED in the file.

Also note that if you have errors with the jbrowse build, you have to start over (i.e. wipe out the old build). Rerunning the command over a failed build will try to insert again, which can generate its own errors.
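As a rough illustration of the master_datastore_index.log check described above, here is a minimal Python sketch. It assumes the log is tab-separated with the contig name in the first column and the status (for example FAILED, DIED or FINISHED) in the last column; that layout is an assumption and should be checked against your own file. The function names `unfinished_contigs` and `report` are hypothetical helpers, not part of MAKER.

```python
from collections import defaultdict


def unfinished_contigs(log_lines):
    """Return {contig: last_status} for contigs with no FINISHED entry.

    Assumes tab-separated lines with the contig name first and the
    status last (an assumption about the log layout).
    """
    history = defaultdict(list)
    for line in log_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, status = fields[0], fields[-1]
        history[contig].append(status)
    return {
        contig: statuses[-1]
        for contig, statuses in history.items()
        if "FINISHED" not in statuses
    }


def report(path):
    """Print one 'contig<TAB>status' line per unfinished contig."""
    with open(path) as handle:
        for contig, status in sorted(unfinished_contigs(handle).items()):
            print(contig, status, sep="\t")
```

Calling `report()` with the path to your own master_datastore_index.log would list the contigs worth rerunning. A contig that failed once and later finished on a retry is not reported, since only contigs that never reach FINISHED are of interest here.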
If gff3_merge was run without the -n option then you need to see if one of the GFF3 files being used is truncated (possibly due to an IO error - not uncommon on NFS storage). You will need to see if you can identify which contig file is truncated and rerun it.

--Carson

> On Oct 9, 2017, at 10:42 PM, Emmanuel Nnadi wrote:
>
> Hi Carson
> Thanks for the reply
>
> I generated the gff with this command gff3_merge -d dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> I had to rerun browse with the following command
>
> maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb
>
> Although its showing
>
> WARNING: No matching features found for mRNA I don't know what it means
>
> I don't understand what it means
>
>
> Successfully, I was able to setup the jbrowse local host. I had to move the jbrowse folder to my local host
>
>
> The jbrowse is up and running however, I have about 18488 contigs only 31 contigs are showing, how can i make all my contigs to show on jbrowse?
>
>
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
> On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt wrote:
> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it's possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you run it.
> > Thanks, > Carson > > > > >> On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi > wrote: >> >> Please, >> I ran the command line >> >> maker2jbrowse muc1_genome_snap2.all.gff >> >> The command created some folders. However, at the end it read >> No reference sequences defined in configuration, nothing to do. >> >> Please what does it mean? How can I view it in jbrowse. >> >> Thanks >> >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Fri Oct 13 15:26:40 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Sat, 14 Oct 2017 07:26:40 +1100 Subject: [maker-devel] choosing the right gene model In-Reply-To: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> References: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> Message-ID: Actually, it's a fungal genome. Although not very typical, almost half of it are repeats. Worth mention that Genemark generates a lot of predictions that overlap LTRs and other complex repeats, something that neither SNAP or Augustus do. Have you seen this before? On 14 Oct. 2017 02:56, "Carson Holt" wrote: > Both transcript and protein evidence will go into the AED calculation for > overlap support. So in both cases the chosen model had better overlap > (protein evidence will not count toward the eAED overlap calculation if it > is out of frame with the model it is supposed to be supporting). The larger > merged model generates a clutering affect on it?s evidence, so it?s > evidence set for AED calculation is slightly different than the SNAP and > Augustus model would generate. In both cases, I think GeneMark is hurting > more than it is helping. 
You may want to just drop it from the analysis > (unless it?s a fungi, I often find GeneMark can have that affect). > > ?Carson > > > On Oct 12, 2017, at 12:09 AM, Xabier V?zquez-Campos > wrote: > > Hi there, > > I was visualising the annotations and I realised that in some cases, what > it seems to be a gene is splitted according to one of the gene models, > despite that the other 2, est2genome and prot2genome suggest that it isn't > the case. > > > > Although the opposite also happens. > > > ? > For some reason, the "out of place" model is always (or almost) the one > from Genemark. > > How much weight does carry the RNAseq and protein data on this decision > (if any)? > How exactly is the final gene selected? > > Cheers, > Xabi > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From z2.stewart at qut.edu.au Sun Oct 15 00:02:08 2017 From: z2.stewart at qut.edu.au (ZACHARY STEWART) Date: Sun, 15 Oct 2017 05:02:08 +0000 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance Message-ID: Hello MAKER team, I am hoping I could have a bit of your time if that isn't a problem. I am currently performing the advanced repeat library construction as described on the MAKER wiki, and everything appears to work as expected until I reach "2.1.5 Building examplars". At this point I encounter a problem previously documented in the Google group (title: advanced repeat masking library constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta" and "lLTRs_Seq_For_BLAST.fasta" are empty. 
I was hoping you could clarify what you meant by simplifying the sequence names. The genomic contig names are in a format such as ">001676F" and I modified the MITE library to have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]" which I have not tried changing since I suspect the name is important for later reassociation. If you could point me in the right direction that would be very appreciated. Regards, Zac. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Sun Oct 15 16:32:10 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Sun, 15 Oct 2017 22:32:10 +0100 Subject: [maker-devel] Backlash running through my sequence Message-ID: Hi all, I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them I attached the sequence Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sample_1.fasta Type: application/octet-stream Size: 3884914 bytes Desc: not available URL: From xvazquezc at gmail.com Mon Oct 16 02:26:56 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Mon, 16 Oct 2017 18:26:56 +1100 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance In-Reply-To: References: Message-ID: Hi Zac, The contig names you indicate shouldn't give any problems. And if you changed the names of MITE.lib right after creation and before using it downstream, it shouldn't be an issue. Have you confirmed if the prior blastx output has any results? 
Also, be sure you use the same version of makeblastdb and blastx/blastn. I remember reading before running the protocol for first time that in some cases, switching versions could give problems. And be careful if you copy/paste from the wiki page, there are a few typos and dashes instead of minus characters in the command line option flags, all of which will result in errors Xabi On 15 October 2017 at 16:02, ZACHARY STEWART wrote: > Hello MAKER team, > > > I am hoping I could have a bit of your time if that isn't a problem. I am > currently performing the advanced repeat library construction as described > on the MAKER wiki, and everything appears to work as expected until I reach > "2.1.5 Building examplars". At this point I encounter a problem previously > documented in the Google group (title: advanced repeat masking library > constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta" > and "lLTRs_Seq_For_BLAST.fasta" are empty. I was hoping you could clarify > what you meant by simplifying the sequence names. The genomic contig names > are in a format such as ">001676F" and I modified the MITE library to > have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta > has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]" > which I have not tried changing since I suspect the name is important for > later reassociation. If you could point me in the right direction that > would be very appreciated. > > > Regards, > > Zac. > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Xabier V?zquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yuejiaxing at gmail.com Mon Oct 16 03:54:42 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 16 Oct 2017 10:54:42 +0200 Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13 In-Reply-To: References: Message-ID: Dear maker developers, I am trying to install maker-3.01.1-beta but encountered the warning message about uninitialized value (see the warning message below) although still finished the installation. [jxyue at paralog src]$ ./Build install Building MAKER Use of uninitialized value $line in chomp at /home/jxyue/Projects/LRSDAY/bu ild/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3082. Use of uninitialized value $line in substitution (s///) at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/ cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3083. Installing MAKER... Building MAKER ... Also, when I ran this installation for the actual work, it reported errors about cannot find my specified snaphmm model for the annotation, despite that I have specified "snaphmm=$LRSDAY_HOME/data/S288C.gene.hmm" in the "maker_opts.ctl" file and this configuration information has been successfully recognized by maker. running snap. #--------- command -------------# Widget::snap: /home/jxyue/Projects/LRSDAY/build/SNAP/snap /home/jxyue/Projects/LRSDAY/data/S288C.gene.hmm /tmp/maker_m8TVEQ/chrI.abinit_masked.0 > /tmp/maker_m8TVEQ/chrI.abinit_masked.0.S288C%2Egene%2Ehmm.snap #-------------------------------# # (my comment: up to now everything looks fine) .... running snap. 
#--------- command -------------# Widget::snap: /home/jxyue/Projects/LRSDAY/build/SNAP/snap -plus -xdef /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.xdef.snap S288C.gene.hmm /tmp /maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap.fasta > /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap #-------------------------------# ZOE ERROR (from /home/jxyue/Projects/LRSDAY/build/SNAP/snap): error opening file (/home/jxyue/Projects/LRSDAY/build/SNAP/Zoe/HMM/S288C.gene.hmm) ZOE library version 2017-03-01 ERROR: Snap failed --> rank=NA, hostname=paralog.itc.unipi.it ERROR: Failed while annotating transcripts ERROR: Chunk failed at level:1, tier_type:4 FAILED CONTIG:chrI ERROR: Chunk failed at level:6, tier_type:0 FAILED CONTIG:chrI examining contents of the fasta file and run log # (my comment: here the error occurred. As you can see, snap somehow forgot about the path to my specified hmm file and instead looks for this file in its default installation location) It is worth noting that the parallel installation and run with maker-3.00.0-beta finish smoothly without any problem. So I suspect both the installation warning and the executing error are caused by the changes during the version update from 3.00.0-beta to 3.01.1-beta. Could you check about this issue? Thanks in advance! Finally, is it possible to also provide access to older version of maker (e.g. 3.00.0-beta in this particular case) when the user finish the registration in the maker download page? This will help users to roll back to older version when needed. Also this helps for the version control when other developers develop annotation pipelines that use maker as a dependency package. Thanks for the consideration! Best, Jia-Xing -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Facult? de M?decine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Universit? 
Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France
Personal website: http://www.iamphioxus.org/

From carsonhh at gmail.com Mon Oct 16 11:20:32 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Oct 2017 10:20:32 -0600
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To:
References:
Message-ID: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>

I would not just remove them. The fact they are there calls into question how they got there in the first place. If you generated this file yourself, you may want to instead use fasta_tool.

--Carson

> On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:
>
> Hi all,
> I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them
> I attached the sequence
>
> Thanks
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From cjfields at illinois.edu Tue Oct 17 14:11:39 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 17 Oct 2017 19:11:39 +0000
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>
Message-ID: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>

I agree with Carson, though my guess is any fasta converters will either fail on these characters as non-IUPAC, or will silently remove them.
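To see where the backslashes actually sit before deciding how to handle them, a small diagnostic along these lines may help. This is only a sketch in Python: it counts backslashes and does not modify the file, and the function name `backslash_report` is a hypothetical helper rather than anything shipped with MAKER.

```python
def backslash_report(fasta_lines):
    """Count backslashes by location: header vs sequence, trailing vs internal.

    'Trailing' means the line ends in a backslash; any other backslash
    on the line is counted as 'internal'.
    """
    report = {
        "header_trailing": 0,
        "header_internal": 0,
        "sequence_trailing": 0,
        "sequence_internal": 0,
    }
    for line in fasta_lines:
        text = line.rstrip("\r\n")
        total = text.count("\\")
        if total == 0:
            continue
        kind = "header" if text.startswith(">") else "sequence"
        trailing = 1 if text.endswith("\\") else 0
        report[kind + "_trailing"] += trailing
        report[kind + "_internal"] += total - trailing
    return report
```

For example, `backslash_report(open("sample_1.fasta"))` would summarise the attached file. If every backslash turns out to be at the end of a line, that points toward whatever program wrote or edited the file rather than toward the sequence data itself.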
Running them through a converter may not solve all the issues though, as the backslash also appears in the FASTA headers at the end of the line:

cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\'
>contig_134\
>contig_149\
>contig_158\
>contig_222\
>contig_316\
>contig_582\
>contig_634\
>contig_700\
>contig_741\
...

I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)?

chris

From carsonhh at gmail.com Tue Oct 17 14:33:26 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Oct 2017 13:33:26 -0600
Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13
In-Reply-To:
References:
Message-ID: <30F2FDFE-3B4E-4951-89D8-63C2FC772B63@gmail.com>

Thanks. The map_fasta_ids script was empty in the bin directory for some reason, so the installer threw an error because it could not find the #!/usr/bin/perl line. I have put it back in the bin directory where it was supposed to be and the issue goes away for the install.
For the second issue, I think I found it and have uploaded a new tarball to the website. Also, here is a link to download the old 3.00-beta, although I would not recommend making it part of a pipeline because version 3 is still beta and still has bugs (you should use 2.31.9 instead for pipelines).
--> http://topaz.genetics.utah.edu/maker_downloads/static/maker-3.00.0-beta.tgz

--Carson

From patrick.tranvan at unil.ch Wed Oct 18 06:47:35 2017
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Wed, 18 Oct 2017 11:47:35 +0000
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>, <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
Message-ID: <1508327278733.19140@unil.ch>

Hi Carson,

1) I think I have read one of your posts saying that running maker with MPI is faster than running multiple instances. Can you explain why?

2) I am trying to annotate a 1GB species but it's super slow. I have filtered the transcriptome to speed up the process, but do you have other suggestions to increase the speed?

Cheers,

Patrick Tran Van

From carsonhh at gmail.com Wed Oct 18 10:09:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Oct 2017 09:09:10 -0600
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <1508327278733.19140@unil.ch>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu> <1508327278733.19140@unil.ch>
Message-ID: <486FE3D5-0902-4B05-A3E1-96642C68E422@gmail.com>

MAKER can coordinate parallelization under MPI in a way it can't even with multiple simultaneous runs.
Because processes can communicate among themselves under MPI, MAKER can break larger contigs into chunks or even pull off individual steps and pass them on to another processor, then receive the results back from that processor. So multiple BLAST, RepeatMasker, Exonerate, and prediction processes can all run at the same time for the same contig. Then they all pass their results back to the parent process so it can produce output for that contig.

MPI was chosen as the parallelization framework rather than threads because it works both within a single machine and across multiple machines, so you can scale up to hundreds of processes if needed.

--Carson

From jhumann at wsu.edu Wed Oct 18 15:38:41 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Wed, 18 Oct 2017 20:38:41 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
Message-ID:

Hello,

I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:

ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically.
Invalid Character: 'K'
--> rank=NA, hostname=compute2

The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.

Thanks,
Jodi

Jodi Humann, Ph.D.
Main Bioinformatics Lab Project Coordinator
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
509-335-3206
jhumann at wsu.edu

From robert.zimmermann at univie.ac.at Thu Oct 19 10:25:08 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:25:08 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
Message-ID:

Hi Maker Developers,

I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA-seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It might make sense if the single-set transcripts were more numerous but shorter, because the additional evidence was joining transcripts, but the lengths indicate that this is not the case.

Therefore I'm trying to understand the algorithm. From what I understand, if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then it is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?

Thanks for your help!

Best,
Bob

--
Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

From robert.zimmermann at univie.ac.at Thu Oct 19 10:28:17 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:28:17 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID:

Correction to the above numbers, the median lengths are 1414 and 1256.

From carsonhh at gmail.com Thu Oct 19 10:44:07 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 09:44:07 -0600
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID: <62F04A76-F3F1-4044-B4AD-129B15A9EEB2@gmail.com>

You should look at both in a browser to get a better idea of what's going on. What MAKER does is take the evidence given, cluster it (strand-specific clustering), then use the transcript evidence as intron hints to the predictors and protein alignments as exon hints (it will also use polished protein hints to generate intron hints in the absence of transcript intron hints). Finally it uses overlapping transcript evidence to generate UTR.

So look at it in a browser. See if the apparent overlap clusters are different in extent, and also look for mRNA-seq evidence being merged. If the cluster is falsely merging between two loci because the mRNA-seq is merged, one of two things will happen: you will get multiple models since the predictor can't make a single model work within the cluster using the hints, or you will get a model with a really long UTR that is blocking other models from existing in the region.

Also, depending on the mRNA-seq evidence coming in, you may be generating false models because of noise in the data. Essentially everything is transcribed at a basal level, so as you get more and more mRNA-seq, you generate more and more spurious alignments. So more evidence might generate fewer long alignments for true loci, or falsely merge genes, while simultaneously adding a number of very short spurious results.

--Carson

From carsonhh at gmail.com Thu Oct 19 12:32:44 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 11:32:44 -0600
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Hi Jodi,

I didn't even know anyone else even had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases).
But you should be able to add the -fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).

Somewhere inside the script there will be a line like this -->
$command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";

You can add -fix_nucleotides to that command so it always runs. -fix_nucleotides is a command line flag. The error is basically a warning for the user to let them know something is weird (i.e. it is possible they mixed up transcript/protein sequence files). The flag then allows the user to tell MAKER they did not mix files up, rather the data is supposed to look that way and they are ok with MAKER altering the sequence by replacing the letters or dashes seen with N's.

Thanks,
Carson

From carsonhh at gmail.com Thu Oct 19 13:46:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 12:46:17 -0600
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID: <052F801C-3B37-4B0F-B40A-A905F5F2B1CE@gmail.com>

Yes. That is the current version.

--Carson

> On Oct 19, 2017, at 12:45 PM, Humann, Jodi Lynn wrote:
>
> Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current Maker version, right?
>
> Jodi
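For anyone who would rather fix the input file than edit mwas_server, the replacement described in this thread (unsupported letters or dashes becoming N) can be approximated outside MAKER. The following is a minimal Python sketch under that assumption. It is not MAKER's own code, the function name `mask_unsupported` is hypothetical, and it only treats A, C, G, T, and N (either case) as supported.

```python
import re

# Anything other than A, C, G, T, or N (upper or lower case) counts as unsupported.
# Assumption: this mirrors the behaviour described in the thread, not MAKER's source.
_UNSUPPORTED = re.compile(r"[^ACGTNacgtn]")


def mask_unsupported(fasta_text: str) -> str:
    """Return FASTA text with unsupported sequence characters replaced by N."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines are kept verbatim
        else:
            out.append(_UNSUPPORTED.sub("N", line))
    return "\n".join(out)


if __name__ == "__main__":
    example = ">seq1\nACGTKACG-RY\nacgtn"
    print(mask_unsupported(example))
```

Note that the sketch leaves header lines alone, so stray characters in a header (such as a trailing backslash) would still need separate attention, and any whitespace inside a sequence line would also be turned into N.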
From jhumann at wsu.edu Thu Oct 19 13:45:43 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Thu, 19 Oct 2017 18:45:43 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current Maker version, right?

Jodi

From eennadi at gmail.com Mon Oct 23 08:30:07 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Mon, 23 Oct 2017 14:30:07 +0100
Subject: [maker-devel] Contamination report from NCBI
Message-ID:

Hello, good day.

I submitted my sequence to NCBI and they sent back this contamination report. Please, how do I use MAKER to make the corrections?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
SUBID           BioProject      BioSample       Organism
--------------------------------------------------------
SUB3124577      PRJNA414658     SAMN07821433    Mucuna pruriens

[] We ran your sequences through our Contamination Screen. The screen found contigs that need to be trimmed and/or excluded. Please adjust the sequences appropriately and then resubmit your sequences. After you remove the contamination, trim any Ns at the ends of the sequence and remove any sequences that are shorter than 200 nt and not part of a multi-component scaffold.

Note that hits in eukaryotic genomes to mitochondrial sequences can be ignored when specific criteria are met. Those criteria are explained below.

Note that mismatches between the name of the adaptor/primer identified in the screen and the sequencing technology used to generate the sequencing data should not be used to discount the validity of the screen results as the adaptors/primers of many different sequencing platforms share sequence similarity.

[] Some of the sequences hit primers or adaptors used in Illumina or 454 or other sequencing strategies or platforms. Primers at the end of a sequence should be removed. However, if primers are present within sequences then you should strongly consider splitting the sequences at the primers because the primer sequence could have been the region of overlap, causing a misassembly.

Screened 26,016 sequences, 396,641,426 bp.
Note: 5,610 sequences with runs of Ns 10 bp or longer (or those longer than 20 MB) were split before screening.
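The trim list in this report gives one entry per contig: sequence name, length, span(s), and apparent source. As a rough illustration of acting on such entries, here is a minimal Python sketch. It is an assumption-laden sketch rather than an NCBI or MAKER tool: the function names are hypothetical, multiple spans for one contig are assumed to be comma-separated, and spans that do not touch either end of the contig are only reported back, since the report advises considering a split at internal primers rather than a simple trim.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based, inclusive coordinates, as written in the report


def parse_trim_row(row: str) -> Tuple[str, int, List[Span], str]:
    """Parse one whitespace-separated row: name, length, span(s), apparent source."""
    name, length, spans, source = row.split()
    parsed = []
    for span in spans.split(","):  # assumption: several spans are comma-separated
        start, end = span.split("..")
        parsed.append((int(start), int(end)))
    return name, int(length), parsed, source


def trim_terminal(seq: str, spans: List[Span]) -> Tuple[str, List[Span]]:
    """Cut spans that touch either end of seq; return (trimmed seq, untouched internal spans)."""
    left, right = 0, len(seq)  # 0-based half-open window of sequence to keep
    internal = []
    for start, end in sorted(spans):
        if start == 1:
            left = max(left, end)           # drop bases 1..end
        elif end == len(seq):
            right = min(right, start - 1)   # drop bases start..len(seq)
        else:
            internal.append((start, end))   # left for manual review or splitting
    return seq[left:right], internal


if __name__ == "__main__":
    name, length, spans, source = parse_trim_row(
        "contig_10200 20270 1..76 adaptor:NGB00847.1"
    )
    print(name, length, spans, source)
```

After trimming, the report's other instructions still apply: remove any Ns left at the sequence ends and drop sequences shorter than 200 nt that are not part of a multi-component scaffold.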
428 sequences with locations to mask/trim (31 split spans to exclude, 397 split spans with locations to mask/trim) Trim: Sequence name, length, span(s), apparent source contig_10109 13138 13078..13138 adaptor:NGB00847.1 contig_10200 20270 1..76 adaptor:NGB00847.1 contig_10202 22517 1..44 adaptor:NGB00360.1 contig_10218 55661 55592..55661 adaptor:NGB00847.1 contig_10283 11575 1..79 adaptor:NGB00847.1 contig_1038 91134 91073..91134 adaptor:NGB00360.1 contig_104 10061 10005..10061 adaptor:NGB00360.1 contig_10405 24076 1..43 adaptor:NGB00847.1 contig_10425 16694 16639..16694 adaptor:NGB00360.1 contig_10447 37445 37233..37445 adaptor:NGB00360.1 contig_10466 19368 1..52 adaptor:NGB00847.1 contig_10576 12053 12003..12053 adaptor:NGB00360.1 contig_1059 34516 34457..34516 adaptor:NGB00847.1 contig_106 49997 1..45 adaptor:NGB00360.1 contig_10695 27664 1..38 adaptor:NGB01029.1 contig_10753 12481 12413..12481 adaptor:NGB00847.1 contig_10822 33522 33441..33522 adaptor:NGB00847.1 contig_1083 10637 1..23 adaptor:NGB01096.1 contig_10851 36752 36682..36752 adaptor:NGB00360.1 contig_10878 27925 27848..27925 adaptor:NGB00360.1 contig_10965 23597 1..57 adaptor:NGB00360.1 contig_10968 7413 1..40 adaptor:NGB00847.1 contig_1099 35847 1..70 adaptor:NGB00360.1 contig_11034 10224 10166..10224 adaptor:NGB00360.1 contig_11058 32994 1..23 adaptor:NGB01088.1 contig_11138 17426 1..73 adaptor:NGB00847.1 contig_11166 6306 6266..6306 adaptor:NGB00360.1 contig_11182 26558 1..30 adaptor:NGB01096.1 contig_11216 15160 1..59 adaptor:NGB00847.1 contig_11269 14732 14655..14732 adaptor:NGB00847.1 contig_11306 28246 28199..28246 adaptor:NGB00360.1 contig_1136 28186 1..73 adaptor:NGB00847.1 contig_1141 58119 58028..58119 adaptor:NGB00847.1 contig_11416 8561 8539..8561 adaptor:NGB01088.1 contig_11504 8890 8840..8890 adaptor:NGB00360.1 contig_1158 17422 17398..17422 adaptor:NGB01088.1 contig_11647 7021 1..69 adaptor:NGB00847.1 contig_11684 17442 17418..17442 adaptor:NGB01096.1 contig_11752 38337 38314..38337 
adaptor:NGB01088.1 contig_11767 6366 6324..6366 adaptor:NGB00847.1 contig_11791 22415 1..43 adaptor:NGB00847.1 contig_11792 58260 1..29 adaptor:NGB01096.1 contig_1187 39501 39462..39501 adaptor:NGB01029.1 contig_12059 10094 1..72 adaptor:NGB00360.1 contig_12130 13210 13164..13210 adaptor:NGB00360.1 contig_12164 17561 17539..17561 adaptor:NGB01096.1 contig_12169 14178 139..196 adaptor:NGB00360.1 contig_12183 15822 61..112 adaptor:NGB00360.1 contig_12266 11704 11640..11704 adaptor:NGB00360.1 contig_12300 9550 9360..9550 adaptor:NGB01088.1 contig_12324 49997 49891..49997 adaptor:NGB00847.1 contig_12423 45971 45860..45918 adaptor:NGB00360.1 contig_12441 15141 1..42 adaptor:NGB00847.1 contig_12514 14655 1..69 adaptor:NGB00847.1 contig_12515 5355 5326..5355 adaptor:NGB01088.1 contig_12535 22496 22458..22496 adaptor:NGB01029.1 contig_12544 19615 19559..19615 adaptor:NGB00360.1 contig_12558 20026 20007..20026 adaptor:NGB01088.1 contig_12613 6880 6793..6880 adaptor:NGB00847.1 contig_12701 18439 18330..18382 adaptor:NGB00360.1 contig_12713 13341 13274..13341 adaptor:NGB00360.1 contig_12723 17913 1..38 adaptor:NGB01088.1 contig_12730 55277 55249..55277 adaptor:NGB01096.1 contig_12739 6792 1..48 adaptor:NGB00360.1 contig_12787 30950 1..19 adaptor:NGB01096.1 contig_1279 18699 18670..18699 adaptor:NGB01088.1 contig_12815 5168 5091..5168 adaptor:NGB00847.1 contig_12846 20753 1..70 adaptor:NGB00360.1 contig_1288 34784 1..31 adaptor:NGB01096.1 contig_12888 12204 1..23 adaptor:NGB01096.1 contig_12919 10315 1..71 adaptor:NGB00360.1 contig_13031 8972 8938..8972 adaptor:NGB01093.1 contig_13088 6275 1..22 adaptor:NGB01088.1 contig_13140 36197 1..48 adaptor:NGB00360.1 contig_13233 16414 16355..16414 adaptor:NGB00847.1 contig_1330 33261 1..44 adaptor:NGB00847.1 contig_13319 19747 1..20 adaptor:NGB01096.1 contig_13367 36004 35868..35931 adaptor:NGB00847.1 contig_13395 5338 1..79 adaptor:NGB00360.1 contig_1341 30756 30734..30756 adaptor:NGB01088.1 contig_13481 9637 9600..9637 
adaptor:NGB00360.1 contig_13506 5704 5662..5704 adaptor:NGB00360.1 contig_13548 5814 79..121 adaptor:NGB00360.1 contig_13567 21576 1..47 adaptor:NGB00847.1 contig_13669 8336 1..24 adaptor:NGB01088.1 contig_13718 23500 1..25 adaptor:NGB01096.1 contig_13783 18720 1..41 adaptor:NGB00847.1 contig_13830 32395 32367..32395 adaptor:NGB01096.1 contig_13845 15572 15493..15572 adaptor:NGB00360.1 contig_13854 10932 1..48 adaptor:NGB00360.1 contig_13943 37701 37674..37701 adaptor:NGB01096.1 contig_13957 7159 1..30 adaptor:NGB01096.1 contig_14014 29735 29672..29735 adaptor:NGB00360.1 contig_14027 21418 21340..21418 adaptor:NGB00360.1 contig_14032 47642 1..53 adaptor:NGB00847.1 contig_14047 26936 1..28 adaptor:NGB01088.1 contig_14048 45832 1..22 adaptor:NGB01088.1 contig_14061 11471 1..179 adaptor:NGB01096.1 contig_14113 17661 1..67 adaptor:NGB00360.1 contig_14173 17601 1..41 adaptor:NGB00847.1 contig_1418 31840 1..248 adaptor:NGB00847.1 contig_14194 7456 7294..7456 adaptor:NGB01096.1 contig_14210 8814 1971..2025 adaptor:NGB00360.1 contig_14223 12513 12489..12513 adaptor:NGB01096.1 contig_14317 21472 21410..21472 adaptor:NGB00360.1 contig_14424 6040 5973..6040 adaptor:NGB00360.1 contig_14425 6404 6379..6404 adaptor:NGB01096.1 contig_14426 31457 31398..31457 adaptor:NGB00847.1 contig_14458 6814 6623..6814 adaptor:NGB01088.1 contig_14524 9488 9431..9488 adaptor:NGB00847.1 contig_14584 20433 1..96 adaptor:NGB00847.1 contig_1459 32979 1..32 adaptor:NGB01096.1 contig_14601 19077 1..28 adaptor:NGB01096.1 contig_14641 21747 1..45 adaptor:NGB00847.1 contig_14664 48155 48118..48155 adaptor:NGB00360.1 contig_14711 11854 11827..11854 adaptor:NGB01096.1 contig_14736 21360 1..37 adaptor:NGB01029.1 contig_14749 12830 1..33 adaptor:NGB01093.1 contig_14966 9962 9891..9962 adaptor:NGB00360.1 contig_14999 5248 1..41 adaptor:NGB00360.1 contig_15010 17976 1..43 adaptor:NGB00360.1 contig_15011 26484 26462..26484 adaptor:NGB01096.1 contig_15017 9331 9291..9331 adaptor:NGB00360.1 contig_1503 63533 
1..33 adaptor:NGB01096.1 contig_15032 32240 32157..32240 adaptor:NGB00847.1 contig_15060 15050 15010..15050 adaptor:NGB00847.1 contig_15065 13062 12996..13062 adaptor:NGB00360.1 contig_15070 29943 1..29 adaptor:NGB01096.1 contig_15132 20431 1..71 adaptor:NGB00847.1 contig_15169 7086 7051..7086 adaptor:NGB00846.1 contig_15174 19921 1..23 adaptor:NGB01096.1 contig_15194 16100 16039..16100 adaptor:NGB00847.1 contig_15212 9272 1..50 adaptor:NGB00847.1 contig_15215 15591 1..58 adaptor:NGB00360.1 contig_15271 37699 37647..37699 adaptor:NGB00847.1 contig_15276 11087 11031..11087 adaptor:NGB00847.1 contig_15309 10118 1..42 adaptor:NGB00847.1 contig_15320 7963 7901..7963 adaptor:NGB00847.1 contig_15334 5683 1..36 adaptor:NGB00846.1 contig_15364 17306 76..139 adaptor:NGB00847.1 contig_15374 28301 28263..28301 adaptor:NGB00360.1 contig_15377 10470 10428..10470 adaptor:NGB00360.1 contig_15398 24069 23999..24069 adaptor:NGB00847.1 contig_15500 9289 9271..9289 adaptor:NGB01096.1 contig_15507 25565 1..22 adaptor:NGB01088.1 contig_15523 5782 5762..5782 adaptor:NGB01088.1 contig_15529 10225 10143..10225 adaptor:NGB00360.1 contig_15569 9645 9612..9645 adaptor:NGB01090.1 contig_15596 7163 1..42 adaptor:NGB00360.1 contig_15605 18521 1..31 adaptor:NGB01096.1 contig_15672 8446 1..213 adaptor:NGB01088.1 contig_15686 22141 58..90 adaptor:NGB00847.1 contig_15708 18098 17996..18098 adaptor:NGB00847.1 contig_15736 18284 18252..18284 adaptor:NGB01096.1 contig_15777 17192 1..45 adaptor:NGB00360.1 contig_15812 8602 1..77 adaptor:NGB00360.1 contig_15959 10936 10913..10936 adaptor:NGB01096.1 contig_15972 11324 1..71 adaptor:NGB00360.1 contig_15974 24312 24243..24312 adaptor:NGB00847.1 contig_16057 8838 8775..8838 adaptor:NGB00847.1 contig_16088 7608 1..71 adaptor:NGB00360.1 contig_16142 10392 1..53 adaptor:NGB00847.1 contig_1617 14870 255..310 adaptor:NGB00360.1 contig_16183 9226 9205..9226 adaptor:NGB01088.1 contig_16188 62666 62586..62666 adaptor:NGB00847.1 contig_16370 7868 1..42 
adaptor:NGB00847.1 contig_16416 19512 1..21 adaptor:NGB01088.1 contig_1645 25016 24951..25016 adaptor:NGB00360.1 contig_16510 31845 31776..31845 adaptor:NGB00847.1 contig_16529 17342 1..45 adaptor:NGB00360.1 contig_16558 9338 9097..9338 adaptor:NGB00360.1 contig_16573 6590 6521..6590 adaptor:NGB00847.1 contig_16608 7397 7324..7397 adaptor:NGB00847.1 contig_16631 11055 1..50 adaptor:NGB00360.1 contig_16641 5482 1..190 adaptor:NGB01088.1 contig_1667 35244 35200..35244 adaptor:NGB01029.1 contig_16682 14500 1..71 adaptor:NGB00847.1 contig_16699 6216 6148..6216 adaptor:NGB00360.1 contig_16734 12674 12625..12674 adaptor:NGB00360.1 contig_16790 6341 1..51 adaptor:NGB00360.1 contig_16807 7512 1..36 adaptor:NGB01096.1 contig_16817 20743 1..155 adaptor:NGB01088.1 contig_16839 6969 1..69 adaptor:multiple contig_16870 10948 1..49 adaptor:NGB00847.1 contig_16880 5622 5549..5622 adaptor:NGB00360.1 contig_16889 9182 1..40 adaptor:NGB00360.1 contig_16911 6691 1..28 adaptor:NGB01088.1 contig_16921 9432 9358..9432 adaptor:NGB00360.1 contig_16951 14285 14262..14285 adaptor:NGB01088.1 contig_17021 12242 1..75 adaptor:NGB00360.1 contig_17092 22712 1..64 adaptor:NGB00360.1 contig_17147 7706 7685..7706 adaptor:NGB01096.1 contig_17195 15668 15643..15668 adaptor:NGB01096.1 contig_17214 7881 7819..7881 adaptor:NGB00847.1 contig_17299 7861 7830..7861 adaptor:NGB01088.1 contig_17344 8915 8765..8823 adaptor:NGB00360.1 contig_17361 8425 1..26 adaptor:NGB01096.1 contig_17422 11017 10964..11017 adaptor:NGB00360.1 contig_17471 5988 5964..5988 adaptor:NGB01096.1 contig_17505 10208 1..74 adaptor:NGB00360.1 contig_17506 6091 1..61 adaptor:NGB00360.1 contig_17520 6084 6028..6084 adaptor:NGB00360.1 contig_17538 5796 5766..5796 adaptor:NGB01096.1 contig_17558 7066 6837..7066 adaptor:NGB01080.1 contig_17561 15165 1..206 adaptor:NGB01083.1 contig_17594 6976 1..26 adaptor:NGB01088.1 contig_17655 14371 14177..14371 adaptor:NGB01088.1 contig_17671 17801 1..50 adaptor:NGB00847.1 contig_17680 5752 5693..5752 
adaptor:NGB00847.1 contig_17738 6456 1..44 adaptor:NGB00360.1 contig_17741 10917 10889..10917 adaptor:NGB01096.1 contig_17775 5928 1..79 adaptor:NGB00847.1 contig_17804 11597 11562..11597 adaptor:NGB00846.1 contig_17872 11319 11278..11319 adaptor:NGB00847.1 contig_17876 5647 5613..5647 adaptor:NGB01083.1 contig_17925 9923 1..22 adaptor:NGB01088.1 contig_17938 5246 1..23 adaptor:NGB01088.1 contig_18016 8044 1..29 adaptor:NGB01096.1 contig_18017 6668 6647..6668 adaptor:NGB01096.1 contig_18044 11330 11299..11330 adaptor:NGB01096.1 contig_18049 10560 1..88 adaptor:NGB00847.1 contig_18173 12243 1..159 adaptor:NGB01096.1 contig_18175 8788 8765..8788 adaptor:NGB01096.1 contig_18177 11418 11340..11418 adaptor:multiple contig_18182 11901 11832..11901 adaptor:NGB00847.1 contig_18201 6059 6038..6059 adaptor:NGB01096.1 contig_18222 11216 11136..11216 adaptor:NGB00847.1 contig_18228 8386 8361..8386 adaptor:NGB01088.1 contig_18321 5922 5897..5922 adaptor:NGB01096.1 contig_18370 5400 5085..5116 adaptor:NGB00747.1 contig_18453 5849 1..38 adaptor:NGB00360.1 contig_1846 23210 1..64 adaptor:NGB00360.1 contig_18479 5209 1..44 adaptor:NGB00360.1 contig_18486 5749 5726..5749 adaptor:NGB01088.1 contig_18488 5217 1..19 adaptor:NGB01088.1 contig_1969 65776 1..60 adaptor:NGB00360.1 contig_197 9215 1..83 adaptor:NGB00847.1 contig_1977 13765 1..35 adaptor:NGB01093.1 contig_1999 53427 53398..53427 adaptor:NGB01096.1 contig_2125 11803 11769..11803 adaptor:NGB01083.1 contig_2151 9544 1..37 adaptor:NGB01029.1 contig_2179 38972 1..67 adaptor:NGB00360.1 contig_2186 31110 30935..31110 adaptor:NGB01096.1 contig_2203 60314 60124..60187 adaptor:NGB00847.1 contig_2278 33271 1..36 adaptor:NGB01090.1 contig_2305 17957 1..58 adaptor:NGB00360.1 contig_2361 48816 48764..48816 adaptor:NGB00847.1 contig_242 49604 49535..49604 adaptor:NGB00360.1 contig_2429 76318 76242..76318 adaptor:NGB00847.1 contig_2430 70439 70373..70439 adaptor:NGB00847.1 contig_2459 63920 1..96 adaptor:NGB00847.1 contig_2485 31300 
31260..31300 adaptor:NGB00360.1 contig_2508 25152 25095..25152 adaptor:NGB00847.1 contig_2650 36583 1..58 adaptor:NGB00847.1 contig_2668 22089 22052..22089 adaptor:NGB01029.1 contig_2735 13614 1..19 adaptor:NGB01088.1 contig_2781 50403 1..70 adaptor:NGB00847.1 contig_2800 30768 22802..22846 adaptor:NGB00360.1 contig_2824 44109 1..38 adaptor:NGB00847.1 contig_2888 19121 1..89 adaptor:NGB00360.1 contig_2900 36871 1..32 adaptor:NGB01088.1 contig_2949 25959 25916..25959 adaptor:NGB00360.1 contig_2970 20833 1..46 adaptor:NGB00360.1 contig_2986 16429 1..43 adaptor:NGB00360.1 contig_3069 38956 38904..38956 adaptor:NGB00847.1 contig_3106 9135 1..87 adaptor:NGB00847.1 contig_3124 70101 70072..70101 adaptor:NGB01088.1 contig_3129 30402 30379..30402 adaptor:NGB01088.1 contig_3147 10611 10586..10611 adaptor:NGB01096.1 contig_3190 117726 117687..117726 adaptor:NGB01029.1 contig_3243 44291 44273..44291 adaptor:NGB01096.1 contig_3276 57911 1..42 adaptor:NGB00360.1 contig_341 67008 1..22 adaptor:NGB01096.1 contig_3542 16855 1..60 adaptor:NGB00847.1 contig_3595 29288 1..79 adaptor:NGB00847.1 contig_3712 73078 1..78 adaptor:NGB00847.1 contig_3840 40472 40414..40472 adaptor:NGB00360.1 contig_3868 33875 33819..33875 adaptor:NGB00360.1 contig_3903 40080 40010..40080 adaptor:NGB00847.1 contig_3996 44010 43970..44010 adaptor:NGB00360.1 contig_4001 26085 1..73 adaptor:NGB00847.1 contig_4014 30676 30590..30676 adaptor:NGB00360.1 contig_4019 49543 1..76 adaptor:NGB00360.1 contig_4036 58848 58696..58848 adaptor:NGB00846.1 contig_4084 41308 41210..41308 adaptor:NGB00360.1 contig_4095 24801 1..70 adaptor:NGB00847.1 contig_4098 27393 1..189 adaptor:NGB01096.1 contig_410 57740 57678..57740 adaptor:NGB00360.1 contig_4172 20870 9717..9749 adaptor:NGB01096.1 contig_4318 55870 55805..55870 adaptor:NGB00360.1 contig_432 58593 58569..58593 adaptor:NGB01088.1 contig_4323 87370 87304..87370 adaptor:NGB00847.1 contig_4365 27401 27350..27401 adaptor:NGB00847.1 contig_4516 14480 1..98 adaptor:NGB00847.1 
contig_452 34031 1..23 adaptor:NGB01096.1 contig_4530 63069 63006..63069 adaptor:NGB00360.1 contig_4651 67570 67518..67570 adaptor:NGB00847.1 contig_4679 20970 1..38 adaptor:NGB00360.1 contig_4686 7411 1..24 adaptor:NGB01096.1 contig_4743 37926 1..79 adaptor:NGB00360.1 contig_4765 11248 11167..11248 adaptor:NGB00360.1 contig_4801 91339 1..50 adaptor:NGB00360.1 contig_4812 37300 37121..37300 adaptor:NGB01093.1 contig_4820 80899 80862..80899 adaptor:NGB00360.1 contig_4904 9220 1..52 adaptor:NGB00847.1 contig_4916 29759 29718..29759 adaptor:NGB00847.1 contig_4924 19015 1..49 adaptor:NGB00847.1 contig_4939 23620 23574..23620 adaptor:NGB01029.1 contig_4956 40890 1..24 adaptor:NGB01088.1 contig_4994 71509 71447..71509 adaptor:NGB00847.1 contig_501 34157 34116..34157 adaptor:NGB00847.1 contig_5036 13162 1..77 adaptor:NGB00360.1 contig_5052 64212 1..170 adaptor:NGB01096.1 contig_5063 35265 35243..35265 adaptor:NGB01096.1 contig_5090 27510 27441..27510 adaptor:NGB00847.1 contig_5157 5988 5805..5988 adaptor:NGB00847.1 contig_5168 6086 6051..6086 adaptor:NGB00846.1 contig_5176 9131 1..41 adaptor:NGB00360.1 contig_5243 44178 1..88 adaptor:NGB00847.1 contig_5270 39229 39177..39229 adaptor:NGB00847.1 contig_5452 30446 1..36 adaptor:NGB00846.1 contig_5576 58918 1..34 adaptor:NGB01096.1 contig_5582 108611 1..87 adaptor:NGB00847.1 contig_5590 55235 55210..55235 adaptor:NGB01088.1 contig_5700 8246 1..82 adaptor:NGB00847.1 contig_5815 99837 1..63 adaptor:NGB00847.1 contig_5820 11616 1..202 adaptor:NGB00847.1 contig_5878 55755 1..26 adaptor:NGB01096.1 contig_59 12390 1..24 adaptor:NGB01096.1 contig_5959 11737 11532..11737 adaptor:NGB01096.1 contig_6065 11492 1..32 adaptor:NGB01088.1 contig_6067 19311 1..39 adaptor:NGB01029.1 contig_6092 14700 1..37 adaptor:NGB01029.1 contig_6194 32760 1..19 adaptor:NGB01088.1 contig_620 10761 1..206 adaptor:NGB01029.1 contig_6259 83001 1..50 adaptor:NGB00360.1 contig_6321 29279 29260..29279 adaptor:NGB01096.1 contig_6408 14690 1..74 adaptor:NGB00360.1 
contig_6455 68530 68497..68530 adaptor:NGB01090.1 contig_6513 12061 11986..12061 adaptor:NGB00847.1 contig_6542 45321 1..41 adaptor:NGB00360.1 contig_6569 19579 19500..19579 adaptor:NGB00847.1 contig_6628 13125 13107..13125 adaptor:NGB01096.1 contig_6673 6733 6699..6733 adaptor:NGB01088.1 contig_6676 13298 13265..13298 adaptor:NGB01088.1 contig_6692 17411 1..43 adaptor:NGB00847.1 contig_6703 57771 1..63 adaptor:NGB00360.1 contig_6785 8258 8237..8258 adaptor:NGB01088.1 contig_6908 53004 52732..52792 adaptor:NGB00847.1 contig_6940 18777 18580..18777 adaptor:NGB00360.1 contig_6941 42032 41980..42032 adaptor:NGB00847.1 contig_6945 53258 1..71 adaptor:NGB00360.1 contig_6986 49101 1..21 adaptor:NGB01088.1 contig_701 57358 1..28 adaptor:NGB01096.1 contig_7017 41786 1..88 adaptor:NGB00360.1 contig_7035 53503 53477..53503 adaptor:NGB01096.1 contig_7046 12860 12812..12860 adaptor:NGB00360.1 contig_7081 27746 1..78 adaptor:NGB00847.1 contig_7082 26783 1..73 adaptor:NGB00847.1 contig_7083 44465 1..70 adaptor:NGB00847.1 contig_7117 33739 33661..33739 adaptor:NGB00360.1 contig_7197 5439 5361..5439 adaptor:NGB00360.1 contig_720 34826 34755..34826 adaptor:NGB00360.1 contig_7210 16719 1..30 adaptor:NGB01096.1 contig_7225 51589 51483..51519 adaptor:NGB01090.1 contig_7228 37410 1..64 adaptor:NGB00360.1 contig_7296 6652 1..80 adaptor:NGB00847.1 contig_7317 11682 1..30 adaptor:NGB01088.1 contig_7323 47612 47560..47612 adaptor:NGB00847.1 contig_7353 50534 50506..50534 adaptor:NGB01096.1 contig_7478 44000 43977..44000 adaptor:NGB01088.1 contig_7510 11029 1..22 adaptor:NGB01096.1 contig_7540 12614 12566..12614 adaptor:NGB00360.1 contig_7587 74260 74065..74260 adaptor:NGB00847.1 contig_7607 14652 1..31 adaptor:NGB01088.1 contig_7612 27455 27299..27354 adaptor:NGB00360.1 contig_7705 39772 1..49 adaptor:NGB00360.1 contig_7729 22305 1..172 adaptor:NGB00360.1 contig_7747 11568 11502..11568 adaptor:NGB00847.1 contig_7750 52785 52748..52785 adaptor:NGB01029.1 contig_7800 20628 20588..20628 
adaptor:NGB00360.1 contig_7851 53514 53439..53514 adaptor:NGB00360.1 contig_7989 51399 1..97 adaptor:NGB00847.1 contig_7992 9120 9035..9120 adaptor:NGB00360.1 contig_7995 103073 103034..103073 adaptor:NGB00360.1 contig_8000 16924 1..85 adaptor:NGB00847.1 contig_8071 73728 73657..73728 adaptor:NGB00360.1 contig_809 20474 20399..20474 adaptor:NGB00360.1 contig_8139 33627 1..25 adaptor:NGB01088.1 contig_8165 17003 16958..17003 adaptor:NGB00847.1 contig_8207 30300 30275..30300 adaptor:NGB01096.1 contig_821 111683 111656..111683 adaptor:NGB01096.1 contig_8236 30705 1..70 adaptor:NGB00360.1 contig_8261 49091 1..181 adaptor:NGB00847.1 contig_8265 28139 27940..28139 adaptor:NGB00360.1 contig_8307 32654 32591..32654 adaptor:NGB00360.1 contig_8340 12953 12925..12953 adaptor:NGB01096.1 contig_8389 19738 1..75 adaptor:NGB00847.1 contig_8399 35159 1..147 adaptor:NGB01096.1 contig_8569 19455 1..38 adaptor:multiple contig_8735 42362 42335..42362 adaptor:NGB01088.1 contig_8737 22308 1..70 adaptor:NGB00360.1 contig_8790 14216 14198..14216 adaptor:NGB01096.1 contig_8797 6889 1..95 adaptor:NGB00847.1 contig_8815 39194 1..80 adaptor:NGB00360.1 contig_886 10028 1..76 adaptor:NGB00360.1 contig_8861 12192 12145..12192 adaptor:NGB00360.1 contig_8909 11109 11042..11109 adaptor:NGB00360.1 contig_8932 8331 8281..8331 adaptor:NGB00847.1 contig_8975 8730 8671..8730 adaptor:NGB00847.1 contig_8992 12682 12661..12682 adaptor:NGB01088.1 contig_8994 7982 7950..7982 adaptor:NGB01096.1 contig_9017 8069 7896..8069 adaptor:NGB00360.1 contig_9045 35343 535..598 adaptor:NGB00847.1 contig_9082 10766 1..28 adaptor:NGB01096.1 contig_9271 17773 17750..17773 adaptor:NGB01096.1 contig_9273 12180 1..180 adaptor:NGB01096.1 contig_9287 6067 1..77 adaptor:NGB00847.1 contig_9474 33382 33060..33111 adaptor:NGB00360.1 contig_9495 19348 19274..19348 adaptor:NGB00847.1 contig_9540 30855 30836..30855 adaptor:NGB01088.1 contig_9591 10604 1..41 adaptor:NGB00847.1 contig_9628 15083 1..34 adaptor:NGB01096.1 contig_9677 5510 
5486..5510 adaptor:NGB01088.1 contig_9693 9823 1..84 adaptor:NGB00847.1 contig_9825 54363 54309..54363 adaptor:NGB00847.1 contig_9863 14033 14013..14033 adaptor:NGB01088.1 contig_9993 35388 1..26 adaptor:NGB01096.1

From xvazquezc at gmail.com Mon Oct 23 17:02:47 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Tue, 24 Oct 2017 09:02:47 +1100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and splitting the contigs at those places, as suggested in the report.

For the annotation part, I'm not 100% sure, but I'd say start with the "Merge/resolve legacy annotations" steps. Maybe Carson or Daniel have a different suggestion.
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this contamination report.
>
> Please how do I use maker to effect the correction
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052
AUSTRALIA

From cjfields at illinois.edu Mon Oct 23 18:21:06 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 23 Oct 2017 23:21:06 +0000
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID: <8B4331B5-9D10-478A-91A5-80AF702CD9CD@illinois.edu>

It looks like the adapter is primarily at the ends, which is easy to remove. However, I agree: removing these and redoing the assembly may improve the assembly quality.

chris

From: maker-devel on behalf of Xabier Vázquez-Campos
Date: Monday, October 23, 2017 at 5:03 PM
To: Emmanuel Nnadi
Cc: Maker Mailing List, "Ence, daniel"
Subject: Re: [maker-devel] Contamination report from NCBI

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and splitting the contigs at those places, as suggested in the report.

For the annotation part, I'm not 100% sure, but I'd say start with the "Merge/resolve legacy annotations" steps. Maybe Carson or Daniel have a different suggestion.
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

Hello

Good day.

Please I submitted my sequence to NCBI and they sent back this contamination report.
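
As a rough illustration of the advice in this thread about removing the flagged spans, here is a minimal Python sketch. It assumes each row of the contamination report has the form `name length start..end source` with a single span per row (a real report may list several spans for one sequence). The names `Span`, `parse_row`, `classify` and `trim_terminal` are hypothetical, and internal hits are only flagged, not split.

```python
from typing import NamedTuple


class Span(NamedTuple):
    name: str    # sequence name, e.g. "contig_11791"
    length: int  # full sequence length
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # apparent source, e.g. "adaptor:NGB00847.1"


def parse_row(row: str) -> Span:
    """Parse one 'name length start..end source' row (single span assumed)."""
    name, length, span, source = row.split()
    start, end = (int(x) for x in span.split(".."))
    return Span(name, int(length), start, end, source)


def classify(span: Span) -> str:
    """Report whether the span touches the start, the end, or neither."""
    if span.start == 1:
        return "start"
    if span.end == span.length:
        return "end"
    return "internal"


def trim_terminal(seq: str, span: Span) -> str:
    """Cut a terminal span off seq; internal spans are rejected, not split."""
    kind = classify(span)
    if kind == "start":
        return seq[span.end:]
    if kind == "end":
        return seq[:span.start - 1]
    raise ValueError(f"{span.name}: internal span, split the contig instead")


# Three rows copied from the report earlier in this thread.
rows = [
    "contig_11791 22415 1..43 adaptor:NGB00847.1",
    "contig_11767 6366 6324..6366 adaptor:NGB00847.1",
    "contig_12169 14178 139..196 adaptor:NGB00360.1",
]
for row in rows:
    span = parse_row(row)
    print(span.name, span.source, classify(span))
```

Most rows in the report fall into the "start" or "end" class, which matches the observation that the adapter sits mainly at contig ends. Trimming the reads and reassembling, as suggested in the replies, remains the cleaner fix.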
Please how do I use maker to effect the correction

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052
AUSTRALIA

From mmokrejs at gmail.com Tue Oct 24 05:23:38 2017
From: mmokrejs at gmail.com (Martin MOKREJŠ)
Date: Tue, 24 Oct 2017 12:23:38 +0200
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,
use trimmomatic or cutadapt to remove the adapters, and check the output file for any cases that were not removed. Once they are all removed, redo the assembly.
Martin

Emmanuel Nnadi wrote:
> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this contamination report.
>
> Please how do I use maker to effect the correction

--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/

From eennadi at gmail.com Tue Oct 24 05:44:20 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Tue, 24 Oct 2017 11:44:20 +0100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Thanks!

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

On Oct 24, 2017 11:23 AM, "Martin MOKREJŠ" wrote:

> Hi Emmanuel,
> use trimmomatic or cutadapt to remove the adapters, and check the output file for any cases that were not removed. Once they are all removed, redo the assembly.
> Martin
>
> Emmanuel Nnadi wrote:
> > Hello
> >
> > Good day.
> >
> > Please I submitted my sequence to NCBI and they sent back this contamination report.
> >
> > Please how do I use maker to effect the correction
>
> --
> Martin Mokrejs, Ph.D.
> Adapter/artefact removal from datasets based on the following technologies:
> 454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
> http://www.bioinformatics.cz/software/supported-protocols/
>

From qwzhang0601 at gmail.com Tue Oct 24 11:54:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 24 Oct 2017 12:54:13 -0400
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Dear Carson:

Thank you again for your suggestions. I just got the new genome assembly of NMR and have started gene annotation. I understand your ideas about this. But can I simply use the old genome's transcripts as transcript evidence and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.

For transcripts I have the following choices. I think the first choice is more reliable and better, right?
(1) There are about 60,000 RefSeq transcripts from NCBI, so I downloaded those sequences in FASTA format.
(2) We have the raw RNA-seq data from 11 tissues; we can assemble each sample with Trinity and then get the transcripts. But I think most of the RNA-seq should have been submitted to NCBI.
BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. Which type of data is best for training SNAP? For Augustus, we will use BUSCO to train it.

Many thanks.

Best
Quanwei

2017-09-29 12:36 GMT-04:00 Carson Holt :

> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming, and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
>
> --Carson
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of the NMR genome, which had been assembled and annotated a few years ago. We can download the gene annotation from NCBI.
> >
> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome was not very well assembled, some gene annotations may be false positives.
> > I hope those false positive genes in the previous annotation won't mislead Maker2 in the current gene annotation.
> >
> > Do you have any suggestions? Thanks
> >
> > Best
> > Quanwei

From carsonhh at gmail.com Tue Oct 24 17:26:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Oct 2017 16:26:00 -0600
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To:
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Yes. If you use est2genome it will just align the model and then find the longest ORF. So it is a quick way to just align old models to the new assembly. Alternatively you can just do de novo annotation.

--Carson

> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you again for your suggestions. I just got the new genome assembly of NMR and have started gene annotation. I understand your ideas about this. But can I simply use the old genome's transcripts as transcript evidence and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.
>
> For transcripts I have the following choices. I think the first choice is more reliable and better, right?
> (1) There are about 60,000 RefSeq transcripts from NCBI, so I downloaded those sequences in FASTA format.
> (2) We have the raw RNA-seq data from 11 tissues; we can assemble each sample with Trinity and then get the transcripts. But I think most of the RNA-seq should have been submitted to NCBI.
>
> BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. Which type of data is best for training SNAP?
> For Augustus, we will use BUSCO to train it.
>
> Many thanks.
>
> Best
> Quanwei
>
> 2017-09-29 12:36 GMT-04:00 Carson Holt :
> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming, and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
>
> --Carson
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of the NMR genome, which had been assembled and annotated a few years ago. We can download the gene annotation from NCBI.
> >
> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome was not very well assembled, some gene annotations may be false positives. I hope those false positive genes in the previous annotation won't mislead Maker2 in the current gene annotation.
> >
> > Do you have any suggestions?
> > Thanks
> >
> > Best
> > Quanwei

From daren.card at gmail.com Wed Oct 25 07:17:13 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Wed, 25 Oct 2017 07:17:13 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> <97656D7C-3613-4B0B-9D99-0441AC28ABCC@gmail.com> <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
Message-ID: <0406D4C3-9C43-4198-B2EA-241C6C504425@gmail.com>

Hi Carson (and CCed MAKER list for the record),

Thanks for troubleshooting my issue further. Good to hear that the run should ultimately work, but strange that it isn't working for me. I'll keep playing with it and will hopefully get it sorted out by running through the list you suggested.

Thanks again for the help,
Daren

> On Oct 24, 2017, at 11:27 AM, Carson Holt wrote:
>
> I cannot seem to replicate this. I ran with MAKER 2.31.8 and 2.31.9, both with and without the GFF3 file (total of 4 runs). It succeeded without issues in every case.
>
> The only things I can think to try are:
> 1. Reinstall BLAST+. Even though you have 2.6.0, just try it anyways. Also install rmblast 2.6.0 for use with RepeatMasker (requires that you install from source).
> 2. Make sure you run ./configure inside RepeatMasker to let it know about the new rmblast installation.
> 3. Change the location of blast and related scripts in maker_exe.ctl, otherwise MAKER won't know to use your new installation.
> 4.
Delete the mpi_blastdb directory under MAKER's output directory to force it to rebuild all BLAST indexes.
> 5. Delete any file with a ".db" extension in the maker output directory to force it to rebuild all GFF3 indexes.
> 6. Update BioPerl to the current CPAN version.
>
> Also here is a link to the results I got for your contig (version 2.31.8 using the repeat masking GFF3 file) -> http://weatherby.genetics.utah.edu/data/scaffold-1.tgz
>
> --Carson
>
>> On Oct 17, 2017, at 6:46 AM, Daren C. Card wrote:
>>
>> Hi Carson,
>>
>> Thanks for offering to take a further look at this. I've uploaded all the files that I think you'd need to run MAKER on your systems, but let me know if you need anything else. My username is "guest_5038".
>>
>> The repeat annotations GFF is from RepeatModeler, with simple repeats filtered away. Transcript evidence was from a Trinity assembly of several RNAseq libraries. There are several sets of protein evidence from related species. I also have an Augustus HMM trained on the genome assembly using BUSCO with retraining turned on.
>>
>> The command I've used is below, and here are the software versions I'm working with:
>>
>> Maker - 2.31.8
>> BLAST - 2.6.0
>> Augustus - 3.2.3
>> RepeatMasker - 4.0.6
>>
>> mpiexec -n 12 maker -base CroVir_rnd1_chr1 round1_maker_opts.chr1.ctl maker_bopts.ctl maker_exe.ctl
>>
>> Thanks again!
>> Daren
>>
>>> On Oct 13, 2017, at 10:37 AM, Carson Holt wrote:
>>>
>>> So you have an input GFF3 file? Could you send it to me along with the problem contig. If you want you can upload the maker control files and evidence sets, and I can just recreate the run for the contig.
>>>
>>> Upload here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> --Carson
>>>
>>>> On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote:
>>>>
>>>> Hi Carson,
>>>>
>>>> Thanks for the help. Issue is still lingering. I've tried my full "ideal"
run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so it doesn't seem to be a BLAST issue, or it is one that won't be easy to overcome.
>>>>
>>>> Using BLAST v. 2.6, I tried some more runs, turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. It seems to run fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing, which leads me to believe it would finish okay.
>>>>
>>>> I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I previously had fewer than 10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me.
>>>>
>>>> What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don't see any obvious info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great.
>>>>
>>>> Hoping I can resolve this, as maybe this is useful to others. Weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but I can't imagine that really mattering.
>>>>
>>>> Thanks,
>>>> Daren
>>>>
>>>>> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote:
>>>>>
>>>>> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST).
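
One way to confirm which BLAST executables MAKER will pick up is to read them out of maker_exe.ctl. The sketch below is a minimal Python example, assuming the file uses `key=value #comment` lines and that the BLAST entries are named makeblastdb, blastn, blastx and tblastx. The paths in the sample text are hypothetical, and the helper names are made up for illustration.

```python
import os

# Assumed names of the BLAST+ entries in maker_exe.ctl.
BLAST_KEYS = ("makeblastdb", "blastn", "blastx", "tblastx")


def read_ctl(text: str) -> dict:
    """Parse 'key=value #comment' lines, skipping blanks and pure comments."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def blast_paths(options: dict) -> dict:
    """Pick out the BLAST entries; absent keys map to an empty string."""
    return {key: options.get(key, "") for key in BLAST_KEYS}


def missing_executables(paths: dict) -> list:
    """List keys whose path is empty or not an executable file on this host."""
    return [key for key, path in paths.items()
            if not (path and os.access(path, os.X_OK))]


# Hypothetical control-file fragment; real paths will differ.
sample = """\
#-----Location of Executables Used by MAKER
makeblastdb=/hypothetical/blast-2.6.0/bin/makeblastdb #NCBI+ makeblastdb
blastn=/hypothetical/blast-2.6.0/bin/blastn #NCBI+ blastn
blastx=/hypothetical/blast-2.6.0/bin/blastx #NCBI+ blastx
tblastx=/hypothetical/blast-2.6.0/bin/tblastx #NCBI+ tblastx
"""
paths = blast_paths(read_ctl(sample))
for key, path in paths.items():
    print(f"{key}: {path}")
print("not executable here:", missing_executables(paths))
```

In practice one would pass the contents of the real maker_exe.ctl to `read_ctl` and check that every listed path points into the new BLAST+ installation.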
>>>>>
>>>>> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/
>>>>>
>>>>> --Carson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote:
>>>>>>
>>>>>> Dear Carson,
>>>>>>
>>>>>> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.
>>>>>>
>>>>>> Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
>>>>>>
>>>>>> Thank you,
>>>>>> Daren Card
>>>>>>
>>>>>>
>>>>>> ################################################
>>>>>> doing repeat masking
>>>>>> re reading repeat masker report.
>>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
>>>>>> doing blastx repeats
>>>>>> re reading blast report.
>>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner
>>>>>> deleted:2 hits
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> collecting blastx repeatmasking
>>>>>> processing all repeats
>>>>>> in cluster::shadow_cluster...
>>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>>>> --> rank=NA, hostname=moonunit0
>>>>>> ERROR: Failed while processing all repeats
>>>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>>>> FAILED CONTIG:scaffold-1
>>>>>>
>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>> FAILED CONTIG:scaffold-1
>>>>>>
>>>>>> examining contents of the fasta file and run log
>>>>>> ################################################
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote:
>>>>>>>
>>>>>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue is either with the GFF3 you gave it or a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>
>>>>>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds.
These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns, finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size?
>>>>>>>>
>>>>>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library).
>>>>>>>>
>>>>>>>> Any help would be greatly appreciated.
>>>>>>>> Daren Card
>>>>>>>>
>>>>>>>> University of Texas Arlington
>>>>>>>>
>>>>>>>> ###################################################
>>>>>>>> doing blastx repeats
>>>>>>>> running blast search.
>>>>>>>> #--------- command -------------#
>>>>>>>> Widget::blastx:
>>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
>>>>>>>> #-------------------------------#
>>>>>>>> deleted:0 hits
>>>>>>>> collecting blastx repeatmasking
>>>>>>>> processing all repeats
>>>>>>>> in cluster::shadow_cluster...
>>>>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>>>>>> --> rank=3, hostname=moonunit0
>>>>>>>> ERROR: Failed while processing all repeats
>>>>>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>>>>>> FAILED CONTIG:scaffold-1
>>>>>>>>
>>>>>>>> doing blastx repeats
>>>>>>>> running blast search.
>>>>>>>> #--------- command -------------#
>>>>>>>> Widget::blastx:
>>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
>>>>>>>> #-------------------------------#
>>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>>> FAILED CONTIG:scaffold-1
>>>>>>>>
>>>>>>>> deleted:0 hits
>>>>>>>> deleted:0 hits
>>>>>>>> ###################################################
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> maker-devel mailing list
>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

From venyao at qq.com Wed Oct 25 02:25:25 2017
From: venyao at qq.com (Wen Yao)
Date: Wed, 25 Oct 2017 15:25:25 +0800
Subject: [maker-devel] NNN in maker output transcript
Message-ID: 

Dear guys,

Recently, I ran MAKER to annotate a genome. I found that the transcript fasta file output by MAKER contains "NNN". Is this normal?
If not, what's going on? Is this a bug in MAKER, or is my MAKER configuration incorrect?
I told MAKER to use snap and augustus for de novo prediction and to use exonerate to align ESTs and proteins.

Thanks!

Wen Yao
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dandence at gmail.com Wed Oct 25 10:42:04 2017
From: dandence at gmail.com (Daniel Ence)
Date: Wed, 25 Oct 2017 11:42:04 -0400
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To: 
References: 
Message-ID: <4913D7BA-CD9B-4B7F-83EF-B8072B4950A6@gmail.com>

Hi Wen Yao,

Do you mean that some of the transcript sequences contain "N" characters or that an entire transcript sequence is "NNN"?

> On Oct 25, 2017, at 3:25 AM, Wen Yao wrote:
>
> Dear guys,
>
> Recently, I ran MAKER to annotate a genome. I found that the transcript fasta file output by MAKER contains "NNN". Is this normal?
> If not, what's going on? Is this a bug in MAKER, or is my MAKER configuration incorrect?
> I told MAKER to use snap and augustus for de novo prediction and to use exonerate to align ESTs and proteins.
>
> Thanks!
>
> Wen Yao
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Oct 25 10:42:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Oct 2017 09:42:34 -0600
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To: 
References: 
Message-ID: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>

The gene predictor generates the model. I don't think snap will generate a model that contains an N. Augustus might be able to predict across a single codon (I'm not sure there). The N means that the nucleotide is unknown (i.e. it can be A, T, C or G). An NNN codon produces the amino acid X (which is the unknown amino acid code). So it is possible that for something as short as one or two codons the predictor thinks it's ok to assume that it will produce a valid codon and uses it to complete the reading frame. Alternatively if you are using est2genome=1 or est_gff then what you are seeing is just the result of an alignment which can align to a couple of N's.
You should not use est2genome=1 for anything but training. Also est_gff or pred_gff will not be filtered if you supplied a feature location that includes an N.

--Carson

> On Oct 25, 2017, at 1:25 AM, Wen Yao wrote:
>
> Dear guys,
>
> Recently, I ran MAKER to annotate a genome. I found that the transcript fasta file output by MAKER contains "NNN". Is this normal?
> If not, what's going on? Is this a bug in MAKER, or is my MAKER configuration incorrect?
> I told MAKER to use snap and augustus for de novo prediction and to use exonerate to align ESTs and proteins.
>
> Thanks!
>
> Wen Yao
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Oct 25 10:43:37 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Oct 2017 09:43:37 -0600
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>
References: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>
Message-ID: 

Also you can check the source of the model by looking at the name, i.e. does it have augustus, snap, or est2genome in the name?

--Carson

> On Oct 25, 2017, at 9:42 AM, Carson Holt wrote:
>
> The gene predictor generates the model. I don't think snap will generate a model that contains an N. Augustus might be able to predict across a single codon (I'm not sure there). The N means that the nucleotide is unknown (i.e. it can be A, T, C or G). An NNN codon produces the amino acid X (which is the unknown amino acid code). So it is possible that for something as short as one or two codons the predictor thinks it's ok to assume that it will produce a valid codon and uses it to complete the reading frame. Alternatively if you are using est2genome=1 or est_gff then what you are seeing is just the result of an alignment which can align to a couple of N's.
You should not use est2genome=1 for anything but training. Also est_gff or pred_gff will not be filtered if you supplied a feature location that includes an N.
>
> --Carson
>
>
>
>
>> On Oct 25, 2017, at 1:25 AM, Wen Yao wrote:
>>
>> Dear guys,
>>
>> Recently, I ran MAKER to annotate a genome. I found that the transcript fasta file output by MAKER contains "NNN". Is this normal?
>> If not, what's going on? Is this a bug in MAKER, or is my MAKER configuration incorrect?
>> I told MAKER to use snap and augustus for de novo prediction and to use exonerate to align ESTs and proteins.
>>
>> Thanks!
>>
>> Wen Yao
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From eennadi at gmail.com Thu Oct 26 16:34:33 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Thu, 26 Oct 2017 22:34:33 +0100
Subject: [maker-devel] How to remove contigs from GFF file
Message-ID: 

Hello,

I need to remove sequences from my GFF file. Can someone help me with a command line for such removal?

ERROR: valid [SEQ_FEAT.FeatureBeginsOrEndsInGap] Feature begins or ends in gap starting at 17625 FEATURE: Gene: CR513_57782 <46071> [lcl|contig_14719:17653-17724] [lcl|contig_14719: delta, dna len= 17790]
ERROR: valid [SEQ_INST.ShortSeq] Sequence only 2 residues BIOSEQ: gnl|aceprd|CR513_62412: raw, aa len= 2

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bmoore at genetics.utah.edu Fri Oct 27 08:17:41 2017
From: bmoore at genetics.utah.edu (Marvin B Moore)
Date: Fri, 27 Oct 2017 13:17:41 +0000
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
Message-ID: <98FAE3F3-7C52-4EDA-8FBB-5F43DB7D54C9@umail.utah.edu>

Those look suspiciously like the remnants of end-of-line control characters. Since Windows, Mac OS X and Linux all use slightly different control characters to mark end-of-line, I'd look at the upstream path of where your files come from and how they've been processed by you or others upstream of MAKER (were they generated or processed on a MS or Mac server?). One bizarre example we've seen is that files that simply pass through an MS Outlook server as an e-mail attachment have had their end-of-line characters converted to MS format.

Good luck...

Barry

On Oct 17, 2017, at 1:11 PM, Fields, Christopher J wrote:

I agree with Carson, though my guess is any fasta converters will either fail on these characters as non-IUPAC, or will silently remove them. Running them through a converter may not solve all the issues though, as the backslash also appears in the FASTA headers at the end of the line:

cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\'
>contig_134\
>contig_149\
>contig_158\
>contig_222\
>contig_316\
>contig_582\
>contig_634\
>contig_700\
>contig_741\
...

I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)?

chris

From: maker-devel on behalf of Carson Holt
Date: Monday, October 16, 2017 at 11:22 AM
To: Emmanuel Nnadi
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Backlash running through my sequence

I would not just remove them. The fact they are there calls into question how they got there in the first place.
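A minimal Python sketch (not from the original thread) of one way to see what is actually in such a file before deciding how to repair it. It tallies Windows-style line endings, headers that end in a backslash, and any sequence characters outside the IUPAC nucleotide alphabet. The function name `scan_fasta` and the report keys are made up for illustration.

```python
from collections import Counter

# IUPAC nucleotide codes in both cases, plus the two common gap symbols.
IUPAC_DNA = set("ACGTURYSWKMBDHVNacgturyswkmbdhvn-.")


def scan_fasta(lines):
    """Tally suspicious content in FASTA text.

    `lines` is any iterable of strings, for example an open file handle.
    """
    report = {
        "crlf_lines": 0,                   # lines that end in \r\n
        "headers_ending_in_backslash": 0,  # e.g. ">contig_134\"
        "odd_sequence_chars": Counter(),   # non-IUPAC characters in sequence
    }
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\r"):
            report["crlf_lines"] += 1
            line = line[:-1]
        if not line:
            continue
        if line.startswith(">"):
            if line.endswith("\\"):
                report["headers_ending_in_backslash"] += 1
        else:
            report["odd_sequence_chars"].update(
                c for c in line if c not in IUPAC_DNA
            )
    return report


if __name__ == "__main__":
    example = [">contig_134\\\n", "ACGT\\\n", "NNAC\r\n"]
    print(scan_fasta(example))
```

When reading a real file, opening it with `open(path, newline="")` keeps Python from translating `\r\n` to `\n`, so the carriage-return count stays meaningful.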
If you generated this file yourself, you may want to instead use fasta_tool.

--Carson

On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:

Hi all,

I am trying to run annotation on some of my sequences but noticed that I have a backslash that runs through the sequence. Please, how do I remove them?

I attached the sequence.

Thanks

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bmoore at genetics.utah.edu Fri Oct 27 08:24:44 2017
From: bmoore at genetics.utah.edu (Marvin B Moore)
Date: Fri, 27 Oct 2017 13:24:44 +0000
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
In-Reply-To: 
References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com>
Message-ID: 

Also, you could probably build these overlap sets on the command line by subsetting the MAKER GFF3 file and then using BedTools intersect for overlap queries.

Barry

On Oct 11, 2017, at 10:19 PM, Matt Simenc wrote:

Very good, thank you!

Matt

On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt wrote:

Also look at GAL for building GFF3 feature queries --> https://github.com/The-Sequence-Ontology/GAL

--Carson

On Oct 11, 2017, at 9:18 AM, Michael Campbell wrote:

Hi Matt,

I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file.
Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.

Hope this helps,
Mike

On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote:

Hey MAKER people,

I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support.

QI summary has:
Fraction of exons that overlap an EST alignment
Fraction of exons that overlap EST or Protein alignments

Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information.

Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this?

Thanks!!!
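A rough Python sketch (not from the original thread) of how the two reruns Mike describes could be turned into Venn counts. It assumes each rerun's GFF3 has mRNA features carrying `Parent` and `_AED` attributes in column 9, and it treats a gene as supported when at least one of its transcripts has an AED below 1. The function names are hypothetical.

```python
def supported_genes(gff_lines, max_aed=1.0):
    """Return IDs of genes with at least one mRNA whose _AED is below max_aed.

    Assumes MAKER-style GFF3 where mRNA rows carry Parent= and _AED= attributes.
    """
    genes = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "Parent" in attrs and "_AED" in attrs:
            if float(attrs["_AED"]) < max_aed:
                genes.add(attrs["Parent"])
    return genes


def venn_counts(est_genes, protein_genes):
    """Sizes of the three Venn regions for two sets of gene IDs."""
    return {
        "est_only": len(est_genes - protein_genes),
        "protein_only": len(protein_genes - est_genes),
        "both": len(est_genes & protein_genes),
    }
```

In use, `supported_genes` would be called once on the GFF3 from the est_gff-only rerun and once on the GFF3 from the protein_gff-only rerun, and the two resulting sets passed to `venn_counts`.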
Matt Simenc
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dandence at gmail.com Fri Oct 27 09:51:21 2017
From: dandence at gmail.com (Daniel Ence)
Date: Fri, 27 Oct 2017 10:51:21 -0400
Subject: [maker-devel] How to remove contigs from GFF file
In-Reply-To: 
References: 
Message-ID: 

Hi Emmanuel, can you send the command that produced the error?

If you need to remove certain scaffolds or contigs from a gff3 file, you can use grep to filter out certain scaffolds like this: "grep -v 'scaffold_name' gff3_file".

~Daniel

> On Oct 26, 2017, at 5:34 PM, Emmanuel Nnadi wrote:
>
> Hello,
>
> I need to remove sequences from my GFF file. Can someone help me with a command line for such removal?
>
> ERROR: valid [SEQ_FEAT.FeatureBeginsOrEndsInGap] Feature begins or ends in gap starting at 17625 FEATURE: Gene: CR513_57782 <46071> [lcl|contig_14719:17653-17724] [lcl|contig_14719: delta, dna len= 17790]
> ERROR: valid [SEQ_INST.ShortSeq] Sequence only 2 residues BIOSEQ: gnl|aceprd|CR513_62412: raw, aa len= 2
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL: 

From carsonhh at gmail.com Fri Oct 27 17:00:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Oct 2017 16:00:17 -0600
Subject: [maker-devel] "ALRM" isn't numeric in exit - MAKER warning message
In-Reply-To: 
References: 
Message-ID: <399AB5BD-2FC5-45F4-9AC8-1665CCFEA0D1@gmail.com>

Hi Marivi,

The only time MAKER uses the ALRM signal is during exit. Sometimes MPI_Finalize can freeze (it has to do with the fact it is being called from Perl). So we set an alarm just in case. Then if it takes too long we assume it is frozen and let things exit in a less than graceful way rather than let it block forever (it is already finished after all).

The complaint you get may be because your system doesn't support the alarm signal or forks.pm (which tries to intercept signals) is having an issue. Or it may just be ugliness related to parts of the process being killed with other parts still being active (it is an ungraceful exit after all). Or it may be another source of the ALRM altogether (but I assume it is the MAKER ALRM given that it happens right after MAKER says it is finished).

Thanks,
Carson

> On Oct 27, 2017, at 1:03 PM, Marivi Colle wrote:
>
> Hi Carson,
>
> After running MAKER, I checked my std output and here's the message at the end of the file. I was wondering what this warning message means?
>
>
> Start_time: 1508465182
> End_time: 1508950543
> Elapsed: 485361
>
>
> Maker is now finished!!!
>
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184
>
>
> Thank you.
> Marivi
>
>
> --
> Marivi G. Colle
> Research Associate
> Department of Horticulture
> Michigan State University
> 1066 Bogue St., East Lansing
> Michigan 48824-1325, USA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From patrick.tranvan at unil.ch Sat Oct 28 09:14:59 2017
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Sat, 28 Oct 2017 14:14:59 +0000
Subject: [maker-devel] Advice on my pipeline
In-Reply-To: <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com>
References: <6b029690bace4d3fbae77c0bb1bddce8@prdexch02.ad.unil.ch> <1498470630221.84642@unil.ch> <696C51C6-5606-4ECB-A8B8-9C077182FFFA@gmail.com> <1498908228256.16549@unil.ch> <58E904BF-9AB8-4AC7-B10B-C902F414E03D@gmail.com> <1505986013492.52354@unil.ch>, <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com>
Message-ID: <1509200133044.96929@unil.ch>

Hi Carson,

If I want to look for alternative splicing variants, can I just add the option alt_splice=1 only at the last round of MAKER, or do I have to set it from the beginning (and perform the 4 rounds with this option)?

Cheers,

Patrick

________________________________
From: Carson Holt
Sent: Friday, September 22, 2017 10:08 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

The gff3 passthrough options are there to help users get old data into MAKER when they have lost access to the original files.
But for iterative running of the pipeline, it is more effective just to rerun in place so MAKER can access the raw alignment reports. The raw reports from the alignments have more detail than what is stored in the GFF3. Those details are lost when trying to use the GFF3 as input.

--Carson

On Sep 21, 2017, at 3:26 AM, Patrick Tran Van wrote:

Hi Carson,

I have a doubt about round 2. In a previous reply you said:

" Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input. MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory). "

Does it mean that I don't need to modify the section:

#-----Re-annotation Using MAKER Derived GFF3 ?

If I leave everything at the defaults, such as:

altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no

will it not redo the repeat masking and the protein + transcriptome alignments?

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, July 3, 2017 10:50 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

maker2zff is just for SNAP training and not for gene filtering (please do not use it for filtering, it does not do what you think). So the final annotation set after maker with correct_est_fusion is 16,850. To decide which set is better, look at them in a browser (gene counts are not useful for gauging results). A well annotated genome will have evidence clusters that closely match the final models.
A poorly annotated genome will have evidence clusters that are split or merged by the models. The correct_est_fusion option does two things. It trims long overlapping UTR fragments, and it stops evidence clusters from being merged on BLASTP evidence alone (so gene predictors will get unmerged hint regions if clusters are split).

You may also find that using jaccard_clip with Trinity has reduced sensitivity for the transcript data (you may lose things that were there before, but now have better specificity, i.e. fewer false positives). Make sure you provided protein data from at least two related species to help maintain sensitivity lost from the transcript data. You can also add rejected gene models back in after the fact by using iprscan to identify unsupported models with identifiable protein domains --> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286374/

Thanks,
Carson

On Jul 1, 2017, at 5:21 AM, Patrick Tran Van wrote:

So I have assembled my transcriptome with Trinity using the jaccard clip option and I have run MAKER with and without correct_est_fusion.

I have then used SNAP to train/filter it with:

maker2zff specie.all.gff

Here are my results:

Number of genes after maker -> Number of genes after maker2zff

- Without correct_est_fusion: 21621 -> 13875
- With correct_est_fusion: 16850 -> 9098

1) If I understand correctly how correct_est_fusion works, because it prevents gene merging, shouldn't it be the reverse? Normally I should find more genes with correct_est_fusion, right?

2) I think I should find something like 13000-14000 genes for my species. Should I go with the "Without correct_est_fusion" set for the 2nd iteration of MAKER?
Thanks for your help

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, June 26, 2017 11:38 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Sorry, the option is --> correct_est_fusion

It is in the maker_opts.ctl file.

I would use both SNAP and Augustus on a few large contigs, then review the results manually. If one of them is not behaving well, then drop it. If both behave well (i.e. correlate well with evidence alignments), then keep them both.

--Carson

On Jun 26, 2017, at 3:48 AM, Patrick Tran Van wrote:

Thanks for your answer.

1) Do you think that adding Augustus training in addition to SNAP at steps 3 and 5 will add more confidence (instead of adding Augustus only for the final round)? I am using autoAug for this and it takes a while to compute.

2) I don't see this option: 'avoid_est_fusion=1'. I have tried to add it but I got this error:

WARNING: Invalid option 'avoid_est_fusion' in control file maker_opts.ctl

(I am using v 2.31.8)

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, June 5, 2017 8:29 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Your plan sounds good. A couple of related notes. Insect genomes tend to have high gene density, so gene merging will be the primary difficulty. You can avoid merging of mRNA-seq evidence by using options like jaccard_clip in Trinity. Then use avoid_est_fusion=1 inside of MAKER. Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input.
MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory).

--Carson

On Jun 2, 2017, at 3:56 AM, Patrick Tran Van wrote:

Hello,

This is my first time running MAKER for an insect genome annotation.

I have found various resources and tried to form a consensus. I am looking for your thoughts and advice about my pipeline: whether I can improve something or am doing useless things.

What I have:

- RNA evidence: transcriptome
- Protein evidence: swissprot/uniprot + BUSCO protein set of insects
- CEGMA and BUSCO results for my genome

1) Train SNAP with CEGMA.

2) Run MAKER (run A) with repeat masking, using the transcripts, the proteins, the new SNAP file (from step 1) and the Augustus file (from BUSCO).

3) Create a SNAP model from run A.

4) Run MAKER (run B) with the new SNAP model (from step 3), with est2genome=0 and protein2genome=0; provide the GFF file (maker_gff=run_A.gff), turn off repeat masking (rm_pass=1), and use the previous mapping results (altest_pass=1 and protein_pass=1).

5) Create a SNAP model from run B.

6) Run MAKER (run C) with the new SNAP model (from step 5), with est2genome=0 and protein2genome=0; provide the GFF file (maker_gff=run_B.gff), turn off repeat masking (rm_pass=1), and use the previous mapping results (altest_pass=1 and protein_pass=1).

7) Create a SNAP model from run C AND create an Augustus gene model from run C.

8) Run MAKER (run D) with the new SNAP model (from step 7) + the Augustus file (from step 7), with est2genome=0 and protein2genome=0; provide the GFF file (maker_gff=run_C.gff), turn off repeat masking (rm_pass=1), and use the previous mapping results (altest_pass=1 and protein_pass=1). Also use keep_preds=1.

Does it seem coherent?
Cheers, Patrick Tran Van Groups Chapuisat, Robinson-Rechavi & Schwander Department of Ecology and Evolution University of Lausanne Le Biophore CH-1015 Lausanne Switzerland Office 3206 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dandence at gmail.com Mon Oct 2 07:17:57 2017 From: dandence at gmail.com (Daniel Ence) Date: Mon, 2 Oct 2017 09:17:57 -0400 Subject: [maker-devel] Error with Maker_functional_gff In-Reply-To: References: Message-ID: Hi Emmanuel, I think this script is expecting the file "uniprot_sprot.fasta" downloaded from the UniProt download page at http://www.uniprot.org/downloads#uniprotkblink The FASTA headers in that file differ from the headers in the file you used. They look like this: >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1 Let us know if that helps, Daniel > On Oct 2, 2017, at 1:03 AM, Emmanuel Nnadi wrote: > > Hello, > I intend to rename genes for GenBank submission > > I downloaded swissprot.fa from NCBI and BLASTed the MAKER-generated file against swissprot. > > the output of BLAST RESULT looks like this > snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6 > > I attempted to run maker_functional_gff using the swissprot.fa downloaded and the blastp result > > I got the following result > > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906. > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906.
> Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor > > > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599. > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599. > Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase > > What can I do? > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Oct 2 07:30:43 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 2 Oct 2017 09:30:43 -0400 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> Message-ID: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> Hi Chris, This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. 
I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1. Thanks, Mike > On Sep 29, 2017, at 1:20 PM, Willett, Christopher S wrote: > > Hello- > > We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this? > > Here is what we are seeing if more details would be helpful. With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes? > > The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above. 
> > Thanks for your help, > > Best, > > Chris Willett > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Research Associate Professor > Department of Biology > CB#3280 Coker Hall > University of North Carolina, Chapel Hill > Chapel Hill, NC, 27599-3280 > > Office: 2252 Genome Science Building > phone: > 919-843-8663 > fax: > 919-962-1625 > > http://labs.bio.unc.edu/Willett/ > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Oct 2 13:19:51 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 2 Oct 2017 15:19:51 -0400 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> Message-ID: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com> Hi Chris, Yeah. By default MAKER shouldn't keep any annotation with an AED of 1. I've cc'd the dev list on this to see if anyone else has any idea why you might get AED 1 genes with keep_preds=0. Could you send me the maker_opts.ctl file for the run? There may be something informative in there. Thanks, Mike > On Oct 2, 2017, at 2:32 PM, Willett, Christopher S wrote: > > Hi Mike- > > I was looking at the lists of mRNAs and I think what is happening is that there are still genes retained in our initial output from MAKER that have an AED=1 that are then getting trimmed out of the filtered file. If I am setting the AED threshold equal to 1 in the control file for the MAKER run is that less than one or less than or equal to one for retention?
Should these AED=1 genes be making it into the gene and mRNA pools if we have the keep predictions parameter set to 0? > > Thanks for your help, > > Best, > > Chris > > > >> On Oct 2, 2017, at 9:30 AM, Michael Campbell > wrote: >> >> Hi Chris, >> >> This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1. >> >> Thanks, >> Mike >> >> >>> On Sep 29, 2017, at 1:20 PM, Willett, Christopher S > wrote: >>> >>> Hello- >>> >>> We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this? >>> >>> Here is what we are seeing if more details would be helpful. With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. 
So I was curious what else was going on in the filter script beyond AED that would trim out genes? >>> >>> The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above. >>> >>> Thanks for your help, >>> >>> Best, >>> >>> Chris Willett >>> >>> >>> >>> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Research Associate Professor >>> Department of Biology >>> CB#3280 Coker Hall >>> University of North Carolina, Chapel Hill >>> Chapel Hill, NC, 27599-3280 >>> >>> Office: 2252 Genome Science Building >>> phone: >>> 919-843-8663 >>> fax: >>> 919-962-1625 >>> >>> http://labs.bio.unc.edu/Willett/ >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Oct 2 13:35:55 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 2 Oct 2017 15:35:55 -0400 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com> Message-ID: <4C4E3DE7-CE28-4DF7-B234-E88701CAD172@gmail.com> Hi Chris, It's this line here: model_gff=/proj/willetlb/users/cwillett/MAKER_analyses/dovetail_ann/SDv1.0_est-forward-SDv2.1.gff Anything passed to model_gff is treated as sacred by MAKER and will be kept regardless of AED. If you pass it in as pred_gff= then it will be subject to the AED filters.
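To check a finished run for this effect, one can list the transcripts whose AED is 1. The following is a minimal Python sketch and not part of MAKER. It assumes the `_AED` attribute sits in column 9 of mRNA features, as in the GFF3 record quoted elsewhere in this thread; the function name and the sample records are made up for illustration.

```python
def mrnas_at_or_above_aed(gff_lines, threshold=1.0):
    """Return (ID, AED) for each mRNA feature whose _AED attribute is >= threshold."""
    hits = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # stop at the embedded FASTA section, if the file has one
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "_AED" in attrs and float(attrs["_AED"]) >= threshold:
            hits.append((attrs.get("ID", ""), float(attrs["_AED"])))
    return hits


# Illustrative records only (not taken from a real run).
sample = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1;Name=g1",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=1.00;_eAED=1.00",
    "chr1\tmaker\tmRNA\t100\t800\t.\t+\t.\tID=g1-mRNA-2;Parent=g1;_AED=0.12;_eAED=0.15",
]
print(mrnas_at_or_above_aed(sample))
```

With a real file, passing an open file handle as `gff_lines` should work the same way, since the function only iterates over lines. Transcripts reported here are candidates for models that were passed through unfiltered; the AED value alone does not say which input option they came from.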
I hope this helps, Mike > On Oct 2, 2017, at 3:28 PM, Willett, Christopher S wrote: > > From daren.card at gmail.com Wed Oct 4 09:53:42 2017 From: daren.card at gmail.com (Daren C. Card) Date: Wed, 4 Oct 2017 10:53:42 -0500 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only Message-ID: Hi all, I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). Any help would be greatly appreciated. Daren Card University of Texas Arlington ################################################### doing blastx repeats running blast search.
#--------- command -------------# Widget::blastx: /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=3, hostname=moonunit0 ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:scaffold-1 doing blastx repeats running blast search. #--------- command -------------# Widget::blastx: /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner #-------------------------------# ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold-1 deleted:0 hits deleted:0 hits ################################################### From carsonhh at gmail.com Wed Oct 4 10:03:52 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 10:03:52 -0600 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: References: Message-ID: 
<2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> It dies at that point because one of the alignments has no start/end coordinate. The issue is either with the GFF3 you gave it or with a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. --Carson > On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: > > Hi all, > > I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? > > I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). > > Any help would be greatly appreciated.
> Daren Card > > University of Texas Arlington > > ################################################### > doing blastx repeats > running blast search. > #--------- command -------------# > Widget::blastx: > /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner > #-------------------------------# > deleted:0 hits > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. > --> rank=3, hostname=moonunit0 > ERROR: Failed while processing all repeats > ERROR: Chunk failed at level:3, tier_type:1 > FAILED CONTIG:scaffold-1 > > doing blastx repeats > running blast search. 
> #--------- command -------------# > Widget::blastx: > /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner > #-------------------------------# > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:scaffold-1 > > deleted:0 hits > deleted:0 hits > ################################################### > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Wed Oct 4 16:31:09 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 4 Oct 2017 18:31:09 -0400 Subject: [maker-devel] About eAED Message-ID: Hello: I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From xvazquezc at gmail.com Wed Oct 4 16:35:41 2017 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Thu, 5 Oct 2017 09:35:41 +1100 Subject: [maker-devel] About eAED In-Reply-To: References: Message-ID: Carson commented on this here https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ On 5 October 2017 at 09:31, Quanwei Zhang wrote: > Hello: > > I ran the maker2 pipeline and got the default gene sets (with AED<1). But > I found there are several hundred genes with eAED 1. > > Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can > be the reason of the great difference between AED and eAED. For this gene > it has a very low AED score, is it still a reliable gene model if its eAED > equals 1? > > >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 > QI:75|0|0|1|0|0|2|111|35 > > Thanks > > Best > Quanwei > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Xabier Vázquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Oct 4 16:38:00 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 16:38:00 -0600 Subject: [maker-devel] About eAED In-Reply-To: References: Message-ID: <77155DA5-6454-4B25-BCF6-DE6B077BA548@gmail.com> eAED is an extended AED calculation that does some inference about the evidence (i.e. it checks reading frame and not just overlap, and may infer support for an exon if splice sites are confirmed, etc.). If eAED is 1, that means that while there is evidence supporting the model, the evidence is more likely to be spurious, so it may be a false model.
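To find such models in bulk, the AED and eAED values can be pulled out of the FASTA headers that MAKER writes. This is a minimal Python sketch under the assumption that every header follows the layout of the example quoted in this thread (`AED:` followed later by `eAED:`); the function names and the 0.5 cutoff are illustrative choices, not MAKER defaults.

```python
import re

# ID, then the first "AED:" that is not part of "eAED:", then "eAED:".
HEADER_RE = re.compile(r"^>?(\S+).*?(?<![A-Za-z])AED:([0-9.]+).*?eAED:([0-9.]+)")


def parse_header(header):
    """Return (id, AED, eAED) from a MAKER-style FASTA header, or None if absent."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), float(match.group(3))


def low_aed_high_eaed(headers, aed_cutoff=0.5):
    """List IDs whose AED is below aed_cutoff while eAED is 1 (cutoff is arbitrary)."""
    flagged = []
    for header in headers:
        parsed = parse_header(header)
        if parsed is None:
            continue
        seq_id, aed, eaed = parsed
        if aed < aed_cutoff and eaed >= 1.0:
            flagged.append(seq_id)
    return flagged


# Header copied from the question in this thread.
example = (
    ">maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 "
    "QI:75|0|0|1|0|0|2|111|35"
)
print(parse_header(example))
print(low_aed_high_eaed([example]))
```

Models flagged this way are only candidates for manual review against the evidence tracks; the sketch does not decide whether a model is real.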
--Carson > On Oct 4, 2017, at 4:31 PM, Quanwei Zhang wrote: > > Hello: > > I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. > > Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? > > >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Oct 4 16:39:52 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 16:39:52 -0600 Subject: [maker-devel] About eAED In-Reply-To: References: Message-ID: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> This one is an even better explanation than the answer I just gave. Thank you. --Carson > On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote: > > Carson commented on this here > https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > On 5 October 2017 at 09:31, Quanwei Zhang > wrote: > Hello: > > I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. > > Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1?
> > >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 > > Thanks > > Best > Quanwei > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Xabier Vázquez-Campos, PhD > Research Associate > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Sun Oct 1 23:03:01 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Mon, 2 Oct 2017 06:03:01 +0100 Subject: [maker-devel] Error with Maker_functional_gff Message-ID: Hello, I intend to rename genes for GenBank submission I downloaded swissprot.fa from NCBI and BLASTed the MAKER-generated file against swissprot. the output of BLAST RESULT looks like this snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6 I attempted to run maker_functional_gff using the swissprot.fa downloaded and the blastp result I got the following result Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906. Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906.
Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599. Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599. Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase What can I do? Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From willett4 at email.unc.edu Mon Oct 2 09:04:38 2017 From: willett4 at email.unc.edu (Willett, Christopher S) Date: Mon, 2 Oct 2017 15:04:38 +0000 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> Message-ID: Hi Mike- Thanks for getting back to me. I was using the grep -cP '\tgene\t' syntax to count the numbers and it seems to be giving me the same numbers I got before when I was counting either the transcripts or the genes in the fasta output files from our original run. I will have to look at the files a bit more to see if I can find some examples of genes that fit what you are suggesting.
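The same count can be cross-checked in Python, tallying every feature type at once so that gene and mRNA totals sit side by side. This is a small sketch, assuming a standard nine-column GFF3 layout with the feature type in column 3; the function name and sample records are illustrative only.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally the feature type (column 3) of every GFF3 record."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # records end where an embedded FASTA section begins
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


# Illustrative records: one gene carrying two transcripts.
records = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1",
    "chr1\tmaker\tmRNA\t100\t800\t.\t+\t.\tID=g1-mRNA-2;Parent=g1",
    "chr1\tmaker\texon\t100\t900\t.\t+\t.\tID=e1;Parent=g1-mRNA-1",
]
totals = count_feature_types(records)
print(totals["gene"], totals["mRNA"])
```

If the gene and mRNA totals differ between the filtered and unfiltered files by different amounts, that points to transcripts being dropped while their parent genes are kept.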
Best, Chris On Oct 2, 2017, at 9:30 AM, Michael Campbell > wrote: Hi Chris, This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1. Thanks, Mike On Sep 29, 2017, at 1:20 PM, Willett, Christopher S > wrote: Hello- We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this? Here is what we are seeing if more details would be helpful. With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes? The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. 
I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above. Thanks for your help, Best, Chris Willett ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Research Associate Professor Department of Biology CB#3280 Coker Hall University of North Carolina, Chapel Hill Chapel Hill, NC, 27599-3280 Office: 2252 Genome Science Building phone: 919-843-8663 fax: 919-962-1625 http://labs.bio.unc.edu/Willett/ _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From willett4 at email.unc.edu Mon Oct 2 13:28:19 2017 From: willett4 at email.unc.edu (Willett, Christopher S) Date: Mon, 2 Oct 2017 19:28:19 +0000 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com> Message-ID: Hi Mike- Here is the control file for the last run of MAKER with keep_preds=0 and here is an example of one mRNA retained from the gff file: Chromosome_6 maker mRNA 556000 557215 . + . ID=maker-Chromosome_6-exonerate_est2genome-gene-5.3-mRNA-1;Parent=maker-Chromosome_6-exonerate_est2genome-gene-5.3;Name=TCALIF_02833-PA;_AED=1.00;_eAED=1.00;_QI=15|0|0|0|1|1|2|75|338;score=100;Alias=TCALIF_02833-PA Thanks, Chris On Oct 2, 2017, at 3:19 PM, Michael Campbell wrote: Hi Chris, Yeah. By default MAKER shouldn't keep any annotation with an AED of 1. I've cc'd the dev list on this to see if anyone else has any idea why you might get AED 1 genes with keep_preds=0. Could you send me the maker_opts.ctl file for the run?
There may be something informative in there. Thanks, Mike On Oct 2, 2017, at 2:32 PM, Willett, Christopher S > wrote: Hi Mike- I was looking at the lists of mRNAs and I think what is happening is that there are still genes retained in our initial output from MAKER that have an AED=1 that are then getting trimmed out of the filtered file. If I am setting the AED threshold equal to 1 in the control file for the MAKER run is that less than one or less than or equal to one for retention? Should these AED=1 genes be making it into the gene and mRNA pools if we have the keep predictions parameter set to 0? Thanks for your help, Best, Chris On Oct 2, 2017, at 9:30 AM, Michael Campbell > wrote: Hi Chris, This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1. Thanks, Mike On Sep 29, 2017, at 1:20 PM, Willett, Christopher S > wrote: Hello- We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this? Here is what we are seeing if more details would be helpful. 
With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes? The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above. Thanks for your help, Best, Chris Willett ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Research Associate Professor Department of Biology CB#3280 Coker Hall University of North Carolina, Chapel Hill Chapel Hill, NC, 27599-3280 Office: 2252 Genome Science Building phone: 919-843-8663 fax: 919-962-1625 http://labs.bio.unc.edu/Willett/ _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl_full8 Type: application/octet-stream Size: 5617 bytes Desc: maker_opts.ctl_full8 URL: From qwzhang0601 at gmail.com Wed Oct 4 20:35:55 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 4 Oct 2017 22:35:55 -0400 Subject: [maker-devel] About eAED In-Reply-To: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> Message-ID: Thank you all. 
Most of the time the AED is equal to or lower than the eAED, but there are some genes whose eAED is smaller than their AED. I feel that eAED is more stringent than AED. Could you give me an example of the conditions under which eAED can be smaller than AED?

The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice than AED for filtering gene models?

Best
Quanwei

2017-10-04 18:39 GMT-04:00 Carson Holt :

> This one is an even better explanation than the answer I just gave. Thank
> you.
>
> --Carson
>
> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos
> wrote:
>
> Carson commented on this here
> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ
>
> On 5 October 2017 at 09:31, Quanwei Zhang wrote:
>
>> Hello:
>>
>> I ran the maker2 pipeline and got the default gene sets (with AED<1). But
>> I found there are several hundred genes with eAED 1.
>>
>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can
>> be the reason of the great difference between AED and eAED. For this gene
>> it has a very low AED score, is it still a reliable gene model if its eAED
>> equals 1?
>>
>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00
>> QI:75|0|0|1|0|0|2|111|35
>>
>> Thanks
>>
>> Best
>> Quanwei
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
> --
> Xabier Vázquez-Campos, *PhD*
> *Research Associate*
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
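
[Editorial sketch] The protein FASTA header quoted in the message above carries both scores (AED:0.05 eAED:1.00), so models where the two disagree can be listed straight from the headers. The Python below is only an illustration of that idea: the function names and the 0.5 difference cutoff are assumptions made up for this example and are not part of MAKER.

```python
import re

# Matches the "AED:<num> eAED:<num>" fields seen in MAKER protein FASTA headers.
# The \b keeps the plain AED pattern from matching inside "eAED".
HEADER_RE = re.compile(r"^>(\S+).*?\bAED:([\d.]+)\s+eAED:([\d.]+)")


def parse_scores(header):
    """Return (sequence_id, aed, eaed) from one header line, or None if absent."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), float(match.group(3))


def discordant(headers, min_diff=0.5):
    """List the records whose AED and eAED differ by at least min_diff."""
    hits = []
    for header in headers:
        parsed = parse_scores(header)
        if parsed is not None and abs(parsed[1] - parsed[2]) >= min_diff:
            hits.append(parsed)
    return hits
```

Feeding it the header lines of the MAKER protein FASTA file would give a short list of models to inspect by hand, which fits the advice in this thread that neither score should be trusted blindly.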

From carsonhh at gmail.com Wed Oct 4 20:38:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 20:38:25 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
Message-ID: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>

The previously linked comment explains this in detail --> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

Basically, with eAED the support for the middle of an exon is inferred from the support at its edges, even where no evidence overlaps it (so eAED infers support and AED does not).

--Carson

> On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote:
>
> Thank you all. Most time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED. I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED?
>
> The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED?
>
> Best
> Quanwei
>
> 2017-10-04 18:39 GMT-04:00 Carson Holt:
> This one is an even better explanation than the answer I just gave. Thank you.
>
> --Carson
>
>> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote:
>>
>> Carson commented on this here
>> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ
>>
>> On 5 October 2017 at 09:31, Quanwei Zhang wrote:
>> Hello:
>>
>> I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1.
>>
>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1?
>>
>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35
>>
>> Thanks
>>
>> Best
>> Quanwei
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> --
>> Xabier Vázquez-Campos, PhD
>> Research Associate
>> NSW Systems Biology Initiative
>> School of Biotechnology and Biomolecular Sciences
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...

From carsonhh at gmail.com Wed Oct 4 20:43:28 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 20:43:28 -0600
Subject: [maker-devel] About eAED
In-Reply-To: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>
Message-ID:

eAED can be better for edge cases, but neither is perfect. Low AED generally correlates with better models. But a high AED does not mean the model doesn't exist; it just means you should spend a little more time deciding if you really believe it or not.

--Carson

> On Oct 4, 2017, at 8:38 PM, Carson Holt wrote:
>
> The previous linked comment explains in detail --> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ
>
> Basically the middle support of exon is inferred from edge support even though no overlap exists (so eAED infers support and AED does not).
>
> --Carson
>
>> On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote:
>>
>> Thank you all. Most time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED.
I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED? >> >> The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED? >> >> Best >> Quanwei >> >> >> >> 2017-10-04 18:39 GMT-04:00 Carson Holt >: >> This one is an even better explanation than the answer I just gave. Thank you. >> >> ?Carson >> >>> On Oct 4, 2017, at 4:35 PM, Xabier V?zquez-Campos > wrote: >>> >>> Carson commented on this here >>> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ >>> >>> On 5 October 2017 at 09:31, Quanwei Zhang > wrote: >>> Hello: >>> >>> I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. >>> >>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? >>> >>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 >>> >>> Thanks >>> >>> Best >>> Quanwei >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> Xabier V?zquez-Campos, PhD >>> Research Associate >>> NSW Systems Biology Initiative >>> School of Biotechnology and Biomolecular Sciences >>> The University of New South Wales >>> Sydney NSW 2052 AUSTRALIA >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
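
[Editorial sketch] As background to the replies above: AED is usually described as 1 minus the average of sensitivity and specificity, both computed from the base-level overlap between an annotation and the evidence aligned to it. The Python below works that arithmetic through for two sets of intervals. It is a simplified stand-in written for this note under that stated definition, not MAKER's own code; the function names are made up, and it does not attempt the edge-based inference that eAED adds.

```python
def covered_bases(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_blocks):
    """Annotation edit distance: 0.0 = perfect agreement, 1.0 = no shared bases."""
    annotated = covered_bases(annotation_exons)
    evidence = covered_bases(evidence_blocks)
    if not annotated or not evidence:
        return 1.0  # nothing to compare, so treat as unsupported
    shared = len(annotated & evidence)
    sensitivity = shared / len(evidence)    # fraction of evidence bases annotated
    specificity = shared / len(annotated)   # fraction of annotated bases supported
    return 1.0 - (sensitivity + specificity) / 2.0
```

For example, a 100 bp single-exon model overlapped by 100 bp of evidence over half its length gives sensitivity 0.5 and specificity 0.5, so AED 0.5, while a model with no overlapping evidence gets AED 1.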
URL: From qwzhang0601 at gmail.com Wed Oct 4 21:25:24 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 4 Oct 2017 23:25:24 -0400 Subject: [maker-devel] About eAED In-Reply-To: References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> Message-ID: Thanks for your explanation. Best Quanwei 2017-10-04 22:43 GMT-04:00 Carson Holt : > eAED can be better for edge cases, but neither is perfect. Low AED > generally correlates with better models. But a high AED does not mean the > model doesn?t exist, it just means you should spend a little more time > deciding if you really believe it or not. > > ?Carson > > > > On Oct 4, 2017, at 8:38 PM, Carson Holt wrote: > > The previous linked comment explains in detail ?> > https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > Basically the middle support of exon is inferred from edge support even > though no overlap exists (so eAED infers support and AED does not). > > ?Carson > > > On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote: > > Thank you all. Most time, the AED is equal to or lower than eAED, but > there are some genes whose eAED is smaller than AED. I feel the eAED is > more stringent than AED. Would you give me an example, under what condition > eAED can be smaller than AED? > > The default maker2 gene set includes all genes with AED less than 1. Do > you think eAED is a better choice to filter gene models than AED? > > Best > Quanwei > > > > 2017-10-04 18:39 GMT-04:00 Carson Holt : > >> This one is an even better explanation than the answer I just gave. Thank >> you. >> >> ?Carson >> >> On Oct 4, 2017, at 4:35 PM, Xabier V?zquez-Campos >> wrote: >> >> Carson commented on this here >> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa- >> ko/iC4KTuIitGEJ >> >> On 5 October 2017 at 09:31, Quanwei Zhang wrote: >> >>> Hello: >>> >>> I ran the maker2 pipeline and got the default gene sets (with AED<1). 
>>> But I found there are several hundred genes with eAED 1.
>>>
>>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can
>>> be the reason of the great difference between AED and eAED. For this gene
>>> it has a very low AED score, is it still a reliable gene model if its eAED
>>> equals 1?
>>>
>>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00
>>> QI:75|0|0|1|0|0|2|111|35
>>>
>>> Thanks
>>>
>>> Best
>>> Quanwei
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>
>> --
>> Xabier Vázquez-Campos, *PhD*
>> *Research Associate*
>> NSW Systems Biology Initiative
>> School of Biotechnology and Biomolecular Sciences
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...

From dandence at gmail.com Thu Oct 5 08:00:21 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 5 Oct 2017 10:00:21 -0400
Subject: [maker-devel] Error with Maker_functional_gff
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,

I can't tell whether it will work from the BLAST lines that you sent. It will depend on the full headers in the FASTA file that you give to the script, which you'll run after all the BLAST searches are complete.

Assembly isn't really my expertise or the topic of this mailing list, but assembling your contigs into scaffolds would probably help your annotations by connecting some parts of genes that are broken across contigs, and will definitely help downstream analysis if you need to know which genes are located next to each other.
How much improvement you can get by scaffolding depends on the type of sequence data you have. Each scaffolder makes assumptions and has requirements, and some assemblers like Velvet and SOAPdenovo have scaffolding built into their algorithms. I'd recommend starting with a review like this one:
http://www.sciencedirect.com/science/article/pii/S1672022912000095

~Daniel

> On Oct 2, 2017, at 10:47 AM, Emmanuel Nnadi wrote:
>
> Hello Daniel,
>
> Thanks for the tip, I was able to download uniprot_swiss.fa I am currently running the blast now
>
> it looks like this
>
> MUCPR_041061-RA sp|P10978|POLX_TOBAC 49.315 73 37 0 43 115 874 946 2.95e-14 71.6
> MUCPR_026643-RA sp|Q00451|PRF1_SOLLC 86.207 87 11 1 243 328 257 343 3.65e-32 126
>
> Is it ok?
>
> I wish to ask, I did not assemble my contigs into scaffold before annotating would it affect the end result?
>
> I wish to assemble my sequence into scaffold can you advice on the best software to use?
>
> I attempted using SSPACE: a new stand-alone scaffolding tool for small and large genomes
> but am having problem with the library. Funny enough the software does not have support to solve problems
>
> Thanks
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
> On Mon, Oct 2, 2017 at 2:17 PM, Daniel Ence wrote:
> Hi Emmanuel, I think this script is expecting the file "uniprot_sprot.fasta"
downloaded from the uniprot download page at http://www.uniprot.org/downloads#uniprotkblink > The fasta headers in this file are different from the fasta header that the file you used has: > >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1 > > Let us know if that helps, > Daniel > >> On Oct 2, 2017, at 1:03 AM, Emmanuel Nnadi > wrote: >> >> Hello, >> I intend to rename genes for Genebank submission >> >> I downloaded swissprot.fa from NCBI and used blast MAKER generated file to swissprot. >> >> the output of BLAST RESULT looks like this >> snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6 >> >> I attempted to run maker_funtional_gff using the swissprot.fa downloaded and the blastp result >> >> I got the following result >> >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906. >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906. >> Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor >> >> >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599. >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599. >> Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase >> >> What can I do? 
>>
>>
>> Nnadi Nnaemeka Emmanuel
>> Department of Microbiology,
>> Faculty of Natural and Applied Science,
>> Plateau State University, Bokkos, Plateau State, Nigeria.
>> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>
-------------- next part --------------
An HTML attachment was scrubbed...

From daren.card at gmail.com Fri Oct 6 06:23:36 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Fri, 6 Oct 2017 07:23:36 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com>
Message-ID: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com>

Dear Carson,

Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. It looks like MAKER should natively work with the BLAST that is available in the $PATH.

Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.

Thank you,
Daren Card

################################################
doing repeat masking
re reading repeat masker report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
doing blastx repeats
re reading blast report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner deleted:2 hits doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=NA, hostname=moonunit0 ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:scaffold-1 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold-1 examining contents of the fasta file and run log ################################################ > On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: > > The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. > > ?Carson > > >> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >> >> Hi all, >> >> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. 
The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >> >> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >> >> Any help would be greatly appreciated. >> Daren Card >> >> University of Texas Arlington >> >> ################################################### >> doing blastx repeats >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >> #-------------------------------# >> deleted:0 hits >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. 
>> --> rank=3, hostname=moonunit0 >> ERROR: Failed while processing all repeats >> ERROR: Chunk failed at level:3, tier_type:1 >> FAILED CONTIG:scaffold-1 >> >> doing blastx repeats >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >> #-------------------------------# >> ERROR: Chunk failed at level:2, tier_type:0 >> FAILED CONTIG:scaffold-1 >> >> deleted:0 hits >> deleted:0 hits >> ################################################### >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From eennadi at gmail.com Sat Oct 7 15:34:46 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Sat, 7 Oct 2017 22:34:46 +0100 Subject: [maker-devel] jbrowse not working Message-ID: Please, I ran the command line maker2jbrowse muc1_genome_snap2.all.gff The command created some folders. However, at the end it read No reference sequences defined in configuration, nothing to do. Please what does it mean? How can I view it in jbrowse. Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... 
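
[Editorial sketch] One possible cause of the "No reference sequences defined in configuration" message in the question above, raised in the reply later in this thread, is a merged GFF3 that has no embedded FASTA section. In GFF3 that section starts at a line reading ##FASTA. The Python below checks for it and counts the sequence records that follow; the function name is made up for this example, and the file name in the usage note is just the one mentioned in the question.

```python
def count_embedded_fasta(gff3_path):
    """Count FASTA records after the ##FASTA directive of a GFF3 file.

    Returns None when the file has no ##FASTA directive at all.
    """
    in_fasta = False
    records = 0
    with open(gff3_path) as handle:
        for line in handle:
            if not in_fasta:
                # Everything before the directive is annotation, not sequence.
                if line.strip() == "##FASTA":
                    in_fasta = True
            elif line.startswith(">"):
                records += 1
    return records if in_fasta else None
```

Calling count_embedded_fasta("muc1_genome_snap2.all.gff") and getting None would suggest the sequences were left out when the file was merged, whereas a number lower than the expected contig count would point to a truncated file.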

From carsonhh at gmail.com Sun Oct 8 18:37:12 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 8 Oct 2017 18:37:12 -0600
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com>
Message-ID: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com>

MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker; it is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI BLAST+ installation is what gets used for that (not RMBLAST).

The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/

--Carson

> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote:
>
> Dear Carson,
>
> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.
>
> Unfortunately, I'm still getting the same error what appears to be at roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
> > Thank you, > Daren Card > > > ################################################ > doing repeat masking > re reading repeat masker report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out > doing blastx repeats > re reading blast report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner > deleted:2 hits > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. > --> rank=NA, hostname=moonunit0 > ERROR: Failed while processing all repeats > ERROR: Chunk failed at level:3, tier_type:1 > FAILED CONTIG:scaffold-1 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:scaffold-1 > > examining contents of the fasta file and run log > ################################################ > > > >> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >> >> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >> >> ?Carson >> >> >>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>> >>> Hi all, >>> >>> I?ve been having an issue with MAKER (v. 
2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>> >>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>> >>> Any help would be greatly appreciated. >>> Daren Card >>> >>> University of Texas Arlington >>> >>> ################################################### >>> doing blastx repeats >>> running blast search. 
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>> #-------------------------------# >>> deleted:0 hits >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>> --> rank=3, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> doing blastx repeats >>> running blast search. 
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>> #-------------------------------# >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> deleted:0 hits >>> deleted:0 hits >>> ################################################### >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Oct 9 18:35:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 9 Oct 2017 18:35:49 -0600 Subject: [maker-devel] jbrowse not working In-Reply-To: References: Message-ID: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it?s possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you run it. Thanks, Carson > On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote: > > Please, > I ran the command line > > maker2jbrowse muc1_genome_snap2.all.gff > > The command created some folders. However, at the end it read > No reference sequences defined in configuration, nothing to do. 
>
> Please what does it mean? How can I view it in jbrowse.
>
> Thanks
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From eennadi at gmail.com Mon Oct 9 22:42:35 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Tue, 10 Oct 2017 05:42:35 +0100
Subject: [maker-devel] jbrowse not working
In-Reply-To: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com>
References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com>
Message-ID:

Hi Carson,

Thanks for the reply.

I generated the gff with this command:

gff3_merge -d dpp_contig.maker.output/dpp_contig_master_datastore_index.log

I had to rerun jbrowse with the following command:

maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb

It is showing

WARNING: No matching features found for mRNA

and I don't understand what that means.

I was able to set up jbrowse on localhost successfully. I had to move the jbrowse folder to my local host. Jbrowse is up and running; however, I have about 18488 contigs and only 31 contigs are showing. How can I make all my contigs show in jbrowse?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt wrote:

> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of
> the file?
That can happen if you use the -n option with gff3_merge.
> Alternatively it's possible one of the individual contig gff3 used to build
> the merged gff3 is truncated. If that is the case then gff3_merge should
> have thrown some sort of error or warning when you run it.
>
> Thanks,
> Carson
>
>
>
>
> On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote:
>
> Please,
> I ran the command line
>
> maker2jbrowse muc1_genome_snap2.all.gff
>
> The command created some folders. However, at the end it read
> No reference sequences defined in configuration, nothing to do.
>
> Please what does it mean? How can I view it in jbrowse.
>
> Thanks
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/
> publications
>
>
>

From jacques.dainat at nbis.se Tue Oct 10 03:24:34 2017
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Tue, 10 Oct 2017 11:24:34 +0200
Subject: [maker-devel] MAKER annotation submission (EMBLmyGFF3)
Message-ID: <967873FE-D61F-4233-A004-C877A60A2AC1@nbis.se>

Hi MAKER users,

I would like to take advantage of this mailing list to share a tool that I hope will be useful to MAKER users. Once we are happy with our annotation, one of the next steps is to submit it to the public archives through one of the three INSDC databases (EMBL-EBI / NCBI / DDBJ).

We developed EMBLmyGFF3, which makes it easy to convert any kind of GFF3 annotation to the EMBL flat file format for submission to the European Nucleotide Archive (ENA), which is part of EMBL-EBI. It works well with the MAKER annotation output, among others.

We hope the tool will ease the submission process of your annotations.
You will find it here: https://github.com/NBISweden/EMBLmyGFF3 A typical usage case will look like that (where ERSXXXXXX and PRJXXXXXX are the accession number and the project ID provided by EMBL-EBI prior to any submission): ./EMBLmyGFF3.py maker.gff3 maker.fa --data_class STD --topology linear --molecule_type 'genomic DNA' --table 1 --species 'Drosophila melanogaster (fly)' --taxonomy INV --accession ERSXXXXXXX --project_id PRJXXXXXXX --rg MYGROUP -o result.embl Best regards, Jacques Dainat, PhD --------------------------------------- NBIS (National Bioinformatics Infrastructure Sweden) Genome Annotation Service --------------------------------------- Uppsala University, Biomedicinska Centrum Department of Medical Biochemistry Microbiology, Genomics -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Wed Oct 11 08:53:36 2017 From: mcsimenc at gmail.com (Matt Simenc) Date: Wed, 11 Oct 2017 07:53:36 -0700 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? Message-ID: Hey MAKER people, I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. QI summary has: Fraction of exons that overlap an EST alignment Fraction of exons that overlap EST or Protein alignments Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? Thanks!!! Matt Simenc -------------- next part -------------- An HTML attachment was scrubbed... 
From michael.s.campbell1 at gmail.com Wed Oct 11 09:18:54 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 11 Oct 2017 11:18:54 -0400
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
In-Reply-To:
References:
Message-ID: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com>

Hi Matt,

I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs.

To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.

Hope this helps,
Mike

> On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote:
>
> Hey MAKER people,
>
> I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support.
>
> QI summary has:
>
> Fraction of exons that overlap an EST alignment
> Fraction of exons that overlap EST or Protein alignments
>
> Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment.
If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information.
>
> Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this?
>
> Thanks!!!
> Matt Simenc
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Oct 11 09:22:54 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 11 Oct 2017 09:22:54 -0600
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
In-Reply-To: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com>
References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com>
Message-ID: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com>

Also look at GAL for building GFF3 feature queries --> https://github.com/The-Sequence-Ontology/GAL

--Carson

> On Oct 11, 2017, at 9:18 AM, Michael Campbell wrote:
>
> Hi Matt,
>
> I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs.
>
> To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.
> > Hope this helps, > Mike > >> On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: >> >> Hey MAKER people, >> >> I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. >> >> QI summary has: >> >> Fraction of exons that overlap an EST alignment >> Fraction of exons that overlap EST or Protein alignments >> >> Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. >> >> Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? >> >> Thanks!!! >> Matt Simenc >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Wed Oct 11 22:19:04 2017 From: mcsimenc at gmail.com (Matt Simenc) Date: Wed, 11 Oct 2017 21:19:04 -0700 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> Message-ID: Very good, thank you! 
Matt

On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt wrote:

> Also look at GAL for building GFF3 feature queries -->
> https://github.com/The-Sequence-Ontology/GAL
>
> --Carson
>
>
>
>
> On Oct 11, 2017, at 9:18 AM, Michael Campbell <
> michael.s.campbell1 at gmail.com> wrote:
>
> Hi Matt,
>
> I have a hacky way that I've done it. It requires running MAKER two more
> times but they are quicker runs.
>
> To identify the genes that have protein support I pass all of the
> annotation back to MAKER using the model_gff option in the maker_opts.ctl
> file. Then I pull out all of the protein2genome features from the big MAKER
> GFF3 file and pass them in using the protein_gff option. I turn off all
> repeat masking and run MAKER. It runs fast because it doesn't have to run
> any gene finders, align evidence, or repeatmask. In the output any gene
> with an AED less than 1 has protein support. Then I do the same thing with
> est2genome lines from the big GFF3 file and put them in as est_gff. The
> output of that one gives you genes with EST support. Then the genes with an
> AED of less than one in both sets have support from protein and EST.
>
> Hope this helps,
> Mike
>
> On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote:
>
> Hey MAKER people,
>
> I would like to make a Venn diagram showing the kinds of evidence
> supporting gene models in my MAKER annotation where the left side shows
> number of genes with EST support only, the right side shows number of genes
> with protein support only, and the intersection shows number of genes with
> EST and protein support.
>
> QI summary has:
>
> Fraction of exons that overlap an EST alignment
> Fraction of exons that overlap EST or Protein alignments
>
> Please correct me if I'm wrong, because I am interpreting the first to be
> fraction of exons that overlap an EST alignment and possibly also a protein
> alignment.
If that is the case then we can't calculate the number of genes > that overlap only EST or (EST and protein) from the QI information. > > Anyone have a way to do this or have a script to parse the MAKER GFF3 to > get this? > > Thanks!!! > Matt Simenc > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 17:33:05 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:33:05 -0400 Subject: [maker-devel] GMOD hackathon before PAG San Diego in January Message-ID: Hi all, This January before PAG on the Wednesday and Thursday before PAG (January 10-11) in San Diego we are planning a GMOD hackathon. We expect that participants will be interested in solving problems/creating solutions related to Tripal, JBrowse, Apollo, and Galaxy but if you're interested in another GMOD project, by all means, let us know! We expect this hackathon to overlap with the Tripal hackathon that is on January 11 (I'm pretty sure; right Stephen?) If you are interested in attending this hackathon, please let me know so I can be sure we have an appropriately sized space. And if you're coming for the pre-PAG hackathon, consider staying for PAG, since there is always a lot of GMOD-related content at the meeting! Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
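
A minimal sketch of the counting step for the Venn diagram discussed in the "QI codes insufficient" thread, assuming the two extra MAKER runs Mike describes (one with only protein_gff evidence, one with only est_gff evidence) have already been done and their GFF3 output merged. The function names are hypothetical, and the sketch assumes that each mRNA feature carries an `_AED` attribute and a `Parent` attribute naming its gene, as in typical MAKER GFF3 output; it is not part of MAKER itself.

```python
def supported_genes(gff3_lines, max_aed=1.0):
    """Collect IDs of genes that have at least one mRNA with _AED < max_aed."""
    genes = set()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 is a ';'-separated list of key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "_AED" in attrs and "Parent" in attrs:
            if float(attrs["_AED"]) < max_aed:
                genes.add(attrs["Parent"])
    return genes


def venn_counts(protein_genes, est_genes):
    """Sizes of the three Venn regions: EST only, protein only, both."""
    return {
        "est_only": len(est_genes - protein_genes),
        "protein_only": len(protein_genes - est_genes),
        "both": len(est_genes & protein_genes),
    }
```

One possible use is to open the merged GFF3 from each run (for example with `open("protein_run.all.gff")` and `open("est_run.all.gff")`, both hypothetical file names), pass each file handle to `supported_genes`, and hand the two resulting sets to `venn_counts`. Genes are counted per gene rather than per transcript, so a gene is kept if any of its transcripts has an AED below 1.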
From daren.card at gmail.com Thu Oct 12 20:22:54 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Thu, 12 Oct 2017 21:22:54 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com>
Message-ID: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com>

Hi Carson,

Thanks for the help. The issue is still lingering. I've tried my full "ideal" run using both legacy BLAST 2.2.26 and BLAST 2.6 and get the same error, so it doesn't seem to be a BLAST issue, or if it is, it is one that won't be easy to overcome.

Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. It seems to run fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing, which leads me to believe it would finish okay.

I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I previously had fewer than 10 negative start coordinates among the repeat coordinates in the attributes field of the GFF, which I set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me.

What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don't see any obvious info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great.

I'm hoping I can resolve this, as maybe this is useful to others. It's weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue.
They were less contiguous, but can't imagine that really mattering.

Thanks,
Daren

> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote:
>
> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST).
>
> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/
>
> --Carson
>
>
>
>
>
>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote:
>>
>> Dear Carson,
>>
>> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.
>>
>> Unfortunately, I'm still getting the same error what appears to be at roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
>>
>> Thank you,
>> Daren Card
>>
>>
>> ################################################
>> doing repeat masking
>> re reading repeat masker report.
>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
>> doing blastx repeats
>> re reading blast report.
>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner
>> deleted:2 hits
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> collecting blastx repeatmasking
>> processing all repeats
>> in cluster::shadow_cluster...
>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>> --> rank=NA, hostname=moonunit0
>> ERROR: Failed while processing all repeats
>> ERROR: Chunk failed at level:3, tier_type:1
>> FAILED CONTIG:scaffold-1
>>
>> ERROR: Chunk failed at level:2, tier_type:0
>> FAILED CONTIG:scaffold-1
>>
>> examining contents of the fasta file and run log
>> ################################################
>>
>>
>>
>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote:
>>>
>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago.
>>>
>>> --Carson
>>>
>>>
>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem.
I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size?
>>>>
>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library).
>>>>
>>>> Any help would be greatly appreciated.
>>>> Daren Card
>>>>
>>>> University of Texas Arlington
>>>>
>>>> ###################################################
>>>> doing blastx repeats
>>>> running blast search.
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>> #-------------------------------# >>>> deleted:0 hits >>>> collecting blastx repeatmasking >>>> processing all repeats >>>> in cluster::shadow_cluster... >>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>> --> rank=3, hostname=moonunit0 >>>> ERROR: Failed while processing all repeats >>>> ERROR: Chunk failed at level:3, tier_type:1 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> doing blastx repeats >>>> running blast search. 
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>> #-------------------------------# >>>> ERROR: Chunk failed at level:2, tier_type:0 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> deleted:0 hits >>>> deleted:0 hits >>>> ################################################### >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > From robert.zimmermann at univie.ac.at Wed Oct 11 13:42:14 2017 From: robert.zimmermann at univie.ac.at (Bob Zimmermann) Date: Wed, 11 Oct 2017 21:42:14 +0200 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions Message-ID: Hello, I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. My gene prediction section of the maker_opts.ctl file looks like this: ... augustus_species=all_combined #Augustus gene prediction species model ... 
pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
...

It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take a long time.

Thanks for your help!
Bob

From xvazquezc at gmail.com Thu Oct 12 00:09:32 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 12 Oct 2017 17:09:32 +1100
Subject: [maker-devel] choosing the right gene model
Message-ID:

Hi there,

I was visualising the annotations and realised that in some cases what seems to be a single gene is split according to one of the gene models, even though the other two, est2genome and protein2genome, suggest that this isn't the case. The opposite also happens.

For some reason, the "out of place" model is always (or almost always) the one from GeneMark.

How much weight do the RNA-seq and protein data carry in this decision (if any)? How exactly is the final gene selected?

Cheers,
Xabi

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052
AUSTRALIA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: split-gene.png
Type: image/png
Size: 66389 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: merged-gene.png
Type: image/png
Size: 63815 bytes
Desc: not available
URL:

From jan.nagel at fabi.up.ac.za Thu Oct 12 01:37:07 2017
From: jan.nagel at fabi.up.ac.za (Jan FABI)
Date: Thu, 12 Oct 2017 09:37:07 +0200
Subject: [maker-devel] Maker problem
Message-ID:

Dear Maker team,

I am experiencing a problem while running maker and cannot find a solution to it online. I am running maker on a new genome, using BRAKER trained models for Augustus and GeneMark. This was successful and performed as expected, except for one contig where an error was encountered. This error occurs during Augustus and seems to have something to do with intron models. I have made sure that the input fasta contains no characters other than ATCGN and no "windows"/non-UNIX carriage returns. I include the relevant portion of the log below. Could you help me determine the cause of this error?

setting up GFF3 output and fasta chunks
preparing ab-inits
running augustus.
#--------- command -------------#
Widget::augustus:
/home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus
#-------------------------------#
Sampling error in intron model. state=37 base=26570
/home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list.
Sampling error in intron model. state=37 base=26570
/home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list.
ERROR: Augustus failed --> rank=NA, hostname=xxx-VirtualBox ERROR: Failed while preparing ab-inits ERROR: Chunk failed at level:0, tier_type:2 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 -- Regards Jan Nagel ---------------------------------------------------------------------- PhD Genetics student Department of Genetics Forestry and Agricultural Biotechnology Institute (FABI) FABI 1, Room 1-55 University of Pretoria 74 Lunnon Rd. Hillcrest 0002 Gauteng Province South Africa Email : jan.nagel at fabi.up.ac.za Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961 -- This message and attachments are subject to a disclaimer. Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 17:40:33 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:40:33 -0400 Subject: [maker-devel] Call for presentations at GMOD workshop at PAG Message-ID: Hi all, This January in San Diego is the annual Plant and Animal Genomes (PAG) meeting (http://www.intlpag.org). As in previous PAGs, there will be several opportunities to present content related to GMOD projects. If you are interested in attending PAG and giving a talk at the GMOD workshop on Wednesday, January 17, please let me know. Your talk can either be about new developments/functionality in existing GMOD software, about how your organization is using the suite of GMOD software to good effect, or about technologies that you think the GMOD community would be interested in hearing about. Please email me directly with a title, an abstract or a vague idea of what you'd like to talk about. 
Also, if you'd really like to come but are having a hard time coming up with travel funds, please let me know, I might be able to help you with that too (up to a limit of one person anyway).

Cheers,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From carsonhh at gmail.com Fri Oct 13 09:37:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 09:37:25 -0600
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com>
Message-ID:

So you have an input GFF3 file? Could you send it to me along with the problem contig. If you want you can upload the maker control files and evidence sets, and I can just recreate the run for the contig. Upload here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

--Carson

> On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote:
>
> Hi Carson,
>
> Thanks for the help. Issue is still lingering. I've tried my full "ideal" run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn't seem to be a BLAST issue. Or is one that won't be easy to overcome.
>
> Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay.
> > I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me. > > What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don?t see any obvious way info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great. > > Hoping I can resolve this as maybe this is useful to others. Weird that I?m getting this error, as I?ve annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but can?t imagine that really mattering. > > Thanks, > Daren > > >> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote: >> >> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). >> >> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here ?> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ >> >> ?Carson >> >> >> >> >> >>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: >>> >>> Dear Carson, >>> >>> Thanks so much for the quick reply. 
I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. >>> >>> Unfortunately, I?m still getting the same error what appears to be at roughly the same spot (~child 226). I?ve copied the stderr below. I checked my GFF file and I don?t see any issues with coordinates. I?m going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. >>> >>> Thank you, >>> Daren Card >>> >>> >>> ################################################ >>> doing repeat masking >>> re reading repeat masker report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >>> doing blastx repeats >>> re reading blast report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >>> deleted:2 hits >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. 
>>> --> rank=NA, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> examining contents of the fasta file and run log >>> ################################################ >>> >>> >>> >>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>>> >>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>>> >>>> ?Carson >>>> >>>> >>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>>> >>>>> Hi all, >>>>> >>>>> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>>> >>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. 
MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>>> >>>>> Any help would be greatly appreciated. >>>>> Daren Card >>>>> >>>>> University of Texas Arlington >>>>> >>>>> ################################################### >>>>> doing blastx repeats >>>>> running blast search. >>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>>> #-------------------------------# >>>>> deleted:0 hits >>>>> collecting blastx repeatmasking >>>>> processing all repeats >>>>> in cluster::shadow_cluster... >>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>>> --> rank=3, hostname=moonunit0 >>>>> ERROR: Failed while processing all repeats >>>>> ERROR: Chunk failed at level:3, tier_type:1 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> doing blastx repeats >>>>> running blast search. 
>>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>>> #-------------------------------# >>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> deleted:0 hits >>>>> deleted:0 hits >>>>> ################################################### >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 09:42:41 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:42:41 -0600 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions In-Reply-To: References: Message-ID: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> Hi Bob, pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. 
If it is different in some way, then you can leave it in. If you run under MPI (it's OK to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time.

Thanks,
Carson

> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote:
>
> Hello,
>
> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments.
>
> My gene prediction section of the maker_opts.ctl file looks like this:
> ...
> augustus_species=all_combined #Augustus gene prediction species model
> ...
> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
> ...
>
> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take a long time.
>
> Thanks for your help!
>
> Bob

From carsonhh at gmail.com Fri Oct 13 09:50:26 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 09:50:26 -0600
Subject: [maker-devel] Maker problem
In-Reply-To:
References:
Message-ID:

If you look in the folder of the failed contig under the .../theVoid directory there will be a file called query.masked.fasta. Copy that file somewhere.
Then, because MAKER gave you the command that failed, you can run it by itself outside of MAKER.

Example --> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off query.masked.fasta

If it still fails, you now have a test file and command you can send to Mario Stanke (mario.stanke at uni-greifswald.de). He made Augustus. It may be a bug he has already fixed (current Augustus version is 3.3) or there may be something in the species file causing the error that he can point out.

--Carson

> On Oct 12, 2017, at 1:37 AM, Jan FABI wrote:
>
> Dear Maker team
>
> I am experiencing a problem while running maker and cannot find a solution to it online.
>
> I am running maker on a new genome, using BRAKER trained models for Augustus and GeneMark. This was successful and performed as expected, except for one contig where an error was encountered.
>
> This error occurs during Augustus and seems to have something to do with intron models. I have made sure that the input fasta does not contain characters other than ATCGN or contains "windows"/non-UNIX carriage returns.
>
> I include the relevant portion of the log below. Could you help me determine the cause of this error.
>
>
>
> setting up GFF3 output and fasta chunks
> preparing ab-inits
> running augustus.
> #--------- command -------------#
> Widget::augustus:
> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus
> #-------------------------------#
> Sampling error in intron model. state=37 base=26570
>
> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR
> Tried to sample from empty list.
>
> Sampling error in intron model. state=37 base=26570
>
> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR
> Tried to sample from empty list.
>
> ERROR: Augustus failed
> --> rank=NA, hostname=xxx-VirtualBox
> ERROR: Failed while preparing ab-inits
> ERROR: Chunk failed at level:0, tier_type:2
> FAILED CONTIG:NODE_1040_length_26483_cov_27.125137
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:NODE_1040_length_26483_cov_27.125137
>
> --
> Regards
> Jan Nagel
> ----------------------------------------------------------------------
> PhD Genetics student
> Department of Genetics
> Forestry and Agricultural Biotechnology Institute (FABI)
> FABI 1, Room 1-55
> University of Pretoria
> 74 Lunnon Rd. Hillcrest
> 0002
> Gauteng Province
> South Africa
>
> Email : jan.nagel at fabi.up.ac.za
>
> Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961
> This message and attachments are subject to a disclaimer.
> Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details.

From carsonhh at gmail.com Fri Oct 13 09:56:43 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 09:56:43 -0600
Subject: [maker-devel] choosing the right gene model
In-Reply-To:
References:
Message-ID: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com>

Both transcript and protein evidence will go into the AED calculation for overlap support. So in both cases the chosen model had better overlap (protein evidence will not count toward the eAED overlap calculation if it is out of frame with the model it is supposed to be supporting). The larger merged model generates a clustering effect on its evidence, so its evidence set for AED calculation is slightly different from what the SNAP and Augustus models would generate. In both cases, I think GeneMark is hurting more than it is helping.
You may want to just drop it from the analysis (unless it's a fungus; I often find GeneMark can have that effect).

--Carson

> On Oct 12, 2017, at 12:09 AM, Xabier Vázquez-Campos wrote:
>
> Hi there,
>
> I was visualising the annotations and I realised that in some cases, what seems to be a gene is split according to one of the gene models, even though the other 2, est2genome and prot2genome, suggest that it isn't the case.
>
>
>
> Although the opposite also happens.
>
>
>
> For some reason, the "out of place" model is always (or almost) the one from Genemark.
>
> How much weight do the RNAseq and protein data carry in this decision (if any)?
> How exactly is the final gene selected?
>
> Cheers,
> Xabi
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Fri Oct 13 10:56:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 10:56:30 -0600
Subject: [maker-devel] jbrowse not working
In-Reply-To:
References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com>
Message-ID: <2D6E11BC-6853-458D-AEB1-12EF74D041A3@gmail.com>

The master_datastore_index.log file has a list of failed and finished contigs. You can grep the file contents for FAILED or DIED to see if any contigs are not finished. Finished contigs will be listed as FINISHED in the file.

Also note that if you have errors with the jbrowse build, you have to start over (i.e. wipe out the old build). Rerunning the command over a failed build will try to insert again, which can generate its own errors.
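
The check described above (looking for FAILED or DIED entries in master_datastore_index.log) can also be scripted. The following is only a minimal Python sketch and is not part of MAKER. It assumes the index log is tab-separated, with the contig name in the first column and the status (for example STARTED, FINISHED, FAILED, or a DIED_* value) in the last column; the function name `unfinished_contigs` is hypothetical.

```python
def unfinished_contigs(log_lines):
    """Return {contig: last_status} for contigs whose last status is not FINISHED.

    Assumption: each line is tab-separated, contig name first, status last.
    """
    last_status = {}
    for line in log_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Later entries for the same contig overwrite earlier ones,
        # so a contig that was retried and then finished is not reported.
        last_status[fields[0]] = fields[-1]
    return {name: status
            for name, status in last_status.items()
            if status != "FINISHED"}
```

To use it, open the index log and pass the file handle (or a list of its lines) to the function; any contig it returns would be a candidate for rerunning before merging GFF3 files.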
If gff3_merge was run without the -n option then you need to see if one of the GFF3 files being used is truncated (possibly due to an IO error - not uncommon on NFS storage). You will need to see if you can identify which contig file is truncated and rerun it.

--Carson

> On Oct 9, 2017, at 10:42 PM, Emmanuel Nnadi wrote:
>
> Hi Carson
> Thanks for the reply
>
> I generated the gff with this command gff3_merge -d dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> I had to rerun jbrowse with the following command
>
> maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb
>
> Although it's showing
>
> WARNING: No matching features found for mRNA
>
> I don't understand what it means
>
>
> I was able to set up the jbrowse local host successfully. I had to move the jbrowse folder to my local host
>
>
> The jbrowse is up and running; however, I have about 18488 contigs and only 31 contigs are showing. How can I make all my contigs show in jbrowse?
>
>
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
> On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt wrote:
> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it's possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you run it.
> > Thanks, > Carson > > > > >> On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi > wrote: >> >> Please, >> I ran the command line >> >> maker2jbrowse muc1_genome_snap2.all.gff >> >> The command created some folders. However, at the end it read >> No reference sequences defined in configuration, nothing to do. >> >> Please what does it mean? How can I view it in jbrowse. >> >> Thanks >> >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Fri Oct 13 14:26:40 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Sat, 14 Oct 2017 07:26:40 +1100 Subject: [maker-devel] choosing the right gene model In-Reply-To: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> References: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> Message-ID: Actually, it's a fungal genome. Although not very typical, almost half of it are repeats. Worth mention that Genemark generates a lot of predictions that overlap LTRs and other complex repeats, something that neither SNAP or Augustus do. Have you seen this before? On 14 Oct. 2017 02:56, "Carson Holt" wrote: > Both transcript and protein evidence will go into the AED calculation for > overlap support. So in both cases the chosen model had better overlap > (protein evidence will not count toward the eAED overlap calculation if it > is out of frame with the model it is supposed to be supporting). The larger > merged model generates a clutering affect on it?s evidence, so it?s > evidence set for AED calculation is slightly different than the SNAP and > Augustus model would generate. In both cases, I think GeneMark is hurting > more than it is helping. 
You may want to just drop it from the analysis > (unless it?s a fungi, I often find GeneMark can have that affect). > > ?Carson > > > On Oct 12, 2017, at 12:09 AM, Xabier V?zquez-Campos > wrote: > > Hi there, > > I was visualising the annotations and I realised that in some cases, what > it seems to be a gene is splitted according to one of the gene models, > despite that the other 2, est2genome and prot2genome suggest that it isn't > the case. > > > > Although the opposite also happens. > > > ? > For some reason, the "out of place" model is always (or almost) the one > from Genemark. > > How much weight does carry the RNAseq and protein data on this decision > (if any)? > How exactly is the final gene selected? > > Cheers, > Xabi > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From z2.stewart at qut.edu.au Sat Oct 14 23:02:08 2017 From: z2.stewart at qut.edu.au (ZACHARY STEWART) Date: Sun, 15 Oct 2017 05:02:08 +0000 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance Message-ID: Hello MAKER team, I am hoping I could have a bit of your time if that isn't a problem. I am currently performing the advanced repeat library construction as described on the MAKER wiki, and everything appears to work as expected until I reach "2.1.5 Building examplars". At this point I encounter a problem previously documented in the Google group (title: advanced repeat masking library constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta" and "lLTRs_Seq_For_BLAST.fasta" are empty. 
I was hoping you could clarify what you meant by simplifying the sequence names. The genomic contig names are in a format such as ">001676F" and I modified the MITE library to have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]" which I have not tried changing since I suspect the name is important for later reassociation. If you could point me in the right direction that would be very appreciated. Regards, Zac. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Sun Oct 15 15:32:10 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Sun, 15 Oct 2017 22:32:10 +0100 Subject: [maker-devel] Backlash running through my sequence Message-ID: Hi all, I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them I attached the sequence Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sample_1.fasta Type: application/octet-stream Size: 3884914 bytes Desc: not available URL: From xvazquezc at gmail.com Mon Oct 16 01:26:56 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Mon, 16 Oct 2017 18:26:56 +1100 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance In-Reply-To: References: Message-ID: Hi Zac, The contig names you indicate shouldn't give any problems. And if you changed the names of MITE.lib right after creation and before using it downstream, it shouldn't be an issue. Have you confirmed if the prior blastx output has any results? 
Also, be sure you use the same version of makeblastdb and blastx/blastn. I remember reading, before running the protocol for the first time, that in some cases switching versions could give problems.

And be careful if you copy/paste from the wiki page: there are a few typos, and dashes instead of minus characters in the command-line option flags, all of which will result in errors.

Xabi

On 15 October 2017 at 16:02, ZACHARY STEWART wrote:

> Hello MAKER team,
>
>
> I am hoping I could have a bit of your time if that isn't a problem. I am
> currently performing the advanced repeat library construction as described
> on the MAKER wiki, and everything appears to work as expected until I reach
> "2.1.5 Building examplars". At this point I encounter a problem previously
> documented in the Google group (title: advanced repeat masking library
> constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta"
> and "lLTRs_Seq_For_BLAST.fasta" are empty. I was hoping you could clarify
> what you meant by simplifying the sequence names. The genomic contig names
> are in a format such as ">001676F" and I modified the MITE library to
> have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta
> has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]"
> which I have not tried changing since I suspect the name is important for
> later reassociation. If you could point me in the right direction that
> would be very appreciated.
>
>
> Regards,
>
> Zac.
>

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
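
The copy/paste problem mentioned in the reply above (typographic dashes in place of ASCII minus signs in option flags) is easy to miss by eye. As a rough illustration only, here is a small Python sketch that lists every non-ASCII character in a command string before it is run. The function name `non_ascii_chars` and the example command are hypothetical and do not come from the wiki protocol.

```python
import unicodedata


def non_ascii_chars(command):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(command)
        if ord(char) > 127
    ]


# Hypothetical pasted command in which the option dash is an en dash (U+2013).
for position, char, name in non_ascii_chars("blastx \u2013query seqs.fasta"):
    print(position, repr(char), name)
```

An empty result means the command contains only ASCII characters; anything reported (an en dash, a curly quote, a non-breaking space) is worth retyping by hand.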
From yuejiaxing at gmail.com Mon Oct 16 02:54:42 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Mon, 16 Oct 2017 10:54:42 +0200
Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13
In-Reply-To:
References:
Message-ID:

Dear maker developers,

I am trying to install maker-3.01.1-beta but encountered a warning message about an uninitialized value (see the warning message below), although it still finished the installation.

[jxyue at paralog src]$ ./Build install
Building MAKER
Use of uninitialized value $line in chomp at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3082.
Use of uninitialized value $line in substitution (s///) at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3083.
Installing MAKER...
Building MAKER
...

Also, when I ran this installation for the actual work, it reported errors about not finding my specified snaphmm model for the annotation, even though I have specified "snaphmm=$LRSDAY_HOME/data/S288C.gene.hmm" in the "maker_opts.ctl" file and this configuration information has been successfully recognized by maker.

running snap.
#--------- command -------------#
Widget::snap:
/home/jxyue/Projects/LRSDAY/build/SNAP/snap /home/jxyue/Projects/LRSDAY/data/S288C.gene.hmm /tmp/maker_m8TVEQ/chrI.abinit_masked.0 > /tmp/maker_m8TVEQ/chrI.abinit_masked.0.S288C%2Egene%2Ehmm.snap
#-------------------------------#

# (my comment: up to now everything looks fine)
....

running snap.
#--------- command -------------#
Widget::snap:
/home/jxyue/Projects/LRSDAY/build/SNAP/snap -plus -xdef /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.xdef.snap S288C.gene.hmm /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap.fasta > /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap
#-------------------------------#
ZOE ERROR (from /home/jxyue/Projects/LRSDAY/build/SNAP/snap): error opening file (/home/jxyue/Projects/LRSDAY/build/SNAP/Zoe/HMM/S288C.gene.hmm)
ZOE library version 2017-03-01
ERROR: Snap failed
--> rank=NA, hostname=paralog.itc.unipi.it
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:chrI

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:chrI

examining contents of the fasta file and run log

# (my comment: here the error occurred. As you can see, snap somehow forgot about the path to my specified hmm file and instead looks for this file in its default installation location)

It is worth noting that the parallel installation and run with maker-3.00.0-beta finish smoothly without any problem. So I suspect both the installation warning and the execution error are caused by changes made during the version update from 3.00.0-beta to 3.01.1-beta. Could you look into this issue? Thanks in advance!

Finally, is it possible to also provide access to older versions of maker (e.g. 3.00.0-beta in this particular case) when the user finishes the registration on the maker download page? This will help users roll back to an older version when needed. It also helps with version control when other developers build annotation pipelines that use maker as a dependency package. Thanks for the consideration!

Best,
Jia-Xing

--
Jia-Xing Yue
Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université
Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France
Personal website: http://www.iamphioxus.org/

From carsonhh at gmail.com Mon Oct 16 10:20:32 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Oct 2017 10:20:32 -0600
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To:
References:
Message-ID: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>

I would not just remove them. The fact they are there calls into question how they got there in the first place. If you generated this file yourself, you may want to instead use fasta_tool.

--Carson

> On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:
>
> Hi all,
> I am trying to run annotation on some of my sequences but noticed that I have backslashes that run through the sequence. Please, how do I remove them?
> I attached the sequence
>
> Thanks
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From cjfields at illinois.edu Tue Oct 17 13:11:39 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 17 Oct 2017 19:11:39 +0000
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>
Message-ID: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>

I agree with Carson, though my guess is that any FASTA converter will either fail on these characters as non-IUPAC, or will silently remove them.
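
The kind of check discussed in this thread (finding stray characters such as backslashes in a FASTA file before deciding how to handle them) can be sketched in a few lines of Python. This is only an illustration, not fasta_tool or any MAKER script. It assumes nucleotide sequence, treats the IUPAC nucleotide letters (upper or lower case) as the only valid sequence characters, and flags only backslashes in header lines; the function name `find_suspect_lines` is hypothetical.

```python
# IUPAC nucleotide codes, upper and lower case (assumption: DNA/RNA input).
IUPAC_NT = set("ACGTUNRYSWKMBDHV" + "acgtunryswkmbdhv")


def find_suspect_lines(fasta_lines):
    """Return (line_number, kind, offending_characters) for suspicious lines."""
    problems = []
    for line_number, line in enumerate(fasta_lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith(">"):
            # Headers may legitimately contain many characters; only flag backslashes.
            if "\\" in text:
                problems.append((line_number, "header", "\\"))
        else:
            unexpected = sorted(set(text) - IUPAC_NT)
            if unexpected:
                problems.append((line_number, "sequence", "".join(unexpected)))
    return problems
```

Passing an open file handle to the function reports where the unexpected characters occur, which may help trace how they got into the file rather than just stripping them.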
Running them through a converter may not solve all the issues though, as the backslash also appears at the end of the line in the FASTA headers:

cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\'
>contig_134\
>contig_149\
>contig_158\
>contig_222\
>contig_316\
>contig_582\
>contig_634\
>contig_700\
>contig_741\
...

I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)?

chris

From: maker-devel on behalf of Carson Holt
Date: Monday, October 16, 2017 at 11:22 AM
To: Emmanuel Nnadi
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Backlash running through my sequence

I would not just remove them. The fact that they are there raises the question of how they got there in the first place. If you generated this file yourself, you may want to use fasta_tool instead.

--Carson

On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:

Hi all,
I am trying to run annotation on some of my sequences but noticed that I have backslashes running through the sequence. Please, how do I remove them?
I attached the sequence.

Thanks


Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From carsonhh at gmail.com Tue Oct 17 13:33:26 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Oct 2017 13:33:26 -0600
Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13
In-Reply-To:
References:
Message-ID: <30F2FDFE-3B4E-4951-89D8-63C2FC772B63@gmail.com>

Thanks. The map_fasta_ids script was empty in the bin directory for some reason, so the installer threw an error because it could not find the #!/usr/bin/perl line. I have put it back in the bin directory where it belongs, and the install issue goes away.
For the second issue, I think I found it and have uploaded a new tarball to the website. Also, here is a link to download the old 3.00-beta, although I would not recommend making it part of a pipeline because version 3 is still beta and still has bugs (you should use 2.31.9 instead for pipelines).

--> http://topaz.genetics.utah.edu/maker_downloads/static/maker-3.00.0-beta.tgz

--Carson

> On Oct 16, 2017, at 2:54 AM, Jia-Xing Yue wrote:
>
> Dear maker developers,
>
> I am trying to install maker-3.01.1-beta but encountered a warning about an uninitialized value (see the warning message below), although the installation still finished.
>
> [jxyue at paralog src]$ ./Build install
> Building MAKER
> Use of uninitialized value $line in chomp at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3082.
> Use of uninitialized value $line in substitution (s///) at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3083.
> Installing MAKER...
> Building MAKER
> ...
>
> Also, when I ran this installation for the actual work, it reported that it could not find my specified snaphmm model for the annotation, even though I have specified "snaphmm=$LRSDAY_HOME/data/S288C.gene.hmm" in the "maker_opts.ctl" file and this configuration has been successfully recognized by maker.
>
> running snap.
> #--------- command -------------#
> Widget::snap:
> /home/jxyue/Projects/LRSDAY/build/SNAP/snap /home/jxyue/Projects/LRSDAY/data/S288C.gene.hmm /tmp/maker_m8TVEQ/chrI.abinit_masked.0 > /tmp/maker_m8TVEQ/chrI.abinit_masked.0.S288C%2Egene%2Ehmm.snap
> #-------------------------------#
>
> # (my comment: up to now everything looks fine)
> ....
>
> running snap.
> #--------- command -------------#
> Widget::snap:
> /home/jxyue/Projects/LRSDAY/build/SNAP/snap -plus -xdef /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.xdef.snap S288C.gene.hmm /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap.fasta > /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap
> #-------------------------------#
> ZOE ERROR (from /home/jxyue/Projects/LRSDAY/build/SNAP/snap): error opening file (/home/jxyue/Projects/LRSDAY/build/SNAP/Zoe/HMM/S288C.gene.hmm)
> ZOE library version 2017-03-01
> ERROR: Snap failed
> --> rank=NA, hostname=paralog.itc.unipi.it
> ERROR: Failed while annotating transcripts
> ERROR: Chunk failed at level:1, tier_type:4
> FAILED CONTIG:chrI
>
> ERROR: Chunk failed at level:6, tier_type:0
> FAILED CONTIG:chrI
>
> examining contents of the fasta file and run log
>
> # (my comment: the error occurred here. As you can see, snap somehow lost the path to my specified HMM file and instead looks for the file in its default installation location.)
>
> It is worth noting that a parallel installation and run with maker-3.00.0-beta finishes smoothly without any problem, so I suspect that both the installation warning and the execution error are caused by changes made between 3.00.0-beta and 3.01.1-beta. Could you look into this issue? Thanks in advance!
>
> Finally, would it be possible to also provide access to older versions of maker (e.g. 3.00.0-beta in this particular case) once the user finishes the registration on the maker download page? This would help users roll back to an older version when needed. It would also help with version control when other developers build annotation pipelines that use maker as a dependency. Thanks for the consideration!
>
>
> Best,
> Jia-Xing
>
> --
> Jia-Xing Yue
>
> Population Genomics and Complex Traits Group
> Tour Pasteur 8eme etage
> Faculté
de Médecine
> Institute for Research on Cancer and Aging, Nice (IRCAN)
> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
> 28 Avenue de Valombrose
> 06107 NICE Cedex 2
> France
>
> Personal website: http://www.iamphioxus.org/
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From patrick.tranvan at unil.ch Wed Oct 18 05:47:35 2017
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Wed, 18 Oct 2017 11:47:35 +0000
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>, <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
Message-ID: <1508327278733.19140@unil.ch>

Hi Carson,

1) I think I have read one of your posts saying that running maker with MPI is faster than running multiple instances. Can you explain why?

2) I am trying to annotate a 1 GB species but it is extremely slow.
I have filtered the transcriptome to speed up the process, but do you have any other suggestions to increase the speed?

Cheers,

Patrick Tran Van

From carsonhh at gmail.com Wed Oct 18 09:09:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Oct 2017 09:09:10 -0600
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <1508327278733.19140@unil.ch>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu> <1508327278733.19140@unil.ch>
Message-ID: <486FE3D5-0902-4B05-A3E1-96642C68E422@gmail.com>

MAKER can coordinate parallelization under MPI in a way it can't even with multiple simultaneous runs.
Because processes can communicate among themselves under MPI, MAKER can break larger contigs into chunks or even split off individual steps and pass them on to another processor, then receive the results back from that processor. So multiple BLAST, RepeatMasker, Exonerate, and prediction processes can all run at the same time for the same contig. Then they all pass their results back to the parent process so it can produce output for that contig. MPI was chosen as the parallelization framework rather than threads because it works both within a single machine and across multiple machines, so you can scale up to hundreds of processes if needed.

--Carson

> On Oct 18, 2017, at 5:47 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> 1) I think I have read one of your posts saying that running maker with MPI is faster than running multiple instances. Can you explain why?
>
> 2) I am trying to annotate a 1 GB species but it is extremely slow.
> I have filtered the transcriptome to speed up the process, but do you have any other suggestions to increase the speed?
>
> Cheers,
>
> Patrick Tran Van
>
>

From jhumann at wsu.edu Wed Oct 18 14:38:41 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Wed, 18 Oct 2017 20:38:41 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
Message-ID:

Hello,

I was wondering if there is any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:

ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta'
appears to contain protein sequence or unrecognized characters. Note
the following nucleotides may be valid but are unsupported [RYKMSWBDHV]
Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically.
Invalid Character: 'K'
--> rank=NA, hostname=compute2

The error message says the option can be used on the command line. Is it set on the actual command used to run MAKER (when using the command-line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but I would prefer to just enable the option if I can.

Thanks,
Jodi

Jodi Humann, Ph.D.
Main Bioinformatics Lab Project Coordinator
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
509-335-3206
jhumann at wsu.edu

From robert.zimmermann at univie.ac.at Thu Oct 19 09:25:08 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:25:08 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
Message-ID:

Hi Maker Developers,

I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA-seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It would make sense for the single-set transcripts to be more numerous and shorter if the additional evidence were joining transcripts, but the lengths indicate that this is not the case.

Therefore I'm trying to understand the algorithm. From what I understand, if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then the prediction is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?

Thanks for your help!

Best,
Bob

--
Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

From robert.zimmermann at univie.ac.at Thu Oct 19 09:28:17 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:28:17 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID:

Correction to the above numbers: the median lengths are 1414 and 1256.

> On 19 Oct 2017, at 17:25, Bob Zimmermann wrote:
>
> Hi Maker Developers,
>
> I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA-seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It would make sense for the single-set transcripts to be more numerous and shorter if the additional evidence were joining transcripts, but the lengths indicate that this is not the case.
>
> Therefore I'm trying to understand the algorithm. From what I understand, if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then the prediction is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?
>
> Thanks for your help!
>
> Best,
> Bob
>
> --
>
> Department of Molecular Evolution and Development
> Universität Wien
> Althanstraße 14 (UZA I), Zimmer 2.019
> 1090 Vienna
> Austria
>
> +43 1 427757002
>

From carsonhh at gmail.com Thu Oct 19 09:44:07 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 09:44:07 -0600
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID: <62F04A76-F3F1-4044-B4AD-129B15A9EEB2@gmail.com>

You should look at both in a browser to get a better idea of what's going on. What MAKER does is take the evidence given, cluster it (strand-specific clustering), then use the transcript evidence as intron hints to the predictors and the protein alignments as exon hints (it will also use polished protein hints to generate intron hints in the absence of transcript intron hints). Finally, it uses overlapping transcript evidence to generate UTRs.

So look at it in a browser. See if the apparent overlap clusters are different in extent, and also look for mRNA-seq evidence being merged. If the cluster is falsely merging two loci because the mRNA-seq is merged, one of two things will happen: either you will get multiple models, since the predictor can't make a single model work within the cluster using the hints, or you will get a model with a really long UTR that blocks other models from existing in the region. Also, depending on the mRNA-seq evidence coming in, you may be generating false models because of noise in the data. Essentially everything is transcribed at a basal level, so as you get more and more mRNA-seq, you generate more and more spurious alignments. So more evidence might generate fewer, longer alignments, either for true loci or by falsely merging genes, while simultaneously adding a number of very short spurious results.

--Carson

> On Oct 19, 2017, at 9:28 AM, Bob Zimmermann wrote:
>
> Correction to the above numbers: the median lengths are 1414 and 1256.
>
>> On 19 Oct 2017, at 17:25, Bob Zimmermann wrote:
>>
>> Hi Maker Developers,
>>
>> I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA-seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It would make sense for the single-set transcripts to be more numerous and shorter if the additional evidence were joining transcripts, but the lengths indicate that this is not the case.
>>
>> Therefore I'm trying to understand the algorithm. From what I understand, if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then the prediction is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?
>>
>> Thanks for your help!
>>
>> Best,
>> Bob
>>
>> --
>>
>> Department of Molecular Evolution and Development
>> Universität Wien
>> Althanstraße 14 (UZA I), Zimmer 2.019
>> 1090 Vienna
>> Austria
>>
>> +43 1 427757002
>>

From carsonhh at gmail.com Thu Oct 19 11:32:44 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 11:32:44 -0600
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Hi Jodi,

I didn't know anyone else even had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases).
But you should be able to add the -fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).

Somewhere inside the script there will be a line like this -->
$command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";

You can add -fix_nucleotides to that command so that it is always set. -fix_nucleotides is a command-line flag. The error is basically a warning to let the user know that something is odd (i.e. it is possible they mixed up transcript/protein sequence files). The flag then allows the user to tell MAKER that they did not mix the files up, that the data is supposed to look that way, and that they are OK with MAKER altering the sequence by replacing the offending letters or dashes with N's.

Thanks,
Carson

> On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:
>
> Hello,
>
> I was wondering if there is any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:
>
> ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2
>
> The error message says the option can be used on the command line. Is it set on the actual command used to run MAKER (when using the command-line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but I would prefer to just enable the option if I can.
>
> Thanks,
> Jodi
>
> Jodi Humann, Ph.D.
> Main Bioinformatics Lab Project Coordinator
> Department of Horticulture
> Washington State University
> PO Box 646414
> Pullman, WA 99164-6414
> 509-335-3206
> jhumann at wsu.edu
>

From carsonhh at gmail.com Thu Oct 19 12:46:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 12:46:17 -0600
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID: <052F801C-3B37-4B0F-B40A-A905F5F2B1CE@gmail.com>

Yes. That is the current version.

--Carson

> On Oct 19, 2017, at 12:45 PM, Humann, Jodi Lynn wrote:
>
> Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current MAKER version, right?
>
> Jodi
>
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Thursday, October 19, 2017 10:33 AM
> To: Humann, Jodi Lynn
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] fix nucleotides option on MWAS
>
> Hi Jodi,
>
> I didn't know anyone else even had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases). But you should be able to add the -fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).
>
> Somewhere inside the script there will be a line like this -->
> $command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";
>
> You can add -fix_nucleotides to that command so that it is always set. -fix_nucleotides is a command-line flag. The error is basically a warning to let the user know that something is odd (i.e. it is possible they mixed up transcript/protein sequence files).
The flag then allows the user to tell MAKER that they did not mix the files up, that the data is supposed to look that way, and that they are OK with MAKER altering the sequence by replacing the offending letters or dashes with N's.
>
> Thanks,
> Carson
>
>
> On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:
>
> Hello,
>
> I was wondering if there is any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:
>
> ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2
>
> The error message says the option can be used on the command line. Is it set on the actual command used to run MAKER (when using the command-line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but I would prefer to just enable the option if I can.
>
> Thanks,
> Jodi
>
> Jodi Humann, Ph.D.
> Main Bioinformatics Lab Project Coordinator
> Department of Horticulture
> Washington State University
> PO Box 646414
> Pullman, WA 99164-6414
> 509-335-3206
> jhumann at wsu.edu
>
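As a rough illustration of the behaviour described in this thread for -fix_nucleotides (replacing unsupported letters or dashes with N), here is a small Python sketch. It is not MAKER's code. The function name `mask_unsupported` is hypothetical, and treating everything outside A, C, G, T and N as unsupported, as well as preserving lower case for soft-masked sequence, are assumptions made for the example.

```python
import re

# Assumption: only A, C, G, T and N (either case) are treated as supported.
_UNSUPPORTED = re.compile(r"[^ACGTNacgtn]")


def mask_unsupported(seq):
    """Replace every unsupported character in seq with N (or n if it was lower case)."""
    def replacement(match):
        return "n" if match.group().islower() else "N"
    return _UNSUPPORTED.sub(replacement, seq)


print(mask_unsupported("ACGTKACG-RY"))
print(mask_unsupported("acgk"))
```

Applied to the sequence lines of a copy of the input FASTA (not the header lines), this would be one way to fix the file by hand when enabling the flag on the server is not an option.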
From jhumann at wsu.edu Thu Oct 19 12:45:43 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Thu, 19 Oct 2017 18:45:43 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current MAKER version, right?

Jodi

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Thursday, October 19, 2017 10:33 AM
To: Humann, Jodi Lynn
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] fix nucleotides option on MWAS

Hi Jodi,

I didn't know anyone else even had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases). But you should be able to add the -fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).

Somewhere inside the script there will be a line like this -->
$command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";

You can add -fix_nucleotides to that command so that it is always set. -fix_nucleotides is a command-line flag. The error is basically a warning to let the user know that something is odd (i.e. it is possible they mixed up transcript/protein sequence files). The flag then allows the user to tell MAKER that they did not mix the files up, that the data is supposed to look that way, and that they are OK with MAKER altering the sequence by replacing the offending letters or dashes with N's.

Thanks,
Carson

On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:

Hello,

I was wondering if there is any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:

ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters.
Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2

The error message says the option can be used on the command line. Is it set on the actual command used to run MAKER (when using the command-line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but I would prefer to just enable the option if I can.

Thanks,
Jodi

Jodi Humann, Ph.D.
Main Bioinformatics Lab Project Coordinator
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
509-335-3206
jhumann at wsu.edu

From eennadi at gmail.com Mon Oct 23 07:30:07 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Mon, 23 Oct 2017 14:30:07 +0100
Subject: [maker-devel] Contamination report from NCBI
Message-ID:

Hello,
Good day. I submitted my sequence to NCBI and they sent back this contamination report. Please, how do I use MAKER to make the correction?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
SUBID           BioProject      BioSample       Organism
--------------------------------------------------------
SUB3124577      PRJNA414658     SAMN07821433    Mucuna pruriens

[] We ran your sequences through our Contamination Screen.
The screen found contigs that need to be trimmed and/or excluded. Please adjust the sequences appropriately and then resubmit your sequences. After you remove the contamination, trim any Ns at the ends of the sequence and remove any sequences that are shorter than 200 nt and not part of a multi-component scaffold.

Note that hits in eukaryotic genomes to mitochondrial sequences can be ignored when specific criteria are met. Those criteria are explained below.

Note that mismatches between the name of the adaptor/primer identified in the screen and the sequencing technology used to generate the sequencing data should not be used to discount the validity of the screen results as the adaptors/primers of many different sequencing platforms share sequence similarity.

[] Some of the sequences hit primers or adaptors used in Illumina or 454 or other sequencing strategies or platforms. Primers at the end of a sequence should be removed. However, if primers are present within sequences then you should strongly consider splitting the sequences at the primers because the primer sequence could have been the region of overlap, causing a misassembly.

Screened 26,016 sequences, 396,641,426 bp.
Note: 5,610 sequences with runs of Ns 10 bp or longer (or those longer than 20 MB) were split before screening.
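The trimming steps described in the report can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not an NCBI or MAKER tool: the function names `parse_trim_row` and `trim_terminal_spans` are hypothetical, rows are assumed to have the four whitespace-separated fields of the report's Trim table (sequence name, length, span(s), apparent source), multiple spans are assumed to be comma-separated, and spans are assumed to be 1-based, inclusive and non-overlapping. Spans that do not touch either end of the sequence are only reported, since the report suggests considering a split at those positions rather than a trim.

```python
MIN_LENGTH = 200  # from the report: drop sequences shorter than 200 nt
                  # (unless they are part of a multi-component scaffold)


def parse_trim_row(row):
    """Parse one Trim row: name, length, span(s), apparent source."""
    name, length, spans, source = row.split()
    # Assumption: several spans on one row are separated by commas.
    parsed = [tuple(int(x) for x in span.split("..")) for span in spans.split(",")]
    return name, int(length), parsed, source


def trim_terminal_spans(seq, spans):
    """Remove spans touching either end of seq, then strip terminal Ns.

    Returns (trimmed_sequence, internal_spans); internal spans are left
    untouched so they can be reviewed (and possibly split) by hand.
    """
    start, end = 1, len(seq)
    remaining = sorted(spans)
    # Spans that reach the current start of the sequence.
    while remaining and remaining[0][0] <= start:
        start = max(start, remaining.pop(0)[1] + 1)
    # Spans that reach the current end of the sequence.
    while remaining and remaining[-1][1] >= end:
        end = min(end, remaining.pop()[0] - 1)
    trimmed = seq[start - 1:end].strip("Nn")
    return trimmed, remaining


name, length, spans, source = parse_trim_row(
    "contig_10109 13138 13078..13138 adaptor:NGB00847.1"
)
print(name, length, spans, source)

# Toy sequence: 5 bases of adaptor, 10 bases to keep, 2 Ns, 5 bases of adaptor.
toy = "A" * 5 + "C" * 10 + "N" * 2 + "G" * 5
trimmed, internal = trim_terminal_spans(toy, [(1, 5), (18, 22)])
print(trimmed, internal, len(trimmed) >= MIN_LENGTH)
```

The length check is left to the caller because whether a short sequence may stay depends on scaffold membership, which this sketch does not know about.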
428 sequences with locations to mask/trim (31 split spans to exclude, 397 split spans with locations to mask/trim) Trim: Sequence name, length, span(s), apparent source contig_10109 13138 13078..13138 adaptor:NGB00847.1 contig_10200 20270 1..76 adaptor:NGB00847.1 contig_10202 22517 1..44 adaptor:NGB00360.1 contig_10218 55661 55592..55661 adaptor:NGB00847.1 contig_10283 11575 1..79 adaptor:NGB00847.1 contig_1038 91134 91073..91134 adaptor:NGB00360.1 contig_104 10061 10005..10061 adaptor:NGB00360.1 contig_10405 24076 1..43 adaptor:NGB00847.1 contig_10425 16694 16639..16694 adaptor:NGB00360.1 contig_10447 37445 37233..37445 adaptor:NGB00360.1 contig_10466 19368 1..52 adaptor:NGB00847.1 contig_10576 12053 12003..12053 adaptor:NGB00360.1 contig_1059 34516 34457..34516 adaptor:NGB00847.1 contig_106 49997 1..45 adaptor:NGB00360.1 contig_10695 27664 1..38 adaptor:NGB01029.1 contig_10753 12481 12413..12481 adaptor:NGB00847.1 contig_10822 33522 33441..33522 adaptor:NGB00847.1 contig_1083 10637 1..23 adaptor:NGB01096.1 contig_10851 36752 36682..36752 adaptor:NGB00360.1 contig_10878 27925 27848..27925 adaptor:NGB00360.1 contig_10965 23597 1..57 adaptor:NGB00360.1 contig_10968 7413 1..40 adaptor:NGB00847.1 contig_1099 35847 1..70 adaptor:NGB00360.1 contig_11034 10224 10166..10224 adaptor:NGB00360.1 contig_11058 32994 1..23 adaptor:NGB01088.1 contig_11138 17426 1..73 adaptor:NGB00847.1 contig_11166 6306 6266..6306 adaptor:NGB00360.1 contig_11182 26558 1..30 adaptor:NGB01096.1 contig_11216 15160 1..59 adaptor:NGB00847.1 contig_11269 14732 14655..14732 adaptor:NGB00847.1 contig_11306 28246 28199..28246 adaptor:NGB00360.1 contig_1136 28186 1..73 adaptor:NGB00847.1 contig_1141 58119 58028..58119 adaptor:NGB00847.1 contig_11416 8561 8539..8561 adaptor:NGB01088.1 contig_11504 8890 8840..8890 adaptor:NGB00360.1 contig_1158 17422 17398..17422 adaptor:NGB01088.1 contig_11647 7021 1..69 adaptor:NGB00847.1 contig_11684 17442 17418..17442 adaptor:NGB01096.1 contig_11752 38337 38314..38337 
adaptor:NGB01088.1 contig_11767 6366 6324..6366 adaptor:NGB00847.1 contig_11791 22415 1..43 adaptor:NGB00847.1 contig_11792 58260 1..29 adaptor:NGB01096.1 contig_1187 39501 39462..39501 adaptor:NGB01029.1 contig_12059 10094 1..72 adaptor:NGB00360.1 contig_12130 13210 13164..13210 adaptor:NGB00360.1 contig_12164 17561 17539..17561 adaptor:NGB01096.1 contig_12169 14178 139..196 adaptor:NGB00360.1 contig_12183 15822 61..112 adaptor:NGB00360.1 contig_12266 11704 11640..11704 adaptor:NGB00360.1 contig_12300 9550 9360..9550 adaptor:NGB01088.1 contig_12324 49997 49891..49997 adaptor:NGB00847.1 contig_12423 45971 45860..45918 adaptor:NGB00360.1 contig_12441 15141 1..42 adaptor:NGB00847.1 contig_12514 14655 1..69 adaptor:NGB00847.1 contig_12515 5355 5326..5355 adaptor:NGB01088.1 contig_12535 22496 22458..22496 adaptor:NGB01029.1 contig_12544 19615 19559..19615 adaptor:NGB00360.1 contig_12558 20026 20007..20026 adaptor:NGB01088.1 contig_12613 6880 6793..6880 adaptor:NGB00847.1 contig_12701 18439 18330..18382 adaptor:NGB00360.1 contig_12713 13341 13274..13341 adaptor:NGB00360.1 contig_12723 17913 1..38 adaptor:NGB01088.1 contig_12730 55277 55249..55277 adaptor:NGB01096.1 contig_12739 6792 1..48 adaptor:NGB00360.1 contig_12787 30950 1..19 adaptor:NGB01096.1 contig_1279 18699 18670..18699 adaptor:NGB01088.1 contig_12815 5168 5091..5168 adaptor:NGB00847.1 contig_12846 20753 1..70 adaptor:NGB00360.1 contig_1288 34784 1..31 adaptor:NGB01096.1 contig_12888 12204 1..23 adaptor:NGB01096.1 contig_12919 10315 1..71 adaptor:NGB00360.1 contig_13031 8972 8938..8972 adaptor:NGB01093.1 contig_13088 6275 1..22 adaptor:NGB01088.1 contig_13140 36197 1..48 adaptor:NGB00360.1 contig_13233 16414 16355..16414 adaptor:NGB00847.1 contig_1330 33261 1..44 adaptor:NGB00847.1 contig_13319 19747 1..20 adaptor:NGB01096.1 contig_13367 36004 35868..35931 adaptor:NGB00847.1 contig_13395 5338 1..79 adaptor:NGB00360.1 contig_1341 30756 30734..30756 adaptor:NGB01088.1 contig_13481 9637 9600..9637 
adaptor:NGB00360.1 contig_13506 5704 5662..5704 adaptor:NGB00360.1 contig_13548 5814 79..121 adaptor:NGB00360.1 contig_13567 21576 1..47 adaptor:NGB00847.1 contig_13669 8336 1..24 adaptor:NGB01088.1 contig_13718 23500 1..25 adaptor:NGB01096.1 contig_13783 18720 1..41 adaptor:NGB00847.1 contig_13830 32395 32367..32395 adaptor:NGB01096.1 contig_13845 15572 15493..15572 adaptor:NGB00360.1 contig_13854 10932 1..48 adaptor:NGB00360.1 contig_13943 37701 37674..37701 adaptor:NGB01096.1 contig_13957 7159 1..30 adaptor:NGB01096.1 contig_14014 29735 29672..29735 adaptor:NGB00360.1 contig_14027 21418 21340..21418 adaptor:NGB00360.1 contig_14032 47642 1..53 adaptor:NGB00847.1 contig_14047 26936 1..28 adaptor:NGB01088.1 contig_14048 45832 1..22 adaptor:NGB01088.1 contig_14061 11471 1..179 adaptor:NGB01096.1 contig_14113 17661 1..67 adaptor:NGB00360.1 contig_14173 17601 1..41 adaptor:NGB00847.1 contig_1418 31840 1..248 adaptor:NGB00847.1 contig_14194 7456 7294..7456 adaptor:NGB01096.1 contig_14210 8814 1971..2025 adaptor:NGB00360.1 contig_14223 12513 12489..12513 adaptor:NGB01096.1 contig_14317 21472 21410..21472 adaptor:NGB00360.1 contig_14424 6040 5973..6040 adaptor:NGB00360.1 contig_14425 6404 6379..6404 adaptor:NGB01096.1 contig_14426 31457 31398..31457 adaptor:NGB00847.1 contig_14458 6814 6623..6814 adaptor:NGB01088.1 contig_14524 9488 9431..9488 adaptor:NGB00847.1 contig_14584 20433 1..96 adaptor:NGB00847.1 contig_1459 32979 1..32 adaptor:NGB01096.1 contig_14601 19077 1..28 adaptor:NGB01096.1 contig_14641 21747 1..45 adaptor:NGB00847.1 contig_14664 48155 48118..48155 adaptor:NGB00360.1 contig_14711 11854 11827..11854 adaptor:NGB01096.1 contig_14736 21360 1..37 adaptor:NGB01029.1 contig_14749 12830 1..33 adaptor:NGB01093.1 contig_14966 9962 9891..9962 adaptor:NGB00360.1 contig_14999 5248 1..41 adaptor:NGB00360.1 contig_15010 17976 1..43 adaptor:NGB00360.1 contig_15011 26484 26462..26484 adaptor:NGB01096.1 contig_15017 9331 9291..9331 adaptor:NGB00360.1 contig_1503 63533 
1..33 adaptor:NGB01096.1 contig_15032 32240 32157..32240 adaptor:NGB00847.1 contig_15060 15050 15010..15050 adaptor:NGB00847.1 contig_15065 13062 12996..13062 adaptor:NGB00360.1 contig_15070 29943 1..29 adaptor:NGB01096.1 contig_15132 20431 1..71 adaptor:NGB00847.1 contig_15169 7086 7051..7086 adaptor:NGB00846.1 contig_15174 19921 1..23 adaptor:NGB01096.1 contig_15194 16100 16039..16100 adaptor:NGB00847.1 contig_15212 9272 1..50 adaptor:NGB00847.1 contig_15215 15591 1..58 adaptor:NGB00360.1 contig_15271 37699 37647..37699 adaptor:NGB00847.1 contig_15276 11087 11031..11087 adaptor:NGB00847.1 contig_15309 10118 1..42 adaptor:NGB00847.1 contig_15320 7963 7901..7963 adaptor:NGB00847.1 contig_15334 5683 1..36 adaptor:NGB00846.1 contig_15364 17306 76..139 adaptor:NGB00847.1 contig_15374 28301 28263..28301 adaptor:NGB00360.1 contig_15377 10470 10428..10470 adaptor:NGB00360.1 contig_15398 24069 23999..24069 adaptor:NGB00847.1 contig_15500 9289 9271..9289 adaptor:NGB01096.1 contig_15507 25565 1..22 adaptor:NGB01088.1 contig_15523 5782 5762..5782 adaptor:NGB01088.1 contig_15529 10225 10143..10225 adaptor:NGB00360.1 contig_15569 9645 9612..9645 adaptor:NGB01090.1 contig_15596 7163 1..42 adaptor:NGB00360.1 contig_15605 18521 1..31 adaptor:NGB01096.1 contig_15672 8446 1..213 adaptor:NGB01088.1 contig_15686 22141 58..90 adaptor:NGB00847.1 contig_15708 18098 17996..18098 adaptor:NGB00847.1 contig_15736 18284 18252..18284 adaptor:NGB01096.1 contig_15777 17192 1..45 adaptor:NGB00360.1 contig_15812 8602 1..77 adaptor:NGB00360.1 contig_15959 10936 10913..10936 adaptor:NGB01096.1 contig_15972 11324 1..71 adaptor:NGB00360.1 contig_15974 24312 24243..24312 adaptor:NGB00847.1 contig_16057 8838 8775..8838 adaptor:NGB00847.1 contig_16088 7608 1..71 adaptor:NGB00360.1 contig_16142 10392 1..53 adaptor:NGB00847.1 contig_1617 14870 255..310 adaptor:NGB00360.1 contig_16183 9226 9205..9226 adaptor:NGB01088.1 contig_16188 62666 62586..62666 adaptor:NGB00847.1 contig_16370 7868 1..42 
adaptor:NGB00847.1 contig_16416 19512 1..21 adaptor:NGB01088.1 contig_1645 25016 24951..25016 adaptor:NGB00360.1 contig_16510 31845 31776..31845 adaptor:NGB00847.1 contig_16529 17342 1..45 adaptor:NGB00360.1 contig_16558 9338 9097..9338 adaptor:NGB00360.1 contig_16573 6590 6521..6590 adaptor:NGB00847.1 contig_16608 7397 7324..7397 adaptor:NGB00847.1 contig_16631 11055 1..50 adaptor:NGB00360.1 contig_16641 5482 1..190 adaptor:NGB01088.1 contig_1667 35244 35200..35244 adaptor:NGB01029.1 contig_16682 14500 1..71 adaptor:NGB00847.1 contig_16699 6216 6148..6216 adaptor:NGB00360.1 contig_16734 12674 12625..12674 adaptor:NGB00360.1 contig_16790 6341 1..51 adaptor:NGB00360.1 contig_16807 7512 1..36 adaptor:NGB01096.1 contig_16817 20743 1..155 adaptor:NGB01088.1 contig_16839 6969 1..69 adaptor:multiple contig_16870 10948 1..49 adaptor:NGB00847.1 contig_16880 5622 5549..5622 adaptor:NGB00360.1 contig_16889 9182 1..40 adaptor:NGB00360.1 contig_16911 6691 1..28 adaptor:NGB01088.1 contig_16921 9432 9358..9432 adaptor:NGB00360.1 contig_16951 14285 14262..14285 adaptor:NGB01088.1 contig_17021 12242 1..75 adaptor:NGB00360.1 contig_17092 22712 1..64 adaptor:NGB00360.1 contig_17147 7706 7685..7706 adaptor:NGB01096.1 contig_17195 15668 15643..15668 adaptor:NGB01096.1 contig_17214 7881 7819..7881 adaptor:NGB00847.1 contig_17299 7861 7830..7861 adaptor:NGB01088.1 contig_17344 8915 8765..8823 adaptor:NGB00360.1 contig_17361 8425 1..26 adaptor:NGB01096.1 contig_17422 11017 10964..11017 adaptor:NGB00360.1 contig_17471 5988 5964..5988 adaptor:NGB01096.1 contig_17505 10208 1..74 adaptor:NGB00360.1 contig_17506 6091 1..61 adaptor:NGB00360.1 contig_17520 6084 6028..6084 adaptor:NGB00360.1 contig_17538 5796 5766..5796 adaptor:NGB01096.1 contig_17558 7066 6837..7066 adaptor:NGB01080.1 contig_17561 15165 1..206 adaptor:NGB01083.1 contig_17594 6976 1..26 adaptor:NGB01088.1 contig_17655 14371 14177..14371 adaptor:NGB01088.1 contig_17671 17801 1..50 adaptor:NGB00847.1 contig_17680 5752 5693..5752 
adaptor:NGB00847.1 contig_17738 6456 1..44 adaptor:NGB00360.1 contig_17741 10917 10889..10917 adaptor:NGB01096.1 contig_17775 5928 1..79 adaptor:NGB00847.1 contig_17804 11597 11562..11597 adaptor:NGB00846.1 contig_17872 11319 11278..11319 adaptor:NGB00847.1 contig_17876 5647 5613..5647 adaptor:NGB01083.1 contig_17925 9923 1..22 adaptor:NGB01088.1 contig_17938 5246 1..23 adaptor:NGB01088.1 contig_18016 8044 1..29 adaptor:NGB01096.1 contig_18017 6668 6647..6668 adaptor:NGB01096.1 contig_18044 11330 11299..11330 adaptor:NGB01096.1 contig_18049 10560 1..88 adaptor:NGB00847.1 contig_18173 12243 1..159 adaptor:NGB01096.1 contig_18175 8788 8765..8788 adaptor:NGB01096.1 contig_18177 11418 11340..11418 adaptor:multiple contig_18182 11901 11832..11901 adaptor:NGB00847.1 contig_18201 6059 6038..6059 adaptor:NGB01096.1 contig_18222 11216 11136..11216 adaptor:NGB00847.1 contig_18228 8386 8361..8386 adaptor:NGB01088.1 contig_18321 5922 5897..5922 adaptor:NGB01096.1 contig_18370 5400 5085..5116 adaptor:NGB00747.1 contig_18453 5849 1..38 adaptor:NGB00360.1 contig_1846 23210 1..64 adaptor:NGB00360.1 contig_18479 5209 1..44 adaptor:NGB00360.1 contig_18486 5749 5726..5749 adaptor:NGB01088.1 contig_18488 5217 1..19 adaptor:NGB01088.1 contig_1969 65776 1..60 adaptor:NGB00360.1 contig_197 9215 1..83 adaptor:NGB00847.1 contig_1977 13765 1..35 adaptor:NGB01093.1 contig_1999 53427 53398..53427 adaptor:NGB01096.1 contig_2125 11803 11769..11803 adaptor:NGB01083.1 contig_2151 9544 1..37 adaptor:NGB01029.1 contig_2179 38972 1..67 adaptor:NGB00360.1 contig_2186 31110 30935..31110 adaptor:NGB01096.1 contig_2203 60314 60124..60187 adaptor:NGB00847.1 contig_2278 33271 1..36 adaptor:NGB01090.1 contig_2305 17957 1..58 adaptor:NGB00360.1 contig_2361 48816 48764..48816 adaptor:NGB00847.1 contig_242 49604 49535..49604 adaptor:NGB00360.1 contig_2429 76318 76242..76318 adaptor:NGB00847.1 contig_2430 70439 70373..70439 adaptor:NGB00847.1 contig_2459 63920 1..96 adaptor:NGB00847.1 contig_2485 31300 
31260..31300 adaptor:NGB00360.1 contig_2508 25152 25095..25152 adaptor:NGB00847.1 contig_2650 36583 1..58 adaptor:NGB00847.1 contig_2668 22089 22052..22089 adaptor:NGB01029.1 contig_2735 13614 1..19 adaptor:NGB01088.1 contig_2781 50403 1..70 adaptor:NGB00847.1 contig_2800 30768 22802..22846 adaptor:NGB00360.1 contig_2824 44109 1..38 adaptor:NGB00847.1 contig_2888 19121 1..89 adaptor:NGB00360.1 contig_2900 36871 1..32 adaptor:NGB01088.1 contig_2949 25959 25916..25959 adaptor:NGB00360.1 contig_2970 20833 1..46 adaptor:NGB00360.1 contig_2986 16429 1..43 adaptor:NGB00360.1 contig_3069 38956 38904..38956 adaptor:NGB00847.1 contig_3106 9135 1..87 adaptor:NGB00847.1 contig_3124 70101 70072..70101 adaptor:NGB01088.1 contig_3129 30402 30379..30402 adaptor:NGB01088.1 contig_3147 10611 10586..10611 adaptor:NGB01096.1 contig_3190 117726 117687..117726 adaptor:NGB01029.1 contig_3243 44291 44273..44291 adaptor:NGB01096.1 contig_3276 57911 1..42 adaptor:NGB00360.1 contig_341 67008 1..22 adaptor:NGB01096.1 contig_3542 16855 1..60 adaptor:NGB00847.1 contig_3595 29288 1..79 adaptor:NGB00847.1 contig_3712 73078 1..78 adaptor:NGB00847.1 contig_3840 40472 40414..40472 adaptor:NGB00360.1 contig_3868 33875 33819..33875 adaptor:NGB00360.1 contig_3903 40080 40010..40080 adaptor:NGB00847.1 contig_3996 44010 43970..44010 adaptor:NGB00360.1 contig_4001 26085 1..73 adaptor:NGB00847.1 contig_4014 30676 30590..30676 adaptor:NGB00360.1 contig_4019 49543 1..76 adaptor:NGB00360.1 contig_4036 58848 58696..58848 adaptor:NGB00846.1 contig_4084 41308 41210..41308 adaptor:NGB00360.1 contig_4095 24801 1..70 adaptor:NGB00847.1 contig_4098 27393 1..189 adaptor:NGB01096.1 contig_410 57740 57678..57740 adaptor:NGB00360.1 contig_4172 20870 9717..9749 adaptor:NGB01096.1 contig_4318 55870 55805..55870 adaptor:NGB00360.1 contig_432 58593 58569..58593 adaptor:NGB01088.1 contig_4323 87370 87304..87370 adaptor:NGB00847.1 contig_4365 27401 27350..27401 adaptor:NGB00847.1 contig_4516 14480 1..98 adaptor:NGB00847.1 
contig_452 34031 1..23 adaptor:NGB01096.1 contig_4530 63069 63006..63069 adaptor:NGB00360.1 contig_4651 67570 67518..67570 adaptor:NGB00847.1 contig_4679 20970 1..38 adaptor:NGB00360.1 contig_4686 7411 1..24 adaptor:NGB01096.1 contig_4743 37926 1..79 adaptor:NGB00360.1 contig_4765 11248 11167..11248 adaptor:NGB00360.1 contig_4801 91339 1..50 adaptor:NGB00360.1 contig_4812 37300 37121..37300 adaptor:NGB01093.1 contig_4820 80899 80862..80899 adaptor:NGB00360.1 contig_4904 9220 1..52 adaptor:NGB00847.1 contig_4916 29759 29718..29759 adaptor:NGB00847.1 contig_4924 19015 1..49 adaptor:NGB00847.1 contig_4939 23620 23574..23620 adaptor:NGB01029.1 contig_4956 40890 1..24 adaptor:NGB01088.1 contig_4994 71509 71447..71509 adaptor:NGB00847.1 contig_501 34157 34116..34157 adaptor:NGB00847.1 contig_5036 13162 1..77 adaptor:NGB00360.1 contig_5052 64212 1..170 adaptor:NGB01096.1 contig_5063 35265 35243..35265 adaptor:NGB01096.1 contig_5090 27510 27441..27510 adaptor:NGB00847.1 contig_5157 5988 5805..5988 adaptor:NGB00847.1 contig_5168 6086 6051..6086 adaptor:NGB00846.1 contig_5176 9131 1..41 adaptor:NGB00360.1 contig_5243 44178 1..88 adaptor:NGB00847.1 contig_5270 39229 39177..39229 adaptor:NGB00847.1 contig_5452 30446 1..36 adaptor:NGB00846.1 contig_5576 58918 1..34 adaptor:NGB01096.1 contig_5582 108611 1..87 adaptor:NGB00847.1 contig_5590 55235 55210..55235 adaptor:NGB01088.1 contig_5700 8246 1..82 adaptor:NGB00847.1 contig_5815 99837 1..63 adaptor:NGB00847.1 contig_5820 11616 1..202 adaptor:NGB00847.1 contig_5878 55755 1..26 adaptor:NGB01096.1 contig_59 12390 1..24 adaptor:NGB01096.1 contig_5959 11737 11532..11737 adaptor:NGB01096.1 contig_6065 11492 1..32 adaptor:NGB01088.1 contig_6067 19311 1..39 adaptor:NGB01029.1 contig_6092 14700 1..37 adaptor:NGB01029.1 contig_6194 32760 1..19 adaptor:NGB01088.1 contig_620 10761 1..206 adaptor:NGB01029.1 contig_6259 83001 1..50 adaptor:NGB00360.1 contig_6321 29279 29260..29279 adaptor:NGB01096.1 contig_6408 14690 1..74 adaptor:NGB00360.1 
contig_6455 68530 68497..68530 adaptor:NGB01090.1 contig_6513 12061 11986..12061 adaptor:NGB00847.1 contig_6542 45321 1..41 adaptor:NGB00360.1 contig_6569 19579 19500..19579 adaptor:NGB00847.1 contig_6628 13125 13107..13125 adaptor:NGB01096.1 contig_6673 6733 6699..6733 adaptor:NGB01088.1 contig_6676 13298 13265..13298 adaptor:NGB01088.1 contig_6692 17411 1..43 adaptor:NGB00847.1 contig_6703 57771 1..63 adaptor:NGB00360.1 contig_6785 8258 8237..8258 adaptor:NGB01088.1 contig_6908 53004 52732..52792 adaptor:NGB00847.1 contig_6940 18777 18580..18777 adaptor:NGB00360.1 contig_6941 42032 41980..42032 adaptor:NGB00847.1 contig_6945 53258 1..71 adaptor:NGB00360.1 contig_6986 49101 1..21 adaptor:NGB01088.1 contig_701 57358 1..28 adaptor:NGB01096.1 contig_7017 41786 1..88 adaptor:NGB00360.1 contig_7035 53503 53477..53503 adaptor:NGB01096.1 contig_7046 12860 12812..12860 adaptor:NGB00360.1 contig_7081 27746 1..78 adaptor:NGB00847.1 contig_7082 26783 1..73 adaptor:NGB00847.1 contig_7083 44465 1..70 adaptor:NGB00847.1 contig_7117 33739 33661..33739 adaptor:NGB00360.1 contig_7197 5439 5361..5439 adaptor:NGB00360.1 contig_720 34826 34755..34826 adaptor:NGB00360.1 contig_7210 16719 1..30 adaptor:NGB01096.1 contig_7225 51589 51483..51519 adaptor:NGB01090.1 contig_7228 37410 1..64 adaptor:NGB00360.1 contig_7296 6652 1..80 adaptor:NGB00847.1 contig_7317 11682 1..30 adaptor:NGB01088.1 contig_7323 47612 47560..47612 adaptor:NGB00847.1 contig_7353 50534 50506..50534 adaptor:NGB01096.1 contig_7478 44000 43977..44000 adaptor:NGB01088.1 contig_7510 11029 1..22 adaptor:NGB01096.1 contig_7540 12614 12566..12614 adaptor:NGB00360.1 contig_7587 74260 74065..74260 adaptor:NGB00847.1 contig_7607 14652 1..31 adaptor:NGB01088.1 contig_7612 27455 27299..27354 adaptor:NGB00360.1 contig_7705 39772 1..49 adaptor:NGB00360.1 contig_7729 22305 1..172 adaptor:NGB00360.1 contig_7747 11568 11502..11568 adaptor:NGB00847.1 contig_7750 52785 52748..52785 adaptor:NGB01029.1 contig_7800 20628 20588..20628 
adaptor:NGB00360.1 contig_7851 53514 53439..53514 adaptor:NGB00360.1 contig_7989 51399 1..97 adaptor:NGB00847.1 contig_7992 9120 9035..9120 adaptor:NGB00360.1 contig_7995 103073 103034..103073 adaptor:NGB00360.1 contig_8000 16924 1..85 adaptor:NGB00847.1 contig_8071 73728 73657..73728 adaptor:NGB00360.1 contig_809 20474 20399..20474 adaptor:NGB00360.1 contig_8139 33627 1..25 adaptor:NGB01088.1 contig_8165 17003 16958..17003 adaptor:NGB00847.1 contig_8207 30300 30275..30300 adaptor:NGB01096.1 contig_821 111683 111656..111683 adaptor:NGB01096.1 contig_8236 30705 1..70 adaptor:NGB00360.1 contig_8261 49091 1..181 adaptor:NGB00847.1 contig_8265 28139 27940..28139 adaptor:NGB00360.1 contig_8307 32654 32591..32654 adaptor:NGB00360.1 contig_8340 12953 12925..12953 adaptor:NGB01096.1 contig_8389 19738 1..75 adaptor:NGB00847.1 contig_8399 35159 1..147 adaptor:NGB01096.1 contig_8569 19455 1..38 adaptor:multiple contig_8735 42362 42335..42362 adaptor:NGB01088.1 contig_8737 22308 1..70 adaptor:NGB00360.1 contig_8790 14216 14198..14216 adaptor:NGB01096.1 contig_8797 6889 1..95 adaptor:NGB00847.1 contig_8815 39194 1..80 adaptor:NGB00360.1 contig_886 10028 1..76 adaptor:NGB00360.1 contig_8861 12192 12145..12192 adaptor:NGB00360.1 contig_8909 11109 11042..11109 adaptor:NGB00360.1 contig_8932 8331 8281..8331 adaptor:NGB00847.1 contig_8975 8730 8671..8730 adaptor:NGB00847.1 contig_8992 12682 12661..12682 adaptor:NGB01088.1 contig_8994 7982 7950..7982 adaptor:NGB01096.1 contig_9017 8069 7896..8069 adaptor:NGB00360.1 contig_9045 35343 535..598 adaptor:NGB00847.1 contig_9082 10766 1..28 adaptor:NGB01096.1 contig_9271 17773 17750..17773 adaptor:NGB01096.1 contig_9273 12180 1..180 adaptor:NGB01096.1 contig_9287 6067 1..77 adaptor:NGB00847.1 contig_9474 33382 33060..33111 adaptor:NGB00360.1 contig_9495 19348 19274..19348 adaptor:NGB00847.1 contig_9540 30855 30836..30855 adaptor:NGB01088.1 contig_9591 10604 1..41 adaptor:NGB00847.1 contig_9628 15083 1..34 adaptor:NGB01096.1 contig_9677 5510 
5486..5510 adaptor:NGB01088.1 contig_9693 9823 1..84 adaptor:NGB00847.1 contig_9825 54363 54309..54363 adaptor:NGB00847.1 contig_9863 14033 14013..14033 adaptor:NGB01088.1 contig_9993 35388 1..26 adaptor:NGB01096.1

From xvazquezc at gmail.com Mon Oct 23 16:02:47 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Tue, 24 Oct 2017 09:02:47 +1100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and splitting the contigs at those places, as suggested in the report.

For the annotation part, I'm not 100% sure, but I'd say start with the "Merge/resolve legacy annotations" steps. Maybe Carson or Daniel have a different suggestion.
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this
> contamination report.
>
> Please how do I use maker to effect the correction
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From cjfields at illinois.edu Mon Oct 23 17:21:06 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 23 Oct 2017 23:21:06 +0000
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID: <8B4331B5-9D10-478A-91A5-80AF702CD9CD@illinois.edu>

It looks like the adapter is primarily at the ends, which is easy to remove. However, I agree: removing these and redoing the assembly may improve the assembly quality.

chris

From: maker-devel on behalf of Xabier Vázquez-Campos
Date: Monday, October 23, 2017 at 5:03 PM
To: Emmanuel Nnadi
Cc: Maker Mailing List, "Ence, daniel"
Subject: Re: [maker-devel] Contamination report from NCBI

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and splitting the contigs at those places, as suggested in the report.

For the annotation part, I'm not 100% sure, but I'd say start with the "Merge/resolve legacy annotations" steps. Maybe Carson or Daniel have a different suggestion.
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

Hello

Good day.

Please I submitted my sequence to NCBI and they sent back this contamination report.
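A minimal Python sketch of the "remove those sequences and split the contigs" step discussed in this thread, assuming each record in the contamination report has the four fields seen in the listing posted earlier (contig name, contig length, start..end span, adaptor source). The names parse_report, classify, and keep_intervals are made up for illustration; they are not part of MAKER or of any NCBI tool. The sketch only works out which 1-based intervals of a contig to keep; it does not touch any FASTA file.

```python
import re
from typing import List, NamedTuple, Tuple

# One record as it appears in the report quoted in this thread:
#   <contig name> <contig length> <start>..<end> adaptor:<source id>
RECORD = re.compile(r"(\S+)\s+(\d+)\s+(\d+)\.\.(\d+)\s+(adaptor:\S+)")


class Hit(NamedTuple):
    contig: str
    length: int
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str


def parse_report(text: str) -> List[Hit]:
    """Extract every adaptor span found in the report text."""
    return [Hit(m[1], int(m[2]), int(m[3]), int(m[4]), m[5])
            for m in RECORD.finditer(text)]


def classify(hit: Hit) -> str:
    """Say where the span sits on the contig: 'start', 'end', or 'internal'."""
    if hit.start == 1:
        return "start"
    if hit.end == hit.length:
        return "end"
    return "internal"


def keep_intervals(hit: Hit) -> List[Tuple[int, int]]:
    """Return the 1-based inclusive pieces left after removing the span.

    A span at either end leaves one piece (a trim); an internal span
    leaves two pieces (a split).
    """
    pieces = []
    if hit.start > 1:
        pieces.append((1, hit.start - 1))
    if hit.end < hit.length:
        pieces.append((hit.end + 1, hit.length))
    return pieces


# Three records copied from the report in this thread.
sample = ("contig_11791 22415 1..43 adaptor:NGB00847.1 "
          "contig_12130 13210 13164..13210 adaptor:NGB00360.1 "
          "contig_12169 14178 139..196 adaptor:NGB00360.1")

for hit in parse_report(sample):
    print(hit.contig, classify(hit), keep_intervals(hit))
```

Spans that are close to an end without touching it are reported as "internal" here; whether to trim or split those is a judgment call that the sketch leaves to the user.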
Please how do I use maker to effect the correction

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From mmokrejs at gmail.com Tue Oct 24 04:23:38 2017
From: mmokrejs at gmail.com (Martin MOKREJŠ)
Date: Tue, 24 Oct 2017 12:23:38 +0200
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,

Use trimmomatic or cutadapt to remove the adapters, and check the output file for any that were not removed. Once they are all removed, redo the assembly.

Martin

Emmanuel Nnadi wrote:
> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this contamination report.
>
> Please how do I use maker to effect the correction

--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/

From eennadi at gmail.com Tue Oct 24 04:44:20 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Tue, 24 Oct 2017 11:44:20 +0100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Thanks!

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

On Oct 24, 2017 11:23 AM, "Martin MOKREJŠ" wrote:

> Hi Emmanuel,
> use trimmomatic or cutadapt to remove the adapters and check the output
> file for unremoved cases. Once they are all removed redo the assembly.
> Martin
>
> Emmanuel Nnadi wrote:
> > Hello
> >
> > Good day.
> >
> > Please I submitted my sequence to NCBI and they sent back this
> > contamination report.
> >
> > Please how do I use maker to effect the correction
>
> --
> Martin Mokrejs, Ph.D.
> Adapter/artefact removal from datasets based on the following technologies:
> 454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
> http://www.bioinformatics.cz/software/supported-protocols/
>

From qwzhang0601 at gmail.com Tue Oct 24 10:54:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 24 Oct 2017 12:54:13 -0400
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Dear Carson:

Thank you again for your suggestions. I have just received the new genome assembly of NMR and have started on gene annotation. I understand your ideas about this. But can I simply use the old genome's transcripts as transcript evidence and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.

For transcripts I have the following choices. I think the first choice is more reliable and better, right?
(1) There are about 60,000 RefSeq transcripts from NCBI, so I downloaded those sequences in FASTA format.
(2) We have raw RNA-seq data from 11 tissues; we can assemble each sample with Trinity and then get the transcripts. But I think most of the RNA-seq data should already have been submitted to NCBI.
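The settings described above could be written into a control file along these lines. This is only a sketch: set_ctl_options is a hypothetical helper, refseq_mrna.fasta is a placeholder file name, and the short excerpt below stands in for a full maker_opts.ctl, whose comments may differ between MAKER versions. It assumes the usual "key=value #comment" layout of the control files.

```python
def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given keys set, keeping trailing #comments.

    Lines are expected to look like 'key=value #comment'. Keys that are
    not present in ctl_text are left out rather than appended.
    """
    out = []
    for line in ctl_text.splitlines():
        key, sep, rest = line.partition("=")
        if sep and key in updates:
            _old_value, hash_mark, comment = rest.partition("#")
            line = f"{key}={updates[key]}"
            if hash_mark:
                line += f" #{comment}"
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated, illustrative excerpt of a maker_opts.ctl file.
excerpt = (
    "#-----EST Evidence (for best results provide a file for at least one)\n"
    "est= #set of ESTs or assembled mRNA-seq in fasta format\n"
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
)

# Placeholder path: point est= at the transcript FASTA and turn on est2genome.
print(set_ctl_options(excerpt, {"est": "refseq_mrna.fasta", "est2genome": "1"}))
```

Editing the file by hand works just as well; the helper is only convenient when the same change has to be repeated across several runs.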
BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences, or protein sequences. I wonder which type of data is best for training SNAP? For Augustus, we will use BUSCO to train it.

Many thanks.

Best
Quanwei

2017-09-29 12:36 GMT-04:00 Carson Holt:

> You can try using the est2genome=1 option to map the old models forward
> onto the new assembly as if they were ESTs (add a line that says
> est_forward=1 to the control file to maintain old naming and set est= to
> the old model transcript file). Then provide the final models as a pred_gff
> for a subsequent run (i.e. a traditional MAKER run where you are annotating
> the new assembly with transcript and protein evidence and ab initio
> predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will
> change. So by doing it this way, you will at least be able to move many
> models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are
> just letting old models compete against new annotations. They will be
> rejected if they have no evidence support, or can be kept if they score
> better than alternate models from SNAP/Augustus. That way you have the
> chance to integrate old models while at the same time rejecting some old
> models that have no evidence overlap.
>
> --Carson
>
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of NMR genome, whose genome had been
> > assembled and annotated a few years ago. We can download the gene
> > annotation from NCBI.
> >
> > Now we want to annotate the new genome using Maker2 pipeline. I wonder
> > how can I fully make use of existing annotations. On the other hand, since
> > the previous genome is not very well assemblies, some genes annotation
> > maybe false positives.
> > I hope those false positive genes in previous
> > annotation won't mislead Maker2 for current gene annotation.
> >
> > Do you have any suggestions. Thanks
> >
> > Best
> > Quanwei

From carsonhh at gmail.com Tue Oct 24 16:26:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Oct 2017 16:26:00 -0600
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To:
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Yes. If you use est2genome it will just align the model and then find the longest ORF, so it is a quick way to just align old models to the new assembly. Alternatively, you can just do de novo annotation.

--Carson

> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you again for your suggestions. I just get the new genome assembly of NMR and start to do gene annotation. I understand you ideas about this. But can I simply use the old genome transcripts as transcript evidence, and just following the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in the fasta format for the first round training of SNAP.
>
> For transcripts I have the following choices. I think the first choice is more reliable and better, right?
> (1) There are about 60,000 RefSeq transcripts from NCBI. So I downloaded those sequences in fasta format.
> (2) We have the raw data of RNA-seq from 11 tissues, we can do assembly by trinity for each sample and then get the transcripts. But I think most of the RNA-seq should have been submitted to NCBI.
>
> BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. I wonder which type of data are the best to train the SNAP?
> For Augustus, we will use BUSCO to train it.
>
> Many thanks.
>
> Best
> Quanwei
>
>
> 2017-09-29 12:36 GMT-04:00 Carson Holt:
> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
>
> --Carson
>
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of NMR genome, whose genome had been assembled and annotated a few years ago. We can download the gene annotation from NCBI.
> >
> > Now we want to annotate the new genome using Maker2 pipeline. I wonder how can I fully make use of existing annotations. On the other hand, since the previous genome is not very well assemblies, some genes annotation maybe false positives. I hope those false positive genes in previous annotation won't mislead Maker2 for current gene annotation.
> >
> > Do you have any suggestions.
> > Thanks
> >
> > Best
> > Quanwei

From daren.card at gmail.com Wed Oct 25 06:17:13 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Wed, 25 Oct 2017 07:17:13 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> <97656D7C-3613-4B0B-9D99-0441AC28ABCC@gmail.com> <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
Message-ID: <0406D4C3-9C43-4198-B2EA-241C6C504425@gmail.com>

Hi Carson (and CCed MAKER list for the record),

Thanks for troubleshooting my issue further. Good to hear that the run should ultimately work, but strange that it isn't working for me. I'll keep playing with it and will hopefully get it sorted out by running through the list you suggested.

Thanks again for the help,
Daren

> On Oct 24, 2017, at 11:27 AM, Carson Holt wrote:
>
> I cannot seem to replicate this. I ran with MAKER 2.31.8 and 2.31.9, both with and without the GFF3 file (total of 4 runs). It succeeded without issues in every case.
>
> The only things I can think to try are:
> 1. Reinstall BLAST+. Even though you have 2.6.0, just try it anyways. Also install rmblast 2.6.0 for use with RepeatMasker (requires that you install from source).
> 2. Make sure you run ./configure inside RepeatMasker to let it know about the new rmblast installation.
> 3. Change the location of blast and related scripts in maker_exe.ctl, otherwise MAKER won't know to use your new installation.
> 4. Delete the mpi_blastdb directory under MAKER's output directory to force it to rebuild all BLAST indexes.
> 5. Delete any file with a ".db" extension in the maker output directory to force it to rebuild all GFF3 indexes.
> 6. Update BioPerl to the current CPAN version.
>
> Also here is a link to the results I got for your contig (version 2.31.8 using the repeat masking GFF3 file) --> http://weatherby.genetics.utah.edu/data/scaffold-1.tgz
>
> --Carson
>
>
>> On Oct 17, 2017, at 6:46 AM, Daren C. Card wrote:
>>
>> Hi Carson,
>>
>> Thanks for offering to take a further look at this. I've uploaded all the files that I think you'd need to run MAKER on your systems, but let me know if you need anything else. My username is "guest_5038".
>>
>> Repeat annotations GFF is from RepeatModeler, with simple repeats filtered away. Transcript evidence was from Trinity assembly of several RNAseq libraries. Several sets of protein evidence from related species. Also have augustus HMM trained based on the genome assembly using BUSCO with retraining turned on.
>>
>> The command I've used is below, and here are the software versions I'm working with:
>>
>> Maker - 2.31.8
>> BLAST - 2.6.0
>> Augustus - 3.2.3
>> RepeatMasker - 4.0.6
>>
>> mpiexec -n 12 maker -base CroVir_rnd1_chr1 round1_maker_opts.chr1.ctl maker_bopts.ctl maker_exe.ctl
>>
>> Thanks again!
>> Daren
>>
>>
>>> On Oct 13, 2017, at 10:37 AM, Carson Holt wrote:
>>>
>>> So you have an input GFF3 file? Could you send it to me along with the problem contig. If you want you can upload the maker control files and evidence sets, and I can just recreate the run for the contig.
>>>
>>> Upload here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> --Carson
>>>
>>>
>>>> On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote:
>>>>
>>>> Hi Carson,
>>>>
>>>> Thanks for the help. Issue is still lingering. I've tried my full "ideal"
>>>> run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn't seem to be a BLAST issue. Or is one that won't be easy to overcome.
>>>>
>>>> Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay.
>>>>
>>>> I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me.
>>>>
>>>> What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don't see any obvious way info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great.
>>>>
>>>> Hoping I can resolve this as maybe this is useful to others. Weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but can't imagine that really mattering.
>>>>
>>>> Thanks,
>>>> Daren
>>>>
>>>>
>>>>> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote:
>>>>>
>>>>> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST).
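One way to see which BLAST executables a maker_exe.ctl points at, and whether they match what is first on $PATH, is sketched below in Python. The helper names read_ctl and blast_mismatches are hypothetical, the /opt/... paths are placeholders, and the tool names (makeblastdb, blastn, blastx, tblastx) are assumed to be the keys used in the file; check them against your own maker_exe.ctl.

```python
import shutil


def read_ctl(ctl_text):
    """Parse 'key=value #comment' lines into a dict, skipping comment lines."""
    options = {}
    for line in ctl_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the trailing comment
        if "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def blast_mismatches(options,
                     tools=("makeblastdb", "blastn", "blastx", "tblastx"),
                     which=shutil.which):
    """Return {tool: (configured_path, path_on_PATH)} where the two differ."""
    diffs = {}
    for tool in tools:
        configured = options.get(tool, "")
        on_path = which(tool) or ""
        if configured != on_path:
            diffs[tool] = (configured, on_path)
    return diffs


# Illustrative excerpt; the paths are placeholders, not real installations.
excerpt = (
    "#-----Location of Executables Used by MAKER/EVALUATOR\n"
    "makeblastdb=/opt/blast-old/bin/makeblastdb #location of NCBI+ makeblastdb executable\n"
    "blastx=/opt/blast-old/bin/blastx #location of NCBI+ blastx executable\n"
)

for tool, (configured, on_path) in blast_mismatches(read_ctl(excerpt)).items():
    print(f"{tool}: control file has {configured!r}, PATH has {on_path!r}")
```

A mismatch is not necessarily an error, since MAKER uses the path in the control file. It only shows that a freshly installed BLAST on $PATH is not the one MAKER will run.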
>>>>>
>>>>> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/
>>>>>
>>>>> --Carson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote:
>>>>>>
>>>>>> Dear Carson,
>>>>>>
>>>>>> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.
>>>>>>
>>>>>> Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
>>>>>>
>>>>>> Thank you,
>>>>>> Daren Card
>>>>>>
>>>>>>
>>>>>> ################################################
>>>>>> doing repeat masking
>>>>>> re reading repeat masker report.
>>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
>>>>>> doing blastx repeats
>>>>>> re reading blast report.
>>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner
>>>>>> deleted:2 hits
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> collecting blastx repeatmasking
>>>>>> processing all repeats
>>>>>> in cluster::shadow_cluster...
>>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>>>> --> rank=NA, hostname=moonunit0
>>>>>> ERROR: Failed while processing all repeats
>>>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>>>> FAILED CONTIG:scaffold-1
>>>>>>
>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>> FAILED CONTIG:scaffold-1
>>>>>>
>>>>>> examining contents of the fasta file and run log
>>>>>> ################################################
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote:
>>>>>>>
>>>>>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>
>>>>>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds.
These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns, finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size?
>>>>>>>>
>>>>>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library).
>>>>>>>>
>>>>>>>> Any help would be greatly appreciated.
>>>>>>>> Daren Card
>>>>>>>>
>>>>>>>> University of Texas Arlington
>>>>>>>>
>>>>>>>> ###################################################
>>>>>>>> doing blastx repeats
>>>>>>>> running blast search.
>>>>>>>> #--------- command -------------#
>>>>>>>> Widget::blastx:
>>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
>>>>>>>> #-------------------------------#
>>>>>>>> deleted:0 hits
>>>>>>>> collecting blastx repeatmasking
>>>>>>>> processing all repeats
>>>>>>>> in cluster::shadow_cluster...
>>>>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>>>>>> --> rank=3, hostname=moonunit0
>>>>>>>> ERROR: Failed while processing all repeats
>>>>>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>>>>>> FAILED CONTIG:scaffold-1
>>>>>>>>
>>>>>>>> doing blastx repeats
>>>>>>>> running blast search.
>>>>>>>> #--------- command -------------#
>>>>>>>> Widget::blastx:
>>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
>>>>>>>> #-------------------------------#
>>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>>> FAILED CONTIG:scaffold-1
>>>>>>>>
>>>>>>>> deleted:0 hits
>>>>>>>> deleted:0 hits
>>>>>>>> ###################################################
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> maker-devel mailing list
>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

From venyao at qq.com Wed Oct 25 01:25:25 2017
From: venyao at qq.com (Wen Yao)
Date: Wed, 25 Oct 2017 15:25:25 +0800
Subject: [maker-devel] NNN in maker output transcript
Message-ID:

Dear guys,

Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal?
If not, what's going on? Is this a bug of maker or my configuration of maker is not correct?
I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins.

Thanks!

Wen Yao
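
To see whether the N characters are short internal runs or whole transcripts, it can help to list where they fall in the transcript FASTA. The following is a minimal sketch, assuming a plain FASTA file; read_fasta and summarize_n_runs are hypothetical helper names written for this illustration and are not MAKER tools.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_n_runs(lines):
    """Map each header whose sequence contains N to (length, runs).

    Each run is a (start, run_length) tuple with a 1-based start position.
    Sequences without any N are left out of the result.
    """
    report = {}
    for header, seq in read_fasta(lines):
        runs = [(m.start() + 1, len(m.group())) for m in re.finditer(r"[Nn]+", seq)]
        if runs:
            report[header] = (len(seq), runs)
    return report
```

A transcript whose single run covers its full length is entirely N; a 3-base run inside a longer sequence is the single unknown codon case.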
From dandence at gmail.com Wed Oct 25 09:42:04 2017
From: dandence at gmail.com (Daniel Ence)
Date: Wed, 25 Oct 2017 11:42:04 -0400
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To:
References:
Message-ID: <4913D7BA-CD9B-4B7F-83EF-B8072B4950A6@gmail.com>

Hi Wen Yao,

Do you mean that some of the transcript sequences contain "N" characters or that an entire transcript sequence is "NNN"?

> On Oct 25, 2017, at 3:25 AM, Wen Yao wrote:
>
> Dear guys,
>
> Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal?
> If not, what's going on? Is this a bug of maker or my configuration of maker is not correct?
> I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins.
>
> Thanks!
>
> Wen Yao

From carsonhh at gmail.com Wed Oct 25 09:42:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Oct 2017 09:42:34 -0600
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To:
References:
Message-ID: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>

The gene predictor generates the model. I don't think snap will generate a model that contains an N. Augustus might be able to predict across a single codon (I'm not sure there). The N means that the nucleotide is unknown (i.e. it can be A, T, C or G). An NNN codon produces the amino acid X (which is the unknown amino acid code). So it is possible that for something as short as one or two codons the predictor thinks it's ok to assume that it will produce a valid codon and uses it to complete the reading frame. Alternatively if you are using est2genome=1 or est_gff then what you are seeing is just the result of an alignment which can align to a couple of N's.
You should not use est2genome=1 for anything but training. Also est_gff or pred_gff will not be filtered if you supplied a feature location that includes an N.

--Carson

> On Oct 25, 2017, at 1:25 AM, Wen Yao wrote:
>
> Dear guys,
>
> Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal?
> If not, what's going on? Is this a bug of maker or my configuration of maker is not correct?
> I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins.
>
> Thanks!
>
> Wen Yao

From carsonhh at gmail.com Wed Oct 25 09:43:37 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Oct 2017 09:43:37 -0600
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>
References: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>
Message-ID:

Also you can check the source of the model by looking at the name, i.e. does it have augustus, snap, or est2genome in the name?

--Carson

> On Oct 25, 2017, at 9:42 AM, Carson Holt wrote:
>
> The gene predictor generates the model. I don't think snap will generate a model that contains an N. Augustus might be able to predict across a single codon (I'm not sure there). The N means that the nucleotide is unknown (i.e. it can be A, T, C or G). An NNN codon produces the amino acid X (which is the unknown amino acid code). So it is possible that for something as short as one or two codons the predictor thinks it's ok to assume that it will produce a valid codon and uses it to complete the reading frame. Alternatively if you are using est2genome=1 or est_gff then what you are seeing is just the result of an alignment which can align to a couple of N's.
You should not use est2genome=1 for anything but training. Also est_gff or pred_gff will not be filtered if you supplied a feature location that includes an N.
>
> --Carson
>
>
>
>
>> On Oct 25, 2017, at 1:25 AM, Wen Yao wrote:
>>
>> Dear guys,
>>
>> Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal?
>> If not, what's going on? Is this a bug of maker or my configuration of maker is not correct?
>> I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins.
>>
>> Thanks!
>>
>> Wen Yao
>

From eennadi at gmail.com Thu Oct 26 15:34:33 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Thu, 26 Oct 2017 22:34:33 +0100
Subject: [maker-devel] How to remove contigs from GFF file
Message-ID:

Hello,

I need to remove sequences from my GFF file. Can someone help me with a command line for such removal?

ERROR: valid [SEQ_FEAT.FeatureBeginsOrEndsInGap] Feature begins or ends in gap starting at 17625 FEATURE: Gene: CR513_57782 <46071> [lcl|contig_14719:17653-17724] [lcl|contig_14719: delta, dna len= 17790]
ERROR: valid [SEQ_INST.ShortSeq] Sequence only 2 residues BIOSEQ: gnl|aceprd|CR513_62412: raw, aa len= 2

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
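
For dropping every feature on particular contigs from a GFF3, one possibility is to compare the first column (the seqid) exactly, so that asking for contig_1 does not also remove contig_14719 the way a substring match would. The following is a minimal sketch, assuming a standard tab-separated GFF3 that may carry `##sequence-region` directives and an embedded `##FASTA` section; drop_seqids is a hypothetical name, not a MAKER script. It removes whole contigs only and does not by itself repair individual features such as the ones flagged in the validator output above.

```python
def drop_seqids(lines, unwanted):
    """Yield GFF3 lines, omitting everything that belongs to the seqids in `unwanted`.

    Hypothetical helper. Feature lines are matched on column 1, directives on
    the name after ##sequence-region, and embedded FASTA records on the first
    word of the header.
    """
    unwanted = set(unwanted)
    in_fasta = False
    skipping_record = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
            yield line
        elif in_fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                skipping_record = bool(fields) and fields[0] in unwanted
            if not skipping_record:
                yield line
        elif line.startswith("##sequence-region"):
            fields = line.split()
            if len(fields) < 2 or fields[1] not in unwanted:
                yield line
        elif line.startswith("#") or line.split("\t", 1)[0] not in unwanted:
            yield line
```

Writing the surviving lines to a new file (rather than editing in place) keeps the original available for comparison.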
From bmoore at genetics.utah.edu Fri Oct 27 07:17:41 2017
From: bmoore at genetics.utah.edu (Marvin B Moore)
Date: Fri, 27 Oct 2017 13:17:41 +0000
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
Message-ID: <98FAE3F3-7C52-4EDA-8FBB-5F43DB7D54C9@umail.utah.edu>

Those look suspiciously like the remnants of end-of-line control characters. Since Windows, Mac OS X and Linux all use slightly different control characters to mark end-of-line, I'd look at the upstream path of where your files come from and how they've been processed by you or others upstream of MAKER (were they generated or processed on a MS or Mac server?). One bizarre example we've seen is that files that simply pass through an MS Outlook server as an e-mail attachment have had their end-of-line characters converted to MS format.

Good luck...

Barry

On Oct 17, 2017, at 1:11 PM, Fields, Christopher J wrote:

I agree with Carson, though my guess is any fasta converters will either fail on these characters as non-IUPAC, or will silently remove them. Running them through a converter may not solve all the issues though, as the backslash also appears in the FASTA headers at the end of the line:

cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\'
>contig_134\
>contig_149\
>contig_158\
>contig_222\
>contig_316\
>contig_582\
>contig_634\
>contig_700\
>contig_741\
...

I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)?

chris

From: maker-devel on behalf of Carson Holt
Date: Monday, October 16, 2017 at 11:22 AM
To: Emmanuel Nnadi
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Backlash running through my sequence

I would not just remove them. The fact they are there calls into question how they got there in the first place.
If you generated this file yourself, you may want to instead use fasta_tool.

--Carson

On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:

Hi all,

I am trying to run annotation on some of my sequences but noticed that I have backslashes running through the sequence. Please, how do I remove them?

I attached the sequence.

Thanks

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From bmoore at genetics.utah.edu Fri Oct 27 07:24:44 2017
From: bmoore at genetics.utah.edu (Marvin B Moore)
Date: Fri, 27 Oct 2017 13:24:44 +0000
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
In-Reply-To:
References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com>
Message-ID:

Also, you could probably build these overlap sets on the command line by subsetting the MAKER GFF3 file and then using BedTools intersect for overlap queries.

Barry

On Oct 11, 2017, at 10:19 PM, Matt Simenc wrote:

Very good, thank you!

Matt

On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt wrote:

Also look at GAL for building GFF3 feature queries --> https://github.com/The-Sequence-Ontology/GAL

--Carson

On Oct 11, 2017, at 9:18 AM, Michael Campbell wrote:

Hi Matt,

I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file.
Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.

Hope this helps,
Mike

On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote:

Hey MAKER people,

I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support.

QI summary has:
Fraction of exons that overlap an EST alignment
Fraction of exons that overlap EST or Protein alignments

Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this?

Thanks!!!
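
The bookkeeping for the two-rerun approach described by Mike above can be sketched roughly as follows. This is an illustration under stated assumptions, not a MAKER utility: it assumes each rerun's GFF3 has mRNA features carrying Parent= and _AED= attributes in column 9, and it treats a gene as supported when at least one of its transcripts has an AED below 1. The names supported_genes and venn_counts are hypothetical.

```python
def supported_genes(lines):
    """Return the set of gene IDs with at least one mRNA whose _AED is below 1."""
    genes = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop at the embedded sequence section
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 is a semicolon-separated list of key=value pairs.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "Parent" in attrs and "_AED" in attrs and float(attrs["_AED"]) < 1:
            genes.add(attrs["Parent"])
    return genes


def venn_counts(protein_run_lines, est_run_lines):
    """Count genes supported by EST only, protein only, or both.

    protein_run_lines: GFF3 lines from the rerun given only protein evidence.
    est_run_lines: GFF3 lines from the rerun given only EST evidence.
    """
    protein = supported_genes(protein_run_lines)
    est = supported_genes(est_run_lines)
    return {
        "est_only": len(est - protein),
        "protein_only": len(protein - est),
        "both": len(est & protein),
    }
```

Genes that appear in neither set have no support from either evidence type and fall outside the diagram.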
Matt Simenc

From dandence at gmail.com Fri Oct 27 08:51:21 2017
From: dandence at gmail.com (Daniel Ence)
Date: Fri, 27 Oct 2017 10:51:21 -0400
Subject: [maker-devel] How to remove contigs from GFF file
In-Reply-To:
References:
Message-ID:

Hi Emmanuel, can you send the command that produced the error?

If you need to remove certain scaffolds or contigs from a gff3 file, you can use grep to filter out certain scaffolds like this: grep -v "scaffold_name" gff3_file

~Daniel

> On Oct 26, 2017, at 5:34 PM, Emmanuel Nnadi wrote:
>
> Hello,
>
> I need to remove sequences from my GFF file. Can someone help me with a command line for such removal?
>
> ERROR: valid [SEQ_FEAT.FeatureBeginsOrEndsInGap] Feature begins or ends in gap starting at 17625 FEATURE: Gene: CR513_57782 <46071> [lcl|contig_14719:17653-17724] [lcl|contig_14719: delta, dna len= 17790]
> ERROR: valid [SEQ_INST.ShortSeq] Sequence only 2 residues BIOSEQ: gnl|aceprd|CR513_62412: raw, aa len= 2
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From carsonhh at gmail.com Fri Oct 27 16:00:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Oct 2017 16:00:17 -0600
Subject: [maker-devel] "ALRM" isn't numeric in exit - MAKER warning message
In-Reply-To:
References:
Message-ID: <399AB5BD-2FC5-45F4-9AC8-1665CCFEA0D1@gmail.com>

Hi Marivi,

The only time MAKER uses the ALRM signal is during exit. Sometimes MPI_Finalize can freeze (it has to do with the fact it is being called from Perl). So we set an alarm just in case. Then if it takes too long we assume it is frozen and let things exit in a less than graceful way rather than let it block forever (it is already finished after all). The complaint you get may be because your system doesn't support the alarm signal or forks.pm (which tries to intercept signals) is having an issue. Or it may just be ugliness related to parts of the process being killed with other parts still being active (it is an ungraceful exit after all). Or it may be another source of the ALRM altogether (but I assume it is the MAKER ALRM given that it happens right after MAKER says it is finished).

Thanks,
Carson

> On Oct 27, 2017, at 1:03 PM, Marivi Colle wrote:
>
> Hi Carson,
>
> After running MAKER, I checked my std output and here's the message at the end of the file. I was wondering what this warning message means?
>
>
> Start_time: 1508465182
> End_time: 1508950543
> Elapsed: 485361
>
>
> Maker is now finished!!!
>
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184
>
>
> Thank you.
> Marivi
>
>
> --
> Marivi G. Colle
> Research Associate
> Department of Horticulture
> Michigan State University
> 1066 Bogue St., East Lansing
> Michigan 48824-1325, USA

From patrick.tranvan at unil.ch Sat Oct 28 08:14:59 2017
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Sat, 28 Oct 2017 14:14:59 +0000
Subject: [maker-devel] Advice on my pipeline
In-Reply-To: <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com>
References: <6b029690bace4d3fbae77c0bb1bddce8@prdexch02.ad.unil.ch> <1498470630221.84642@unil.ch> <696C51C6-5606-4ECB-A8B8-9C077182FFFA@gmail.com> <1498908228256.16549@unil.ch> <58E904BF-9AB8-4AC7-B10B-C902F414E03D@gmail.com> <1505986013492.52354@unil.ch>, <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com>
Message-ID: <1509200133044.96929@unil.ch>

Hi Carson,

If I want to look for alternative splicing variants, can I just add the option alt_splice=1 only at the last round of MAKER, or do I have to set it from the beginning (and perform the 4 rounds with this option)?

Cheers,

Patrick

________________________________
From: Carson Holt
Sent: Friday, September 22, 2017 10:08 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

The gff3 passthrough options are there to help users get old data into MAKER when they have lost access to the original files.
But for iterative running of the pipeline, it is more effective just to rerun in place so MAKER can access the raw alignment reports. The raw reports from the alignments have more detail than what is stored in the GFF3, and those details are lost when trying to use the GFF3 as input.

--Carson

On Sep 21, 2017, at 3:26 AM, Patrick Tran Van wrote:

Hi Carson,

I have a doubt about round 2. In a previous reply you said:

" Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input. MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory). "

Does it mean that I don't need to modify the section "#-----Re-annotation Using MAKER Derived GFF3"?

If I leave everything at the defaults, such as:

altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no

will it not look again for repeats and protein + transcriptome alignments?

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, July 3, 2017 10:50 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

maker2zff is just for SNAP training and not for gene filtering (please do not use it for filtering, it does not do what you think). So the final annotation set after maker with correct_est_fusion is 16,850. To decide which set is better, look at them in a browser (gene counts are not useful for gauging results). A well annotated genome will have evidence clusters that closely match the final models.
A poorly annotated genome will have evidence clusters that are split or merged by the models. The correct_est_fusion option does two things. It trims long overlapping UTR fragments, and it stops evidence clusters from being merged on BLASTP evidence alone (so gene predictors will get unmerged hint regions if clusters are split). You may also find that using jaccard_clip with Trinity has reduced sensitivity for the transcript data (you may lose things that were there before, but now have better specificity, i.e. fewer false positives). Make sure you provided protein data from at least two related species to help maintain sensitivity lost from the transcript data. You can also add rejected gene models back in after the fact by using iprscan to identify unsupported models with identifiable protein domains --> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286374/

Thanks,
Carson

On Jul 1, 2017, at 5:21 AM, Patrick Tran Van wrote:

So I have assembled my transcriptome with Trinity using the jaccard clip option and I have run MAKER with and without correct_est_fusion. I then used SNAP to train/filter it with:

maker2zff specie.all.gff

Here are my results:

Number of genes after maker -> Number of genes after maker2zff

- Without correct_est_fusion: 21621 -> 13875
- With correct_est_fusion: 16850 -> 9098

1) If I understand correctly how correct_est_fusion works, because it prevents gene merging, shouldn't it be the reverse? Normally I should find more genes with correct_est_fusion, right?

2) I think I should find something like 13000-14000 genes for my species. Should I go with the "Without correct_est_fusion" set for the 2nd iteration of MAKER?
Thanks for your help

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, June 26, 2017 11:38 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Sorry the option is --> correct_est_fusion

It is in the maker_opts.ctl file.

I would use both SNAP and Augustus on a few large contigs then review the results manually. If one of them is not behaving well, then drop it. If both behave well (i.e. correlate well with evidence alignments) then keep them both.

--Carson

On Jun 26, 2017, at 3:48 AM, Patrick Tran Van wrote:

Thanks for your answer.

1) Do you think that adding Augustus training in addition to SNAP at steps 3 and 5 will add more confidence (instead of adding Augustus only for the final round)? I ask because I am using autoAug for this and it takes a while to compute.

2) I don't see this option: 'avoid_est_fusion=1'. I have tried to add it but I got this error:

WARNING: Invalid option 'avoid_est_fusion' in control file maker_opts.ctl

(I am using v 2.31.8)

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, June 5, 2017 8:29 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Your plan sounds good. A couple of related notes. Insect genomes tend to have high gene density, so gene merging will be the primary difficulty. You can avoid merging of mRNA-seq evidence by using options like jaccard_clip in Trinity. Then use avoid_est_fusion=1 inside of MAKER. Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input.
MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory).

--Carson

On Jun 2, 2017, at 3:56 AM, Patrick Tran Van wrote:

Hello,

This is my first time running MAKER for an insect genome annotation. I have found various resources and tried to make a consensus. I am looking for your thoughts and advice about my pipeline: whether I can improve something or am doing useless things.

What I have:

- RNA evidence: transcriptome
- Protein evidence: swissprot/uniprot + busco protein set of insect
- Cegma and busco results of my genome

1) Train SNAP with CEGMA.
2) Run (run A) maker with repeat masking with transcript, protein, the new SNAP file (from step 1) and augustus file (from busco).
3) Create SNAP model from run A.
4) Run (run B) with the new SNAP (done at step 3) with est2genome=0 and protein2genome=0, provide gff file (maker_gff=run_A.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1).
5) Create SNAP model from run B.
6) Run (run C) with the new SNAP (done at step 5) with est2genome=0 and protein2genome=0, provide gff file (maker_gff=run_B.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1).
7) Create SNAP model from run C AND create Augustus gene model from run C.
8) Run (run D) with the new SNAP (done at step 7) + AUGUSTUS file (step 7) with est2genome=0 and protein2genome=0, provide gff file (maker_gff=run_C.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1). Also use keep_preds=1.

Does it seem coherent?
Cheers,

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dandence at gmail.com Mon Oct 2 07:17:57 2017
From: dandence at gmail.com (Daniel Ence)
Date: Mon, 2 Oct 2017 09:17:57 -0400
Subject: [maker-devel] Error with Maker_functional_gff
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,

I think this script is expecting the file "uniprot_sprot.fasta" downloaded from the uniprot download page at http://www.uniprot.org/downloads#uniprotkblink

The fasta headers in that file look like this, which is different from the headers in the file you used:

>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1

Let us know if that helps,
Daniel

From michael.s.campbell1 at gmail.com Mon Oct 2 07:30:43 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 2 Oct 2017 09:30:43 -0400
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu>
Message-ID: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com>

Hi Chris,

This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1.
I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1.

Thanks,
Mike

> On Sep 29, 2017, at 1:20 PM, Willett, Christopher S wrote:
>
> Hello-
>
> We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this?
>
> Here is what we are seeing if more details would be helpful. With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes?
>
> The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above.
>
> Thanks for your help,
>
> Best,
>
> Chris Willett
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Research Associate Professor
> Department of Biology
> CB#3280 Coker Hall
> University of North Carolina, Chapel Hill
> Chapel Hill, NC, 27599-3280
>
> Office: 2252 Genome Science Building
> phone: 919-843-8663
> fax: 919-962-1625
>
> http://labs.bio.unc.edu/Willett/

From michael.s.campbell1 at gmail.com Mon Oct 2 13:19:51 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 2 Oct 2017 15:19:51 -0400
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu>
Message-ID: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>

Hi Chris,

Yeah. By default MAKER shouldn't keep any annotation with an AED of 1. I've cc'd the dev list on this to see if anyone else has any idea why you might get AED 1 genes with keep_preds=0. Could you send me the maker_opts.ctl file for the run? There may be something informative in there.

Thanks,
Mike

> On Oct 2, 2017, at 2:32 PM, Willett, Christopher S wrote:
>
> Hi Mike-
>
> I was looking at the lists of mRNAs and I think what is happening is that there are still genes retained in our initial output from MAKER that have an AED=1 that are then getting trimmed out of the filtered file. If I am setting the AED threshold equal to 1 in the control file for the MAKER run is that less than one or less than or equal to one for retention? Should these AED=1 genes be making it into the gene and mRNA pools if we have the keep predictions parameter set to 0?
>
> Thanks for your help,
>
> Best,
>
> Chris

From michael.s.campbell1 at gmail.com Mon Oct 2 13:35:55 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 2 Oct 2017 15:35:55 -0400
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To:
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>
Message-ID: <4C4E3DE7-CE28-4DF7-B234-E88701CAD172@gmail.com>

Hi Chris,

It's this line here:

model_gff=/proj/willetlb/users/cwillett/MAKER_analyses/dovetail_ann/SDv1.0_est-forward-SDv2.1.gff

Anything passed to model_gff is treated as sacred by MAKER and will be kept regardless of AED. If you pass it in as pred_gff= then it will be subject to the AED filters.
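To see which transcripts in a MAKER GFF3 carry an AED of 1 (for instance, models that came in through model_gff), something along these lines could be used. It is a minimal Python sketch that assumes the `_AED` attribute MAKER writes on mRNA features; the function name `mrnas_at_or_above_aed` is made up for illustration.

```python
def mrnas_at_or_above_aed(gff3_lines, threshold=1.0):
    """Return (ID, AED) for every mRNA feature whose _AED >= threshold."""
    hits = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        aed = attrs.get("_AED")
        if aed is not None and float(aed) >= threshold:
            hits.append((attrs.get("ID"), float(aed)))
    return hits
```

Running it over the lines of the final GFF3 (for example `mrnas_at_or_above_aed(open("final.gff"))`, where the file name is a placeholder) lists the transcripts that an AED-based filter would be expected to drop.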
I hope this helps,
Mike

> On Oct 2, 2017, at 3:28 PM, Willett, Christopher S wrote:

From daren.card at gmail.com Wed Oct 4 09:53:42 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Wed, 4 Oct 2017 10:53:42 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
Message-ID:

Hi all,

I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size?

I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library).

Any help would be greatly appreciated.

Daren Card
University of Texas Arlington

###################################################
doing blastx repeats
running blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
#-------------------------------#
deleted:0 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=3, hostname=moonunit0
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:scaffold-1

doing blastx repeats
running blast search.
#--------- command -------------#
Widget::blastx:
/usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold-1

deleted:0 hits
deleted:0 hits
###################################################

From carsonhh at gmail.com Wed Oct 4 10:03:52 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 10:03:52 -0600
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To:
References:
Message-ID: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com>

The point where it dies is because there is no start/end coordinate for one of the alignments. The issue is either with the GFF3 you gave it or a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago.

--Carson

From qwzhang0601 at gmail.com Wed Oct 4 16:31:09 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 4 Oct 2017 18:31:09 -0400
Subject: [maker-devel] About eAED
Message-ID:

Hello:

I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1.

Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1?

>maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35

Thanks

Best
Quanwei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xvazquezc at gmail.com Wed Oct 4 16:35:41 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 5 Oct 2017 09:35:41 +1100
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID:

Carson commented on this here
https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Wed Oct 4 16:38:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 16:38:00 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID: <77155DA5-6454-4B25-BCF6-DE6B077BA548@gmail.com>

eAED is an extended AED calculation that does some inference about the evidence (i.e. checks reading frame and not just overlap, and may infer support for an exon if both splice sites are confirmed etc.). If eAED is 1 that means that while there is evidence supporting the model, the evidence is more likely to be spurious, so it may be a false model.
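To pull out the models where AED and eAED disagree in this way, the scores can be read from the MAKER FASTA headers. The following is a rough Python sketch, assuming headers shaped like the example in this thread (`AED:` followed by `eAED:`); the function names and the cutoffs (`aed_max=0.5`, `eaed_min=1.0`) are arbitrary choices for illustration, not MAKER defaults.

```python
import re

# Captures the sequence name, AED and eAED from a MAKER FASTA header.
HEADER_RE = re.compile(r"^>(\S+).*?\bAED:([0-9.]+)\s+eAED:([0-9.]+)")


def parse_aed(header):
    """Return (name, aed, eaed), or None if the header has no scores."""
    m = HEADER_RE.match(header)
    if not m:
        return None
    return m.group(1), float(m.group(2)), float(m.group(3))


def low_aed_high_eaed(headers, aed_max=0.5, eaed_min=1.0):
    """Names of models with AED <= aed_max but eAED >= eaed_min."""
    flagged = []
    for header in headers:
        parsed = parse_aed(header)
        if parsed is None:
            continue
        name, aed, eaed = parsed
        if aed <= aed_max and eaed >= eaed_min:
            flagged.append(name)
    return flagged
```

The flagged names are candidates for manual review rather than automatic removal, since the thread only says such models may be false.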
--Carson

From carsonhh at gmail.com Wed Oct 4 16:39:52 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 16:39:52 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>

This one is an even better explanation than the answer I just gave. Thank you.

--Carson

> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote:
>
> Carson commented on this here
> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

From eennadi at gmail.com Sun Oct 1 23:03:01 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Mon, 2 Oct 2017 06:03:01 +0100
Subject: [maker-devel] Error with Maker_functional_gff
Message-ID:

Hello,

I intend to rename genes for GenBank submission.

I downloaded swissprot.fa from NCBI and ran BLAST with the MAKER-generated file against swissprot.

The BLAST output looks like this:

snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6

I attempted to run maker_functional_gff using the downloaded swissprot.fa and the blastp result.

I got the following result:

Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906.
Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906.
Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor

Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599.
Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599.
Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase

What can I do?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From willett4 at email.unc.edu Mon Oct 2 09:04:38 2017
From: willett4 at email.unc.edu (Willett, Christopher S)
Date: Mon, 2 Oct 2017 15:04:38 +0000
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com>
Message-ID:

Hi Mike-

Thanks for getting back to me. I was using the grep -cP '\tgene\t' syntax to count the numbers and it seems to be giving me the same numbers I got before when I was counting either the transcripts or the genes in the fasta output files from our original run. I will have to look at the files a bit more to see if I can find some examples of genes that fit what you are suggesting.
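A per-type tally makes it easy to compare gene and transcript counts side by side, which is the distinction raised earlier in the thread. The sketch below is similar in spirit to the grep count above but restricted to one source column; it assumes a standard nine-column GFF3 with an optional trailing ##FASTA section, and `count_feature_types` is a hypothetical helper name.

```python
from collections import Counter


def count_feature_types(gff3_lines, source="maker"):
    """Count features per type (column 3), optionally for one source (column 2)."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; stop reading features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        if source is None or cols[1] == source:
            counts[cols[2]] += 1
    return counts
```

Comparing `counts["gene"]` with `counts["mRNA"]` before and after filtering shows whether whole genes or only individual transcripts were removed.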
Best,
Chris

From willett4 at email.unc.edu Mon Oct 2 13:28:19 2017
From: willett4 at email.unc.edu (Willett, Christopher S)
Date: Mon, 2 Oct 2017 19:28:19 +0000
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>
Message-ID:

Hi Mike-

Here is the control file for the last run of MAKER with keep_preds=0 and here is an example of one mRNA retained from the gff file:

Chromosome_6 maker mRNA 556000 557215 . + . ID=maker-Chromosome_6-exonerate_est2genome-gene-5.3-mRNA-1;Parent=maker-Chromosome_6-exonerate_est2genome-gene-5.3;Name=TCALIF_02833-PA;_AED=1.00;_eAED=1.00;_QI=15|0|0|0|1|1|2|75|338;score=100;Alias=TCALIF_02833-PA

Thanks,
Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl_full8
Type: application/octet-stream
Size: 5617 bytes
Desc: maker_opts.ctl_full8
URL:

From qwzhang0601 at gmail.com Wed Oct 4 20:35:55 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 4 Oct 2017 22:35:55 -0400
Subject: [maker-devel] About eAED
In-Reply-To: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
Message-ID:

Thank you all.
Most of the time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED. I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED? The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED? Best Quanwei 2017-10-04 18:39 GMT-04:00 Carson Holt: > This one is an even better explanation than the answer I just gave. Thank > you. > > --Carson > > On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos > wrote: > > Carson commented on this here > https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > On 5 October 2017 at 09:31, Quanwei Zhang wrote: > >> Hello: >> >> I ran the maker2 pipeline and got the default gene sets (with AED<1). But >> I found there are several hundred genes with eAED 1. >> >> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can >> be the reason of the great difference between AED and eAED. For this gene >> it has a very low AED score, is it still a reliable gene model if its eAED >> equals 1? >> >> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 >> QI:75|0|0|1|0|0|2|111|35 >> >> Thanks >> >> Best >> Quanwei >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Xabier Vázquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Wed Oct 4 20:38:25 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 20:38:25 -0600 Subject: [maker-devel] About eAED In-Reply-To: References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> Message-ID: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> The previous linked comment explains in detail --> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ Basically the middle support of exon is inferred from edge support even though no overlap exists (so eAED infers support and AED does not). --Carson > On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote: > > Thank you all. Most of the time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED. I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED? > > The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED? > > Best > Quanwei > > > > 2017-10-04 18:39 GMT-04:00 Carson Holt: > This one is an even better explanation than the answer I just gave. Thank you. > > --Carson > >> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote: >> >> Carson commented on this here >> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ >> >> On 5 October 2017 at 09:31, Quanwei Zhang wrote: >> Hello: >> >> I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. >> >> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1?
>> >> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 >> >> Thanks >> >> Best >> Quanwei >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> -- >> Xabier Vázquez-Campos, PhD >> Research Associate >> NSW Systems Biology Initiative >> School of Biotechnology and Biomolecular Sciences >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Oct 4 20:43:28 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 20:43:28 -0600 Subject: [maker-devel] About eAED In-Reply-To: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> Message-ID: eAED can be better for edge cases, but neither is perfect. Low AED generally correlates with better models. But a high AED does not mean the model doesn't exist; it just means you should spend a little more time deciding if you really believe it or not. --Carson > On Oct 4, 2017, at 8:38 PM, Carson Holt wrote: > > The previous linked comment explains in detail --> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > Basically the middle support of exon is inferred from edge support even though no overlap exists (so eAED infers support and AED does not). > > --Carson > > >> On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote: >> >> Thank you all. Most of the time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED.
I feel the eAED is more stringent than AED. Would you give me an example, under what condition eAED can be smaller than AED? >> >> The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED? >> >> Best >> Quanwei >> >> >> >> 2017-10-04 18:39 GMT-04:00 Carson Holt: >> This one is an even better explanation than the answer I just gave. Thank you. >> >> --Carson >> >>> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote: >>> >>> Carson commented on this here >>> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ >>> >>> On 5 October 2017 at 09:31, Quanwei Zhang wrote: >>> Hello: >>> >>> I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. >>> >>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? >>> >>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 >>> >>> Thanks >>> >>> Best >>> Quanwei >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> Xabier Vázquez-Campos, PhD >>> Research Associate >>> NSW Systems Biology Initiative >>> School of Biotechnology and Biomolecular Sciences >>> The University of New South Wales >>> Sydney NSW 2052 AUSTRALIA >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed...
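[Editor's sketch, not part of the original thread] When comparing AED and eAED across a gene set, as discussed in the messages above, it can help to pull both values out of the MAKER protein FASTA headers. The following is a minimal sketch, assuming headers of the form quoted in the thread (`AED:0.05 eAED:1.00 QI:...`); the function name `parse_aed` and the regular expressions are illustrative assumptions, not part of MAKER.

```python
import re

# "AED:" not preceded by a letter, so the "AED:" inside "eAED:" is not matched.
_AED_RE = re.compile(r"(?<![A-Za-z])AED:([0-9.]+)")
_EAED_RE = re.compile(r"eAED:([0-9.]+)")


def parse_aed(header):
    """Return (AED, eAED) as floats from a FASTA header; None where a value is missing."""
    aed = _AED_RE.search(header)
    eaed = _EAED_RE.search(header)
    return (
        float(aed.group(1)) if aed else None,
        float(eaed.group(1)) if eaed else None,
    )


# Header taken verbatim from the thread.
header = (">maker-Contig2656-snap-gene-269.6-mRNA-1 protein "
          "AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35")
aed, eaed = parse_aed(header)
print(aed, eaed)
```

Applied to every header in the protein FASTA file, this would make it easy to list the transcripts where the two scores disagree strongly and inspect them by hand, which is in the spirit of the advice given in the thread.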
URL: From qwzhang0601 at gmail.com Wed Oct 4 21:25:24 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 4 Oct 2017 23:25:24 -0400 Subject: [maker-devel] About eAED In-Reply-To: References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com> Message-ID: Thanks for your explanation. Best Quanwei 2017-10-04 22:43 GMT-04:00 Carson Holt: > eAED can be better for edge cases, but neither is perfect. Low AED > generally correlates with better models. But a high AED does not mean the > model doesn't exist, it just means you should spend a little more time > deciding if you really believe it or not. > > --Carson > > > > On Oct 4, 2017, at 8:38 PM, Carson Holt wrote: > > The previous linked comment explains in detail --> > https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > Basically the middle support of exon is inferred from edge support even > though no overlap exists (so eAED infers support and AED does not). > > --Carson > > > On Oct 4, 2017, at 8:35 PM, Quanwei Zhang wrote: > > Thank you all. Most of the time, the AED is equal to or lower than eAED, but > there are some genes whose eAED is smaller than AED. I feel the eAED is > more stringent than AED. Would you give me an example, under what condition > eAED can be smaller than AED? > > The default maker2 gene set includes all genes with AED less than 1. Do > you think eAED is a better choice to filter gene models than AED? > > Best > Quanwei > > > > 2017-10-04 18:39 GMT-04:00 Carson Holt: > >> This one is an even better explanation than the answer I just gave. Thank >> you. >> >> --Carson >> >> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos >> wrote: >> >> Carson commented on this here >> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa- >> ko/iC4KTuIitGEJ >> >> On 5 October 2017 at 09:31, Quanwei Zhang wrote: >> >>> Hello: >>> >>> I ran the maker2 pipeline and got the default gene sets (with AED<1).
>>> But I found there are several hundred genes with eAED 1. >>> >>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can >>> be the reason of the great difference between AED and eAED. For this gene >>> it has a very low AED score, is it still a reliable gene model if its eAED >>> equals 1? >>> >>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 >>> QI:75|0|0|1|0|0|2|111|35 >>> >>> Thanks >>> >>> Best >>> Quanwei >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> >> >> -- >> Xabier Vázquez-Campos, *PhD* >> *Research Associate* >> NSW Systems Biology Initiative >> School of Biotechnology and Biomolecular Sciences >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dandence at gmail.com Thu Oct 5 08:00:21 2017 From: dandence at gmail.com (Daniel Ence) Date: Thu, 5 Oct 2017 10:00:21 -0400 Subject: [maker-devel] Error with Maker_functional_gff In-Reply-To: References: Message-ID: Hi Emmanuel, I can't tell whether it will work from the blast lines that you sent. It will depend on the full headers in the fasta lines, which you'll run after all the blasts are complete. Assembly isn't really my expertise or the topic of this mailing list, but assembling your contigs into scaffolds would probably help your annotations by connecting some parts of genes that are broken across contigs, and will definitely help downstream analysis if you need to know which genes are located next to each other.
How much improvement you can get by scaffolding depends on the type of sequence data you have. Each scaffolder makes assumptions and has requirements, and some assemblers like velvet and SOAPdenovo have scaffolding built into their algorithms. I'd recommend starting with a review like this one: http://www.sciencedirect.com/science/article/pii/S1672022912000095 ~Daniel > On Oct 2, 2017, at 10:47 AM, Emmanuel Nnadi wrote: > > Hello Daniel, > > Thanks for the tip, I was able to download uniprot_swiss.fa I am currently running the blast now > > it looks like this > > MUCPR_041061-RA sp|P10978|POLX_TOBAC 49.315 73 37 0 43 115 874 946 2.95e-14 71.6 > MUCPR_026643-RA sp|Q00451|PRF1_SOLLC 86.207 87 11 1 243 328 257 343 3.65e-32 126 > > Is it ok? > > I wish to ask, I did not assemble my contigs into scaffold before annotating would it affect the end result? > > I wish to assemble my sequence into scaffold can you advice on the best software to use? > > I attempted using SSPACE: a new stand-alone scaffolding tool for small and large genomes > but am having problem with the library. Funny enough the software does not have support to solve problems > > Thanks > > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > On Mon, Oct 2, 2017 at 2:17 PM, Daniel Ence wrote: > Hi Emmanuel, I think this script is expecting the file "uniprot_sprot.fasta"
downloaded from the uniprot download page at http://www.uniprot.org/downloads#uniprotkblink > The fasta headers in this file are different from the fasta header that the file you used has: > >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1 > > Let us know if that helps, > Daniel > >> On Oct 2, 2017, at 1:03 AM, Emmanuel Nnadi > wrote: >> >> Hello, >> I intend to rename genes for Genebank submission >> >> I downloaded swissprot.fa from NCBI and used blast MAKER generated file to swissprot. >> >> the output of BLAST RESULT looks like this >> snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6 >> >> I attempted to run maker_funtional_gff using the swissprot.fa downloaded and the blastp result >> >> I got the following result >> >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906. >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906. >> Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor >> >> >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599. >> Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599. >> Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase >> >> What can I do? 
>> >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daren.card at gmail.com Fri Oct 6 06:23:36 2017 From: daren.card at gmail.com (Daren C. Card) Date: Fri, 6 Oct 2017 07:23:36 -0500 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> Message-ID: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> Dear Carson, Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. Thank you, Daren Card ################################################ doing repeat masking re reading repeat masker report. /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out doing blastx repeats re reading blast report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner deleted:2 hits doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats doing blastx repeats collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=NA, hostname=moonunit0 ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:scaffold-1 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold-1 examining contents of the fasta file and run log ################################################ > On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: > > The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. > > --Carson > > >> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >> >> Hi all, >> >> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion.
The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >> >> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >> >> Any help would be greatly appreciated. >> Daren Card >> >> University of Texas Arlington >> >> ################################################### >> doing blastx repeats >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >> #-------------------------------# >> deleted:0 hits >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>> --> rank=3, hostname=moonunit0 >> ERROR: Failed while processing all repeats >> ERROR: Chunk failed at level:3, tier_type:1 >> FAILED CONTIG:scaffold-1 >> >> doing blastx repeats >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >> #-------------------------------# >> ERROR: Chunk failed at level:2, tier_type:0 >> FAILED CONTIG:scaffold-1 >> >> deleted:0 hits >> deleted:0 hits >> ################################################### >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From eennadi at gmail.com Sat Oct 7 15:34:46 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Sat, 7 Oct 2017 22:34:46 +0100 Subject: [maker-devel] jbrowse not working Message-ID: Please, I ran the command line maker2jbrowse muc1_genome_snap2.all.gff The command created some folders. However, at the end it read No reference sequences defined in configuration, nothing to do. Please what does it mean? How can I view it in jbrowse. Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... 
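[Editor's sketch, not part of the original thread] For the "No reference sequences defined" message above, one thing worth checking is whether the merged GFF3 still carries its embedded sequences, which is the possibility raised in the reply later in this thread. The following is a minimal sketch, assuming the sequences sit after a `##FASTA` directive as described in the GFF3 specification; the function name `has_embedded_fasta` is an illustrative assumption and not a MAKER or JBrowse utility.

```python
def has_embedded_fasta(gff_path):
    """Return True if the file has a ##FASTA directive followed by a '>' header line."""
    seen_directive = False
    with open(gff_path) as handle:
        for line in handle:
            if not seen_directive:
                # Look for the directive that starts the embedded sequence section.
                if line.strip() == "##FASTA":
                    seen_directive = True
            elif line.startswith(">"):
                # At least one sequence record follows the directive.
                return True
    return False
```

Calling `has_embedded_fasta("muc1_genome_snap2.all.gff")` on the file named in the message would return False if the sequence section were absent or empty.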
URL: From carsonhh at gmail.com Sun Oct 8 18:37:12 2017 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 8 Oct 2017 18:37:12 -0600 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> Message-ID: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ --Carson > On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: > > Dear Carson, > > Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. > > Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
> > Thank you, > Daren Card > > > ################################################ > doing repeat masking > re reading repeat masker report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out > doing blastx repeats > re reading blast report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner > deleted:2 hits > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. > --> rank=NA, hostname=moonunit0 > ERROR: Failed while processing all repeats > ERROR: Chunk failed at level:3, tier_type:1 > FAILED CONTIG:scaffold-1 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:scaffold-1 > > examining contents of the fasta file and run log > ################################################ > > > >> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >> >> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >> >> --Carson >> >> >>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>> >>> Hi all, >>> >>> I've been having an issue with MAKER (v.
2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>> >>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>> >>> Any help would be greatly appreciated. >>> Daren Card >>> >>> University of Texas Arlington >>> >>> ################################################### >>> doing blastx repeats >>> running blast search.
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>> #-------------------------------# >>> deleted:0 hits >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>> --> rank=3, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> doing blastx repeats >>> running blast search. 
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>> #-------------------------------# >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> deleted:0 hits >>> deleted:0 hits >>> ################################################### >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Oct 9 18:35:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 9 Oct 2017 18:35:49 -0600 Subject: [maker-devel] jbrowse not working In-Reply-To: References: Message-ID: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it's possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you ran it. Thanks, Carson > On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote: > > Please, > I ran the command line > > maker2jbrowse muc1_genome_snap2.all.gff > > The command created some folders. However, at the end it read > No reference sequences defined in configuration, nothing to do.
> > Please what does it mean? How can I view it in jbrowse. > > Thanks > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Mon Oct 9 22:42:35 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Tue, 10 Oct 2017 05:42:35 +0100 Subject: [maker-devel] jbrowse not working In-Reply-To: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> Message-ID: Hi Carson, Thanks for the reply. I generated the gff with this command: gff3_merge -d dpp_contig.maker.output/dpp_contig_master_datastore_index.log I had to rerun jbrowse with the following command: maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb It is showing WARNING: No matching features found for mRNA, and I don't understand what that means. I was able to set up jbrowse on my local host successfully; I had to move the jbrowse folder to my local host. The jbrowse is up and running; however, I have about 18488 contigs and only 31 contigs are showing. How can I make all my contigs show in jbrowse? Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt wrote: > Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of > the file?
That can happen if you use the -n option with gff3_merge. > Alternatively it's possible one of the individual contig gff3 used to build > the merged gff3 is truncated. If that is the case then gff3_merge should > have thrown some sort of error or warning when you run it. > > Thanks, > Carson > > > > > On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote: > > Please, > I ran the command line > > maker2jbrowse muc1_genome_snap2.all.gff > > The command created some folders. However, at the end it read > No reference sequences defined in configuration, nothing to do. > > Please what does it mean? How can I view it in jbrowse. > > Thanks > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/ > publications > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacques.dainat at nbis.se Tue Oct 10 03:24:34 2017 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Tue, 10 Oct 2017 11:24:34 +0200 Subject: [maker-devel] MAKER annotation submission (EMBLmyGFF3) Message-ID: <967873FE-D61F-4233-A004-C877A60A2AC1@nbis.se> Hi MAKER users, I am taking advantage of this mailing list to share a tool that I hope will be useful for MAKER users. One of the steps, once we are happy with our wonderful annotation, is to submit it to the public archives through one of the three INSDC databases (EMBL-EBI / NCBI / DDBJ). We developed EMBLmyGFF3, which makes it easy to convert any kind of GFF3 annotation to the EMBL flat file format for submission to the European Nucleotide Archive (ENA), which is part of EMBL-EBI. It works well with the MAKER annotation output, among others. We hope the tool will ease the submission process of your annotations.
You will find it here: https://github.com/NBISweden/EMBLmyGFF3 A typical usage case will look like that (where ERSXXXXXX and PRJXXXXXX are the accession number and the project ID provided by EMBL-EBI prior to any submission): ./EMBLmyGFF3.py maker.gff3 maker.fa --data_class STD --topology linear --molecule_type 'genomic DNA' --table 1 --species 'Drosophila melanogaster (fly)' --taxonomy INV --accession ERSXXXXXXX --project_id PRJXXXXXXX --rg MYGROUP -o result.embl Best regards, Jacques Dainat, PhD --------------------------------------- NBIS (National Bioinformatics Infrastructure Sweden) Genome Annotation Service --------------------------------------- Uppsala University, Biomedicinska Centrum Department of Medical Biochemistry Microbiology, Genomics -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Wed Oct 11 08:53:36 2017 From: mcsimenc at gmail.com (Matt Simenc) Date: Wed, 11 Oct 2017 07:53:36 -0700 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? Message-ID: Hey MAKER people, I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. QI summary has: Fraction of exons that overlap an EST alignment Fraction of exons that overlap EST or Protein alignments Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? Thanks!!! Matt Simenc -------------- next part -------------- An HTML attachment was scrubbed... 
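A minimal sketch of what the QI string can and cannot tell you, written in Python. It assumes the usual MAKER GFF3 layout in which each mRNA line carries a pipe-delimited `_QI` attribute with nine fields, the third being the fraction of exons that overlap an EST alignment and the fourth the fraction that overlap EST or protein alignments. The function names `parse_qi` and `evidence_class` are hypothetical and not part of MAKER. The sketch can single out transcripts whose exons have protein overlap but no EST overlap; it cannot separate "EST only" from "EST and protein", which is the limitation raised in the question above.

```python
def parse_qi(attributes):
    """Return the _QI fields of a GFF3 attribute string as a list, or None."""
    for pair in attributes.strip().split(";"):
        if pair.startswith("_QI="):
            return pair[len("_QI="):].split("|")
    return None


def evidence_class(qi):
    """Classify one transcript from its QI fields.

    qi[2]: fraction of exons that overlap an EST alignment
    qi[3]: fraction of exons that overlap EST or protein alignments
    (field order is an assumption taken from the MAKER QI description)
    """
    est = float(qi[2])
    est_or_protein = float(qi[3])
    if est_or_protein == 0:
        return "none"
    if est == 0:
        # Some exon overlap exists, and none of it is EST, so it is protein.
        return "protein_only"
    # EST overlap is present; QI alone cannot say whether protein is too.
    return "est_with_or_without_protein"
```

For a whole annotation, one would apply `evidence_class` to every mRNA line and then collapse transcripts to their parent gene before counting.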
URL: From michael.s.campbell1 at gmail.com Wed Oct 11 09:18:54 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Wed, 11 Oct 2017 11:18:54 -0400 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: References: Message-ID: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> Hi Matt, I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST. Hope this helps, Mike > On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: > > Hey MAKER people, > > I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. > > QI summary has: > > Fraction of exons that overlap an EST alignment > Fraction of exons that overlap EST or Protein alignments > > Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment.
If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. > > Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? > > Thanks!!! > Matt Simenc > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Oct 11 09:22:54 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 11 Oct 2017 09:22:54 -0600 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> Message-ID: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> Also look at GAL for building GFF3 feature queries -> https://github.com/The-Sequence-Ontology/GAL --Carson > On Oct 11, 2017, at 9:18 AM, Michael Campbell wrote: > > Hi Matt, > > I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. > > To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.
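Once the two reruns described above have finished, the Venn counts reduce to set arithmetic on gene IDs. A minimal sketch in Python, under these assumptions: each rerun's merged GFF3 has `_AED` and `Parent` attributes on its mRNA lines, gene IDs are identical across the two reruns because the same models were passed through with model_gff, and a gene counts as supported if any of its transcripts has an AED below 1. The function names `supported_genes` and `venn_counts` are hypothetical.

```python
def supported_genes(gff3_lines, max_aed=1.0):
    """Collect parent gene IDs of mRNAs whose _AED is below max_aed."""
    genes = set()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "_AED" in attrs and "Parent" in attrs:
            if float(attrs["_AED"]) < max_aed:
                genes.add(attrs["Parent"])
    return genes


def venn_counts(est_genes, protein_genes):
    """Return the three region sizes of a two-set Venn diagram."""
    return {
        "est_only": len(est_genes - protein_genes),
        "both": len(est_genes & protein_genes),
        "protein_only": len(protein_genes - est_genes),
    }
```

In practice one would call `supported_genes(open(path))` once on the est_gff rerun output and once on the protein_gff rerun output, then pass both sets to `venn_counts`.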
> > Hope this helps, > Mike > >> On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: >> >> Hey MAKER people, >> >> I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. >> >> QI summary has: >> >> Fraction of exons that overlap an EST alignment >> Fraction of exons that overlap EST or Protein alignments >> >> Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. >> >> Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? >> >> Thanks!!! >> Matt Simenc >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Wed Oct 11 22:19:04 2017 From: mcsimenc at gmail.com (Matt Simenc) Date: Wed, 11 Oct 2017 21:19:04 -0700 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> Message-ID: Very good, thank you! 
Matt On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt wrote: > Also look at GAL for building GFF3 feature queries -> > https://github.com/The-Sequence-Ontology/GAL > > --Carson > > > > > On Oct 11, 2017, at 9:18 AM, Michael Campbell < > michael.s.campbell1 at gmail.com> wrote: > > Hi Matt, > > I have a hacky way that I've done it. It requires running MAKER two more > times but they are quicker runs. > > To identify the genes that have protein support I pass all of the > annotation back to MAKER using the model_gff option in the maker_opts.ctl > file. Then I pull out all of the protein2genome features from the big MAKER > GFF3 file and pass them in using the protein_gff option. I turn off all > repeat masking and run MAKER. It runs fast because it doesn't have to run > any gene finders, align evidence, or repeatmask. In the output any gene > with an AED less than 1 has protein support. Then I do the same thing with > est2genome lines from the big GFF3 file and put them in as est_gff. The > output of that one gives you genes with EST support. Then the genes with an > AED of less than one in both sets have support from protein and EST. > > Hope this helps, > Mike > > On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: > > Hey MAKER people, > > I would like to make a Venn diagram showing the kinds of evidence > supporting gene models in my MAKER annotation where the left side shows > number of genes with EST support only, the right side shows number of genes > with protein support only, and the intersection shows number of genes with > EST and protein support. > > QI summary has: > > Fraction of exons that overlap an EST alignment > Fraction of exons that overlap EST or Protein alignments > > Please correct me if I'm wrong, because I am interpreting the first to be > fraction of exons that overlap an EST alignment and possibly also a protein > alignment.
If that is the case then we can't calculate the number of genes > that overlap only EST or (EST and protein) from the QI information. > > Anyone have a way to do this or have a script to parse the MAKER GFF3 to > get this? > > Thanks!!! > Matt Simenc > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 17:33:05 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:33:05 -0400 Subject: [maker-devel] GMOD hackathon before PAG San Diego in January Message-ID: Hi all, This January before PAG on the Wednesday and Thursday before PAG (January 10-11) in San Diego we are planning a GMOD hackathon. We expect that participants will be interested in solving problems/creating solutions related to Tripal, JBrowse, Apollo, and Galaxy but if you're interested in another GMOD project, by all means, let us know! We expect this hackathon to overlap with the Tripal hackathon that is on January 11 (I'm pretty sure; right Stephen?) If you are interested in attending this hackathon, please let me know so I can be sure we have an appropriately sized space. And if you're coming for the pre-PAG hackathon, consider staying for PAG, since there is always a lot of GMOD-related content at the meeting! Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daren.card at gmail.com Thu Oct 12 20:22:54 2017 From: daren.card at gmail.com (Daren C. Card) Date: Thu, 12 Oct 2017 21:22:54 -0500 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> Message-ID: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> Hi Carson, Thanks for the help. Issue is still lingering. I've tried my full "ideal" run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn't seem to be a BLAST issue. Or is one that won't be easy to overcome. Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay. I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me. What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don't see any obvious info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great. Hoping I can resolve this as maybe this is useful to others. Weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue.
They were less contiguous, but can't imagine that really mattering. Thanks, Daren > On Oct 8, 2017, at 7:37 PM, Carson Holt wrote: > > MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). > > The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here -> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ > > --Carson > > > > > >> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: >> >> Dear Carson, >> >> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. >> >> Unfortunately, I'm still getting the same error what appears to be at roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. >> >> Thank you, >> Daren Card >> >> >> ################################################ >> doing repeat masking >> re reading repeat masker report.
>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >> doing blastx repeats >> re reading blast report. >> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >> deleted:2 hits >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >> --> rank=NA, hostname=moonunit0 >> ERROR: Failed while processing all repeats >> ERROR: Chunk failed at level:3, tier_type:1 >> FAILED CONTIG:scaffold-1 >> >> ERROR: Chunk failed at level:2, tier_type:0 >> FAILED CONTIG:scaffold-1 >> >> examining contents of the fasta file and run log >> ################################################ >> >> >> >>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>> >>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>> >>> ?Carson >>> >>> >>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>> >>>> Hi all, >>>> >>>> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. 
I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>> >>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>> >>>> Any help would be greatly appreciated. >>>> Daren Card >>>> >>>> University of Texas Arlington >>>> >>>> ################################################### >>>> doing blastx repeats >>>> running blast search.
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>> #-------------------------------# >>>> deleted:0 hits >>>> collecting blastx repeatmasking >>>> processing all repeats >>>> in cluster::shadow_cluster... >>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>> --> rank=3, hostname=moonunit0 >>>> ERROR: Failed while processing all repeats >>>> ERROR: Chunk failed at level:3, tier_type:1 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> doing blastx repeats >>>> running blast search. 
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>> #-------------------------------# >>>> ERROR: Chunk failed at level:2, tier_type:0 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> deleted:0 hits >>>> deleted:0 hits >>>> ################################################### >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > From robert.zimmermann at univie.ac.at Wed Oct 11 13:42:14 2017 From: robert.zimmermann at univie.ac.at (Bob Zimmermann) Date: Wed, 11 Oct 2017 21:42:14 +0200 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions Message-ID: Hello, I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. My gene prediction section of the maker_opts.ctl file looks like this: ... augustus_species=all_combined #Augustus gene prediction species model ... 
pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no ... It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long. Thanks for your help! Bob From xvazquezc at gmail.com Thu Oct 12 00:09:32 2017 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Thu, 12 Oct 2017 17:09:32 +1100 Subject: [maker-devel] choosing the right gene model Message-ID: Hi there, I was visualising the annotations and I realised that in some cases, what seems to be a single gene is split according to one of the gene models, even though the other two, est2genome and protein2genome, suggest that it isn't the case. The opposite also happens. For some reason, the "out of place" model is always (or almost always) the one from Genemark. How much weight do the RNAseq and protein data carry in this decision (if any)? How exactly is the final gene selected? Cheers, Xabi -- Xabier Vázquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: split-gene.png Type: image/png Size: 66389 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: merged-gene.png Type: image/png Size: 63815 bytes Desc: not available URL: From jan.nagel at fabi.up.ac.za Thu Oct 12 01:37:07 2017 From: jan.nagel at fabi.up.ac.za (Jan FABI) Date: Thu, 12 Oct 2017 09:37:07 +0200 Subject: [maker-devel] Maker problem Message-ID: Dear Maker team I am experiencing a problem while running maker and cannot find a solution to it online. I am running maker on a new genome, using BRAKER trained models for Augustus and GeneMark. This was successful and performed as expected, except for one contig where an error was encountered. This error occurs during Augustus and seems to have something to do with intron models. I have made sure that the input fasta does not contain characters other than ATCGN or contains "windows"/non-UNIX carriage returns. I include the relevant portion of the log below. Could you help me determine the cause of this error. setting up GFF3 output and fasta chunks preparing ab-inits running augustus. #--------- command -------------# Widget::augustus: /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus #-------------------------------# Sampling error in intron model. state=37 base=26570 /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list. Sampling error in intron model. state=37 base=26570 /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list. 
ERROR: Augustus failed --> rank=NA, hostname=xxx-VirtualBox ERROR: Failed while preparing ab-inits ERROR: Chunk failed at level:0, tier_type:2 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 -- Regards Jan Nagel ---------------------------------------------------------------------- PhD Genetics student Department of Genetics Forestry and Agricultural Biotechnology Institute (FABI) FABI 1, Room 1-55 University of Pretoria 74 Lunnon Rd. Hillcrest 0002 Gauteng Province South Africa Email : jan.nagel at fabi.up.ac.za Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961 -- This message and attachments are subject to a disclaimer. Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 17:40:33 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:40:33 -0400 Subject: [maker-devel] Call for presentations at GMOD workshop at PAG Message-ID: Hi all, This January in San Diego is the annual Plant and Animal Genomes (PAG) meeting (http://www.intlpag.org). As in previous PAGs, there will be several opportunities to present content related to GMOD projects. If you are interested in attending PAG and giving a talk at the GMOD workshop on Wednesday, January 17, please let me know. Your talk can either be about new developments/functionality in existing GMOD software, about how your organization is using the suite of GMOD software to good effect, or about technologies that you think the GMOD community would be interested in hearing about. Please email me directly with a title, an abstract or a vague idea of what you'd like to talk about. 
Also, if you'd really like to come but are having a hard time coming up with travel funds, please let me know, I might be able to help you with that too (up to a limit of one person anyway). Cheers, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 09:37:25 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:37:25 -0600 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> Message-ID: So you have an input GFF3 file? Could you send it to me along with the problem contig. If you want you can upload the maker control files and evidence sets, and I can just recreate the run for the contig. Upload here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi --Carson > On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote: > > Hi Carson, > > Thanks for the help. Issue is still lingering. I've tried my full "ideal" run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn't seem to be a BLAST issue. Or is one that won't be easy to overcome. > > Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay.
> > I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me. > > What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don?t see any obvious way info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great. > > Hoping I can resolve this as maybe this is useful to others. Weird that I?m getting this error, as I?ve annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but can?t imagine that really mattering. > > Thanks, > Daren > > >> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote: >> >> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). >> >> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here ?> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ >> >> ?Carson >> >> >> >> >> >>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: >>> >>> Dear Carson, >>> >>> Thanks so much for the quick reply. 
I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. >>> >>> Unfortunately, I?m still getting the same error what appears to be at roughly the same spot (~child 226). I?ve copied the stderr below. I checked my GFF file and I don?t see any issues with coordinates. I?m going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. >>> >>> Thank you, >>> Daren Card >>> >>> >>> ################################################ >>> doing repeat masking >>> re reading repeat masker report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >>> doing blastx repeats >>> re reading blast report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >>> deleted:2 hits >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. 
>>> --> rank=NA, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> examining contents of the fasta file and run log >>> ################################################ >>> >>> >>> >>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>>> >>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>>> >>>> ?Carson >>>> >>>> >>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>>> >>>>> Hi all, >>>>> >>>>> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>>> >>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. 
MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>>> >>>>> Any help would be greatly appreciated. >>>>> Daren Card >>>>> >>>>> University of Texas Arlington >>>>> >>>>> ################################################### >>>>> doing blastx repeats >>>>> running blast search. >>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>>> #-------------------------------# >>>>> deleted:0 hits >>>>> collecting blastx repeatmasking >>>>> processing all repeats >>>>> in cluster::shadow_cluster... >>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>>> --> rank=3, hostname=moonunit0 >>>>> ERROR: Failed while processing all repeats >>>>> ERROR: Chunk failed at level:3, tier_type:1 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> doing blastx repeats >>>>> running blast search. 
>>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>>> #-------------------------------# >>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> deleted:0 hits >>>>> deleted:0 hits >>>>> ################################################### >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 09:42:41 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:42:41 -0600 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions In-Reply-To: References: Message-ID: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> Hi Bob, pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. 
If it is different in some way, then you can leave it in. If you run under MPI (it?s ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time. Thanks, Carson > On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote: > > Hello, > > I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. > > My gene prediction section of the maker_opts.ctl file looks like this: > ... > augustus_species=all_combined #Augustus gene prediction species model > ... > pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no > ? > > It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long. > > Thanks for your help! > > Bob > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Oct 13 09:50:26 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:50:26 -0600 Subject: [maker-devel] Maker problem In-Reply-To: References: Message-ID: If you look in the folder of the failed contig under the .../theVoid directory there will be a file called query.masked.fasta. Copy that file somewhere. 
Then because maker gave you the command that failed, you can run it all by itself outside of MAKER Example ?> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off query.masked.fasta If it still fails, you now have a test file and command you can send to Mario Stanke (mario.stanke at uni-greifswald.de ). He made Augustus. It may be a bug he has already fixed (current Augustus version is 3.3) or there may be something in the species file causing the error that he can point out. ?Carson > On Oct 12, 2017, at 1:37 AM, Jan FABI wrote: > > Dear Maker team > > I am experiencing a problem while running maker and cannot find a solution to it online. > > I am running maker on a new genome, using BRAKER trained models for Augustus and GeneMark. This was successful and performed as expected, except for one contig where an error was encountered. > > This error occurs during Augustus and seems to have something to do with intron models. I have made sure that the input fasta does not contain characters other than ATCGN or contains "windows"/non-UNIX carriage returns. > > I include the relevant portion of the log below. Could you help me determine the cause of this error. > > > > setting up GFF3 output and fasta chunks > preparing ab-inits > running augustus. > #--------- command -------------# > Widget::augustus: > /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus > #-------------------------------# > Sampling error in intron model. state=37 base=26570 > > /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR > Tried to sample from empty list. > > Sampling error in intron model. state=37 base=26570 > > /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR > Tried to sample from empty list. 
>
> ERROR: Augustus failed
> --> rank=NA, hostname=xxx-VirtualBox
> ERROR: Failed while preparing ab-inits
> ERROR: Chunk failed at level:0, tier_type:2
> FAILED CONTIG:NODE_1040_length_26483_cov_27.125137
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:NODE_1040_length_26483_cov_27.125137
>
> --
> Regards
> Jan Nagel
> ----------------------------------------------------------------------
> PhD Genetics student
> Department of Genetics
> Forestry and Agricultural Biotechnology Institute (FABI)
> FABI 1, Room 1-55
> University of Pretoria
> 74 Lunnon Rd. Hillcrest
> 0002
> Gauteng Province
> South Africa
>
> Email : jan.nagel at fabi.up.ac.za
>
> Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961
> This message and attachments are subject to a disclaimer.
> Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details.
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Oct 13 09:56:43 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 09:56:43 -0600
Subject: [maker-devel] choosing the right gene model
In-Reply-To:
References:
Message-ID: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com>

Both transcript and protein evidence go into the AED calculation for overlap support. So in both cases the chosen model had better overlap (protein evidence will not count toward the eAED overlap calculation if it is out of frame with the model it is supposed to be supporting). The larger merged model generates a clustering effect on its evidence, so its evidence set for the AED calculation is slightly different from the one the SNAP and Augustus models would generate. In both cases, I think GeneMark is hurting more than it is helping.
You may want to just drop it from the analysis (unless it's a fungus; I often find GeneMark can have that effect).

--Carson

> On Oct 12, 2017, at 12:09 AM, Xabier Vázquez-Campos wrote:
>
> Hi there,
>
> I was visualising the annotations and I realised that in some cases, what it seems to be a gene is splitted according to one of the gene models, despite that the other 2, est2genome and prot2genome suggest that it isn't the case.
>
> Although the opposite also happens.
>
> For some reason, the "out of place" model is always (or almost) the one from Genemark.
>
> How much weight does carry the RNAseq and protein data on this decision (if any)?
> How exactly is the final gene selected?
>
> Cheers,
> Xabi
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Oct 13 10:56:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 10:56:30 -0600
Subject: [maker-devel] jbrowse not working
In-Reply-To:
References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com>
Message-ID: <2D6E11BC-6853-458D-AEB1-12EF74D041A3@gmail.com>

The master_datastore_index.log file has a list of failed and finished contigs. You can grep the file contents for FAILED or DIED to see if any contigs are not finished. Finished contigs will be listed as FINISHED in the file.

Also note that if you have errors with the jbrowse build, you have to start over (i.e. wipe out the old build). Rerunning the command over a failed build will try to insert again, which can generate its own errors.
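A minimal sketch of the check described above, for anyone who would rather script it than grep. It assumes the index log is tab-separated with the contig name in the first column and the status in the last column, and that a contig may appear on several lines (for example STARTED, then FINISHED or FAILED). That layout is an assumption about the file and is not stated in this thread, and the function name `unfinished_contigs` is made up for illustration.

```python
import sys
from collections import OrderedDict


def unfinished_contigs(log_lines):
    """Return {contig: last_status} for contigs whose last logged status is not FINISHED.

    Assumes (not confirmed in this thread) tab-separated lines of the form
    contig <TAB> datastore_path <TAB> status.
    """
    last_status = OrderedDict()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, status = fields[0], fields[-1]
        last_status[contig] = status  # later entries overwrite earlier ones
    return {c: s for c, s in last_status.items() if s != "FINISHED"}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_index.py path/to/your_master_datastore_index.log
    with open(sys.argv[1]) as handle:
        for contig, status in unfinished_contigs(handle).items():
            print(contig, status, sep="\t")
```

Taking the last status per contig means a contig that failed once and then finished on a retry is not reported, while one that only ever logged STARTED is.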
If gff3_merge was run without the -n option then you need to see if one of the GFF3 files being used is truncated (possibly due to an IO error, which is not uncommon on NFS storage). You will need to see if you can identify which contig file is truncated and rerun it.

--Carson

> On Oct 9, 2017, at 10:42 PM, Emmanuel Nnadi wrote:
>
> Hi Carson
> Thanks for the reply
>
> I generated the gff with this command gff3_merge -d dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> I had to rerun browse with the following command
>
> maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb
>
> Although its showing
>
> WARNING: No matching features found for mRNA I don't know what it means
>
> I don't understand what it means
>
> Successfully, I was able to setup the jbrowse local host. I had to move the jbrowse folder to my local host
>
> The jbrowse is up and running however, I have about 18488 contigs only 31 contigs are showing, how can i make all my contigs to show on jbrowse?
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
> On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt wrote:
> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it's possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you run it.
> > Thanks, > Carson > > > > >> On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi > wrote: >> >> Please, >> I ran the command line >> >> maker2jbrowse muc1_genome_snap2.all.gff >> >> The command created some folders. However, at the end it read >> No reference sequences defined in configuration, nothing to do. >> >> Please what does it mean? How can I view it in jbrowse. >> >> Thanks >> >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Fri Oct 13 14:26:40 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Sat, 14 Oct 2017 07:26:40 +1100 Subject: [maker-devel] choosing the right gene model In-Reply-To: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> References: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> Message-ID: Actually, it's a fungal genome. Although not very typical, almost half of it are repeats. Worth mention that Genemark generates a lot of predictions that overlap LTRs and other complex repeats, something that neither SNAP or Augustus do. Have you seen this before? On 14 Oct. 2017 02:56, "Carson Holt" wrote: > Both transcript and protein evidence will go into the AED calculation for > overlap support. So in both cases the chosen model had better overlap > (protein evidence will not count toward the eAED overlap calculation if it > is out of frame with the model it is supposed to be supporting). The larger > merged model generates a clutering affect on it?s evidence, so it?s > evidence set for AED calculation is slightly different than the SNAP and > Augustus model would generate. In both cases, I think GeneMark is hurting > more than it is helping. 
You may want to just drop it from the analysis > (unless it?s a fungi, I often find GeneMark can have that affect). > > ?Carson > > > On Oct 12, 2017, at 12:09 AM, Xabier V?zquez-Campos > wrote: > > Hi there, > > I was visualising the annotations and I realised that in some cases, what > it seems to be a gene is splitted according to one of the gene models, > despite that the other 2, est2genome and prot2genome suggest that it isn't > the case. > > > > Although the opposite also happens. > > > ? > For some reason, the "out of place" model is always (or almost) the one > from Genemark. > > How much weight does carry the RNAseq and protein data on this decision > (if any)? > How exactly is the final gene selected? > > Cheers, > Xabi > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From z2.stewart at qut.edu.au Sat Oct 14 23:02:08 2017 From: z2.stewart at qut.edu.au (ZACHARY STEWART) Date: Sun, 15 Oct 2017 05:02:08 +0000 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance Message-ID: Hello MAKER team, I am hoping I could have a bit of your time if that isn't a problem. I am currently performing the advanced repeat library construction as described on the MAKER wiki, and everything appears to work as expected until I reach "2.1.5 Building examplars". At this point I encounter a problem previously documented in the Google group (title: advanced repeat masking library constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta" and "lLTRs_Seq_For_BLAST.fasta" are empty. 
I was hoping you could clarify what you meant by simplifying the sequence names. The genomic contig names are in a format such as ">001676F" and I modified the MITE library to have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]" which I have not tried changing since I suspect the name is important for later reassociation. If you could point me in the right direction that would be very appreciated. Regards, Zac. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Sun Oct 15 15:32:10 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Sun, 15 Oct 2017 22:32:10 +0100 Subject: [maker-devel] Backlash running through my sequence Message-ID: Hi all, I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them I attached the sequence Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sample_1.fasta Type: application/octet-stream Size: 3884915 bytes Desc: not available URL: From xvazquezc at gmail.com Mon Oct 16 01:26:56 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Mon, 16 Oct 2017 18:26:56 +1100 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance In-Reply-To: References: Message-ID: Hi Zac, The contig names you indicate shouldn't give any problems. And if you changed the names of MITE.lib right after creation and before using it downstream, it shouldn't be an issue. Have you confirmed if the prior blastx output has any results? 
Also, be sure you use the same version of makeblastdb and blastx/blastn. I remember reading before running the protocol for first time that in some cases, switching versions could give problems. And be careful if you copy/paste from the wiki page, there are a few typos and dashes instead of minus characters in the command line option flags, all of which will result in errors Xabi On 15 October 2017 at 16:02, ZACHARY STEWART wrote: > Hello MAKER team, > > > I am hoping I could have a bit of your time if that isn't a problem. I am > currently performing the advanced repeat library construction as described > on the MAKER wiki, and everything appears to work as expected until I reach > "2.1.5 Building examplars". At this point I encounter a problem previously > documented in the Google group (title: advanced repeat masking library > constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta" > and "lLTRs_Seq_For_BLAST.fasta" are empty. I was hoping you could clarify > what you meant by simplifying the sequence names. The genomic contig names > are in a format such as ">001676F" and I modified the MITE library to > have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta > has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]" > which I have not tried changing since I suspect the name is important for > later reassociation. If you could point me in the right direction that > would be very appreciated. > > > Regards, > > Zac. > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Xabier V?zquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yuejiaxing at gmail.com Mon Oct 16 02:54:42 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 16 Oct 2017 10:54:42 +0200 Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13 In-Reply-To: References: Message-ID: Dear maker developers, I am trying to install maker-3.01.1-beta but encountered the warning message about uninitialized value (see the warning message below) although still finished the installation. [jxyue at paralog src]$ ./Build install Building MAKER Use of uninitialized value $line in chomp at /home/jxyue/Projects/LRSDAY/bu ild/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3082. Use of uninitialized value $line in substitution (s///) at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/ cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3083. Installing MAKER... Building MAKER ... Also, when I ran this installation for the actual work, it reported errors about cannot find my specified snaphmm model for the annotation, despite that I have specified "snaphmm=$LRSDAY_HOME/data/S288C.gene.hmm" in the "maker_opts.ctl" file and this configuration information has been successfully recognized by maker. running snap. #--------- command -------------# Widget::snap: /home/jxyue/Projects/LRSDAY/build/SNAP/snap /home/jxyue/Projects/LRSDAY/data/S288C.gene.hmm /tmp/maker_m8TVEQ/chrI.abinit_masked.0 > /tmp/maker_m8TVEQ/chrI.abinit_masked.0.S288C%2Egene%2Ehmm.snap #-------------------------------# # (my comment: up to now everything looks fine) .... running snap. 
#--------- command -------------# Widget::snap: /home/jxyue/Projects/LRSDAY/build/SNAP/snap -plus -xdef /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.xdef.snap S288C.gene.hmm /tmp /maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap.fasta > /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap #-------------------------------# ZOE ERROR (from /home/jxyue/Projects/LRSDAY/build/SNAP/snap): error opening file (/home/jxyue/Projects/LRSDAY/build/SNAP/Zoe/HMM/S288C.gene.hmm) ZOE library version 2017-03-01 ERROR: Snap failed --> rank=NA, hostname=paralog.itc.unipi.it ERROR: Failed while annotating transcripts ERROR: Chunk failed at level:1, tier_type:4 FAILED CONTIG:chrI ERROR: Chunk failed at level:6, tier_type:0 FAILED CONTIG:chrI examining contents of the fasta file and run log # (my comment: here the error occurred. As you can see, snap somehow forgot about the path to my specified hmm file and instead looks for this file in its default installation location) It is worth noting that the parallel installation and run with maker-3.00.0-beta finish smoothly without any problem. So I suspect both the installation warning and the executing error are caused by the changes during the version update from 3.00.0-beta to 3.01.1-beta. Could you check about this issue? Thanks in advance! Finally, is it possible to also provide access to older version of maker (e.g. 3.00.0-beta in this particular case) when the user finish the registration in the maker download page? This will help users to roll back to older version when needed. Also this helps for the version control when other developers develop annotation pipelines that use maker as a dependency package. Thanks for the consideration! Best, Jia-Xing -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Facult? de M?decine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Universit? 
Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France
Personal website: http://www.iamphioxus.org/

From carsonhh at gmail.com Mon Oct 16 10:20:32 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Oct 2017 10:20:32 -0600
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To:
References:
Message-ID: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>

I would not just remove them. The fact that they are there raises the question of how they got there in the first place. If you generated this file yourself, you may want to use fasta_tool instead.

--Carson

> On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:
>
> Hi all,
> I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them
> I attached the sequence
>
> Thanks
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From cjfields at illinois.edu Tue Oct 17 13:11:39 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 17 Oct 2017 19:11:39 +0000
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>
Message-ID: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>

I agree with Carson, though my guess is that any FASTA converter will either fail on these characters as non-IUPAC, or will silently remove them.
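Before deciding how to repair a file like this, it can help to see exactly where the stray characters are. The following is a small sketch, not part of MAKER or fasta_tool: the function name `scan_fasta` and the allowed-character set are assumptions made for illustration. It reports header lines that contain a backslash and counts any sequence characters that are not IUPAC nucleotide codes.

```python
import sys

# IUPAC nucleotide codes in both cases; extend this set (e.g. with '-' or '*')
# if your files legitimately use gap or stop symbols.
IUPAC_NT = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv")


def scan_fasta(lines):
    """Return (bad_headers, bad_chars) for an iterable of FASTA lines.

    bad_headers: list of (line_number, header) for headers containing a backslash.
    bad_chars:   dict mapping each non-IUPAC sequence character to its count.
    """
    bad_headers = []
    bad_chars = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if "\\" in line:
                bad_headers.append((lineno, line))
        else:
            for ch in line:
                if ch not in IUPAC_NT:
                    bad_chars[ch] = bad_chars.get(ch, 0) + 1
    return bad_headers, bad_chars


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_fasta.py sample_1.fasta
    with open(sys.argv[1]) as handle:
        headers, chars = scan_fasta(handle)
    print("headers with backslash:", len(headers))
    print("non-IUPAC characters:", chars)
```

This only reports problems. It deliberately does not rewrite the file, since the cause of the stray characters is still unknown.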
Running them through a converter may not solve all the issues though, as the backslash also appears in the FASTA headers at the end of the line:

cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\'
>contig_134\
>contig_149\
>contig_158\
>contig_222\
>contig_316\
>contig_582\
>contig_634\
>contig_700\
>contig_741\
...

I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)?

chris

From: maker-devel on behalf of Carson Holt
Date: Monday, October 16, 2017 at 11:22 AM
To: Emmanuel Nnadi
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Backlash running through my sequence

I would not just remove them. The fact they are there calls into question how they got there in the first place. If you generated this file yourself, you may want to instead use fasta_tool.

--Carson

On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:

Hi all,
I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them
I attached the sequence

Thanks

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From carsonhh at gmail.com Tue Oct 17 13:33:26 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Oct 2017 13:33:26 -0600
Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13
In-Reply-To:
References:
Message-ID: <30F2FDFE-3B4E-4951-89D8-63C2FC772B63@gmail.com>

Thanks. The map_fasta_ids script was empty in the bin directory for some reason, so the installer threw an error because it could not find the #!/usr/bin/perl line. I have put it back in the bin directory where it was supposed to be and the issue goes away for the install.
For the second issue, I think I found it and have uploaded a new tarball to the website.

Also here is a link to download the old 3.00-beta, although I would not recommend making it part of a pipeline because version 3 is still beta and still has bugs (you should use 2.31.9 instead for pipelines).
--> http://topaz.genetics.utah.edu/maker_downloads/static/maker-3.00.0-beta.tgz

--Carson

> On Oct 16, 2017, at 2:54 AM, Jia-Xing Yue wrote:
>
> Dear maker developers,
>
> I am trying to install maker-3.01.1-beta but encountered the warning message about uninitialized value (see the warning message below) although still finished the installation.
>
> [jxyue at paralog src]$ ./Build install
> Building MAKER
> Use of uninitialized value $line in chomp at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3082.
> Use of uninitialized value $line in substitution (s///) at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3083.
> Installing MAKER...
> Building MAKER
> ...
>
> Also, when I ran this installation for the actual work, it reported errors about cannot find my specified snaphmm model for the annotation, despite that I have specified "snaphmm=$LRSDAY_HOME/data/S288C.gene.hmm" in the "maker_opts.ctl" file and this configuration information has been successfully recognized by maker.
>
> running snap.
> #--------- command -------------#
> Widget::snap:
> /home/jxyue/Projects/LRSDAY/build/SNAP/snap /home/jxyue/Projects/LRSDAY/data/S288C.gene.hmm /tmp/maker_m8TVEQ/chrI.abinit_masked.0 > /tmp/maker_m8TVEQ/chrI.abinit_masked.0.S288C%2Egene%2Ehmm.snap
> #-------------------------------#
>
> # (my comment: up to now everything looks fine)
> ....
>
> running snap.
> #--------- command -------------# > Widget::snap: > /home/jxyue/Projects/LRSDAY/build/SNAP/snap -plus -xdef /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.xdef.snap S288C.gene.hmm /tmp > /maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap.fasta > /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap > #-------------------------------# > ZOE ERROR (from /home/jxyue/Projects/LRSDAY/build/SNAP/snap): error opening file (/home/jxyue/Projects/LRSDAY/build/SNAP/Zoe/HMM/S288C.gene.hmm) > ZOE library version 2017-03-01 > ERROR: Snap failed > --> rank=NA, hostname=paralog.itc.unipi.it > ERROR: Failed while annotating transcripts > ERROR: Chunk failed at level:1, tier_type:4 > FAILED CONTIG:chrI > > ERROR: Chunk failed at level:6, tier_type:0 > FAILED CONTIG:chrI > > examining contents of the fasta file and run log > > # (my comment: here the error occurred. As you can see, snap somehow forgot about the path to my specified hmm file and instead looks for this file in its default installation location) > > It is worth noting that the parallel installation and run with maker-3.00.0-beta finish smoothly without any problem. So I suspect both the installation warning and the executing error are caused by the changes during the version update from 3.00.0-beta to 3.01.1-beta. Could you check about this issue? Thanks in advance! > > Finally, is it possible to also provide access to older version of maker (e.g. 3.00.0-beta in this particular case) when the user finish the registration in the maker download page? This will help users to roll back to older version when needed. Also this helps for the version control when other developers develop annotation pipelines that use maker as a dependency package. Thanks for the consideration! > > > Best, > Jia-Xing > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Facult? 
de Médecine
> Institute for Research on Cancer and Aging, Nice (IRCAN)
> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
> 28 Avenue de Valombrose
> 06107 NICE Cedex 2
> France
>
> Personal website: http://www.iamphioxus.org/
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From patrick.tranvan at unil.ch Wed Oct 18 05:47:35 2017
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Wed, 18 Oct 2017 11:47:35 +0000
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>, <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
Message-ID: <1508327278733.19140@unil.ch>

Hi Carson,

1) I think I have read one of your posts saying that running maker with MPI is faster than multiple instances, can you explain why?

2) I am trying to annotate a 1GB species but it's superslow.
I have filtered the transcriptome to speed up the process but do you have other suggestions to increase the speed?

Cheers,

Patrick Tran Van

From carsonhh at gmail.com Wed Oct 18 09:09:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Oct 2017 09:09:10 -0600
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <1508327278733.19140@unil.ch>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu> <1508327278733.19140@unil.ch>
Message-ID: <486FE3D5-0902-4B05-A3E1-96642C68E422@gmail.com>

MAKER can coordinate parallelization under MPI in a way it can't even with multiple simultaneous runs.
Because processes can communicate among themselves under MPI, MAKER can break larger contigs into chunks or even pull off individual steps and pass them on to another processor, then receive the results back from that processor. So multiple BLAST, RepeatMasker, Exonerate, and prediction processes can all run at the same time for the same contig. Then they all pass their results back to the parent process so it can produce output for that contig. MPI was chosen as the parallelization framework rather than threads because it works both within a single machine and across multiple machines, so you can scale up to hundreds of processes if needed.

--Carson

> On Oct 18, 2017, at 5:47 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> 1) I think I have read one of your posts saying that running maker with MPI is faster than multiple instances, can you explain why?
>
> 2) I am trying to annotate a 1GB species but it's superslow.
> I have filtered the transcriptome to speed up the process but do you have other suggestions to increase the speed?
>
> Cheers,
>
> Patrick Tran Van
>
>

From jhumann at wsu.edu Wed Oct 18 14:38:41 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Wed, 18 Oct 2017 20:38:41 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
Message-ID:

Hello,

I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:

ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically.
Invalid Character: 'K' --> rank=NA, hostname=compute2

The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.

Thanks,
Jodi

Jodi Humann, Ph.D.
Main Bioinformatics Lab Project Coordinator
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
509-335-3206
jhumann at wsu.edu

From robert.zimmermann at univie.ac.at Thu Oct 19 09:25:08 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:25:08 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
Message-ID:

Hi Maker Developers,

I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It might make sense if they were more and shorter if the additional evidence was joining transcripts, but this would indicate that it is not the case.

Therefore I'm trying to understand the algorithm. From what I understand if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then it is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?

Thanks for your help!

Best,
Bob

--
Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

From robert.zimmermann at univie.ac.at Thu Oct 19 09:28:17 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:28:17 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID:

Correction to the above numbers, the median lengths are 1414 and 1256.

> On 19 Oct 2017, at 17:25, Bob Zimmermann wrote:
>
> Hi Maker Developers,
>
> I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It might make sense if they were more and shorter if the additional evidence was joining transcripts, but this would indicate that it is not the case.
>
> Therefore I'm trying to understand the algorithm. From what I understand if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then it is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?
>
> Thanks for your help!
>
> Best,
> Bob
>
> --
>
> Department of Molecular Evolution and Development
> Universität Wien
> Althanstraße 14 (UZA I), Zimmer 2.019
> 1090 Vienna
> Austria
>
> +43 1 427757002
>

From carsonhh at gmail.com Thu Oct 19 09:44:07 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 09:44:07 -0600
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID: <62F04A76-F3F1-4044-B4AD-129B15A9EEB2@gmail.com>

You should look at both in a browser to get a better idea of what's going on.

What MAKER does is take the evidence given, cluster it (strand-specific clustering), then use the transcript evidence as intron hints to the predictors and protein alignments as exon hints (it will also use polished protein hints to generate intron hints in the absence of transcript intron hints). Finally it uses overlapping transcript evidence to generate UTR.

So look at it in a browser. See if the apparent overlap clusters are different in extent, and also look for mRNA-seq evidence being merged. If the cluster is falsely merging between two loci because the mRNA-seq is merged, one of two things will happen: you will get multiple models, since the predictor can't make a single model work within the cluster using the hints, or you will get a model with a really long UTR that is blocking other models from existing in the region.

Also, depending on the mRNA-seq evidence coming in, you may be generating false models because of noise in the data. Essentially everything is transcribed at a basal level, so as you get more and more mRNA-seq, you generate more and more spurious alignments. So more evidence might generate fewer long alignments, whether for true loci or from falsely merged genes, while simultaneously adding a number of very short spurious results.

--Carson

> On Oct 19, 2017, at 9:28 AM, Bob Zimmermann wrote:
>
> Correction to the above numbers, the median lengths are 1414 and 1256.
>
>> On 19 Oct 2017, at 17:25, Bob Zimmermann wrote:
>>
>> Hi Maker Developers,
>>
>> I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It might make sense if they were more and shorter if the additional evidence was joining transcripts, but this would indicate that it is not the case.
>>
>> Therefore I'm trying to understand the algorithm. From what I understand if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then it is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?
>>
>> Thanks for your help!
>>
>> Best,
>> Bob
>>
>> --
>>
>> Department of Molecular Evolution and Development
>> Universität Wien
>> Althanstraße 14 (UZA I), Zimmer 2.019
>> 1090 Vienna
>> Austria
>>
>> +43 1 427757002
>>

From carsonhh at gmail.com Thu Oct 19 11:32:44 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 11:32:44 -0600
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Hi Jodi,

I didn't know anyone else even had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases).
But you should be able to add the fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).

Somewhere inside the script there will be a line like this -->
$command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";

You can add -fix_nucleotides to that command so it always runs. fix_nucleotides is a command-line flag. The error is basically a warning to let the user know something is weird (i.e. it is possible they mixed up transcript/protein sequence files). The flag then lets the user tell MAKER they did not mix files up, that the data is supposed to look that way, and that they are ok with MAKER altering the sequence by replacing the letters or dashes seen with N's.

Thanks,
Carson

> On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:
>
> Hello,
>
> I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:
>
> ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2
>
> The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.
>
> Thanks,
> Jodi
>
> Jodi Humann, Ph.D.
> Main Bioinformatics Lab Project Coordinator > Department of Horticulture > Washington State University > PO Box 646414 > Pullman, WA 99164-6414 > 509-335-3206 > jhumann at wsu.edu > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Oct 19 12:46:17 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 19 Oct 2017 12:46:17 -0600 Subject: [maker-devel] fix nucleotides option on MWAS In-Reply-To: References: Message-ID: <052F801C-3B37-4B0F-B40A-A905F5F2B1CE@gmail.com> Yes. That is the current version. ?Carson > On Oct 19, 2017, at 12:45 PM, Humann, Jodi Lynn wrote: > > Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current Maker version right? > > Jodi > > From: Carson Holt [mailto:carsonhh at gmail.com ] > Sent: Thursday, October 19, 2017 10:33 AM > To: Humann, Jodi Lynn > > Cc: maker-devel at yandell-lab.org > Subject: Re: [maker-devel] fix nucleotides option on MWAS > > Hi Jodi, > > I didn?t even know anyone else even had an MWAS server running (I?ve actually pulled all of the Build options for MWAS out of current releases). But you should be able to add the fix_nucleotide option to the command run by MWAS by editing the mwas_server script (?/maker/MWAS/bin/mwas_server). > > Somewhere inside the script there will be a line like this ?> > $command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id"; > > You can add -fix_nucleotides to that command so it always runs. fix_nucleotides is as command line flag. It?s basically a warning for the user to let them know something is weird (i.e. it is possible they mixed up transcript/protein sequence files). 
And then it allows the user to tell MAKER they did not mix files up, rather the data is supposed to look that way and they are ok with MAKER altering the sequence by replacing the letters or dashes seen with N?s. > > Thanks, > Carson > > > On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn > wrote: > > Hello, > > I was wondering if there was any way to enable the ??fix_nucleotides? option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error: > > ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2 > > The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can. > > Thanks, > Jodi > > Jodi Humann, Ph.D. > Main Bioinformatics Lab Project Coordinator > Department of Horticulture > Washington State University > PO Box 646414 > Pullman, WA 99164-6414 > 509-335-3206 > jhumann at wsu.edu > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
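
The effect described in this thread, replacing unsupported letters or dashes with N, can be pictured with a short sketch. This is not MAKER's own code: `mask_ambiguous` is a hypothetical helper, the character set is taken from the error message quoted above ([RYKMSWBDHV]) plus the dash, and preserving lower case (often used for soft-masking) is an assumption.

```python
import re

# Unsupported characters per the quoted error message, plus '-' (assumption:
# both upper and lower case should be handled).
_AMBIGUOUS = re.compile(r"[RYKMSWBDHVrykmswbdhv-]")

def mask_ambiguous(seq):
    """Replace ambiguity codes and dashes with N (or n for lower case).

    Returns (fixed_sequence, number_of_replacements).
    """
    def repl(match):
        # Keep case so that any lower-case soft-masking is preserved.
        return "n" if match.group().islower() else "N"
    return _AMBIGUOUS.subn(repl, seq)
```

For example, `mask_ambiguous("ACGKT-acry")` returns `("ACGNTNacnn", 4)`. Fixing the input file this way by hand changes the sequence permanently, so it is worth keeping the original file alongside the masked copy.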

From jhumann at wsu.edu Thu Oct 19 12:45:43 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Thu, 19 Oct 2017 18:45:43 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current Maker version, right?

Jodi

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Thursday, October 19, 2017 10:33 AM
To: Humann, Jodi Lynn
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] fix nucleotides option on MWAS

Hi Jodi,

I didn't know anyone else even had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases). But you should be able to add the fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).

Somewhere inside the script there will be a line like this -->
$command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";

You can add -fix_nucleotides to that command so it always runs. fix_nucleotides is a command-line flag. The error is basically a warning to let the user know something is weird (i.e. it is possible they mixed up transcript/protein sequence files). The flag then lets the user tell MAKER they did not mix files up, that the data is supposed to look that way, and that they are ok with MAKER altering the sequence by replacing the letters or dashes seen with N's.

Thanks,
Carson

On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:

Hello,

I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:

ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters.
Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2

The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.

Thanks,
Jodi

Jodi Humann, Ph.D.
Main Bioinformatics Lab Project Coordinator
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
509-335-3206
jhumann at wsu.edu

From eennadi at gmail.com Mon Oct 23 07:30:07 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Mon, 23 Oct 2017 14:30:07 +0100
Subject: [maker-devel] Contamination report from NCBI
Message-ID:

Hello,
Good day. I submitted my sequence to NCBI and they sent back this contamination report. Please, how do I use MAKER to make the correction?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
SUBID           BioProject      BioSample       Organism
--------------------------------------------------------
SUB3124577      PRJNA414658     SAMN07821433    Mucuna pruriens

[] We ran your sequences through our Contamination Screen.
The screen found contigs that need to be trimmed and/or excluded. Please adjust the sequences appropriately and then resubmit your sequences. After you remove the contamination, trim any Ns at the ends of the sequence and remove any sequences that are shorter than 200 nt and not part of a multi-component scaffold.

Note that hits in eukaryotic genomes to mitochondrial sequences can be ignored when specific criteria are met. Those criteria are explained below.

Note that mismatches between the name of the adaptor/primer identified in the screen and the sequencing technology used to generate the sequencing data should not be used to discount the validity of the screen results as the adaptors/primers of many different sequencing platforms share sequence similarity.

[] Some of the sequences hit primers or adaptors used in Illumina or 454 or other sequencing strategies or platforms. Primers at the end of a sequence should be removed. However, if primers are present within sequences then you should strongly consider splitting the sequences at the primers because the primer sequence could have been the region of overlap, causing a misassembly.

Screened 26,016 sequences, 396,641,426 bp.
Note: 5,610 sequences with runs of Ns 10 bp or longer (or those longer than 20 MB) were split before screening.
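
The clean-up steps the report asks for (remove the flagged spans, trim Ns at the ends, drop pieces shorter than 200 nt) can be sketched as below. This is only an illustration under stated assumptions, not an NCBI or MAKER tool: `parse_spans` and `trim_contig` are hypothetical helpers, spans are read as 1-based inclusive `start..end` values as in the Trim table, several spans on one row are assumed to be comma-separated, internal spans are handled by splitting the contig there, and the multi-component scaffold exception is ignored.

```python
def parse_spans(field):
    """Parse a span field such as '13078..13138' (assumed comma-separated
    when a row lists several spans) into sorted (start, end) tuples."""
    spans = []
    for part in field.split(","):
        start, end = part.split("..")
        spans.append((int(start), int(end)))
    return sorted(spans)

def trim_contig(seq, spans, min_len=200):
    """Cut the 1-based inclusive spans out of seq.

    Spans at the ends are trimmed away; internal spans split the contig.
    Each remaining piece has leading/trailing Ns stripped, and pieces
    shorter than min_len are dropped. Returns a list of sequences.
    """
    pieces = []
    pos = 0  # 0-based index of the next base that may be kept
    for start, end in sorted(spans):
        if start - 1 > pos:
            pieces.append(seq[pos:start - 1])
        pos = max(pos, end)
    if pos < len(seq):
        pieces.append(seq[pos:])
    cleaned = [p.strip("Nn") for p in pieces]
    return [p for p in cleaned if len(p) >= min_len]
```

For a row like `contig_10109 13138 13078..13138 adaptor:NGB00847.1`, calling `trim_contig(seq, parse_spans("13078..13138"))` would return the contig without its last 61 bases. Any annotation already run on the untrimmed assembly would no longer match the new coordinates, so the trimming belongs before the MAKER run.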
428 sequences with locations to mask/trim (31 split spans to exclude, 397 split spans with locations to mask/trim) Trim: Sequence name, length, span(s), apparent source contig_10109 13138 13078..13138 adaptor:NGB00847.1 contig_10200 20270 1..76 adaptor:NGB00847.1 contig_10202 22517 1..44 adaptor:NGB00360.1 contig_10218 55661 55592..55661 adaptor:NGB00847.1 contig_10283 11575 1..79 adaptor:NGB00847.1 contig_1038 91134 91073..91134 adaptor:NGB00360.1 contig_104 10061 10005..10061 adaptor:NGB00360.1 contig_10405 24076 1..43 adaptor:NGB00847.1 contig_10425 16694 16639..16694 adaptor:NGB00360.1 contig_10447 37445 37233..37445 adaptor:NGB00360.1 contig_10466 19368 1..52 adaptor:NGB00847.1 contig_10576 12053 12003..12053 adaptor:NGB00360.1 contig_1059 34516 34457..34516 adaptor:NGB00847.1 contig_106 49997 1..45 adaptor:NGB00360.1 contig_10695 27664 1..38 adaptor:NGB01029.1 contig_10753 12481 12413..12481 adaptor:NGB00847.1 contig_10822 33522 33441..33522 adaptor:NGB00847.1 contig_1083 10637 1..23 adaptor:NGB01096.1 contig_10851 36752 36682..36752 adaptor:NGB00360.1 contig_10878 27925 27848..27925 adaptor:NGB00360.1 contig_10965 23597 1..57 adaptor:NGB00360.1 contig_10968 7413 1..40 adaptor:NGB00847.1 contig_1099 35847 1..70 adaptor:NGB00360.1 contig_11034 10224 10166..10224 adaptor:NGB00360.1 contig_11058 32994 1..23 adaptor:NGB01088.1 contig_11138 17426 1..73 adaptor:NGB00847.1 contig_11166 6306 6266..6306 adaptor:NGB00360.1 contig_11182 26558 1..30 adaptor:NGB01096.1 contig_11216 15160 1..59 adaptor:NGB00847.1 contig_11269 14732 14655..14732 adaptor:NGB00847.1 contig_11306 28246 28199..28246 adaptor:NGB00360.1 contig_1136 28186 1..73 adaptor:NGB00847.1 contig_1141 58119 58028..58119 adaptor:NGB00847.1 contig_11416 8561 8539..8561 adaptor:NGB01088.1 contig_11504 8890 8840..8890 adaptor:NGB00360.1 contig_1158 17422 17398..17422 adaptor:NGB01088.1 contig_11647 7021 1..69 adaptor:NGB00847.1 contig_11684 17442 17418..17442 adaptor:NGB01096.1 contig_11752 38337 38314..38337 
adaptor:NGB01088.1 contig_11767 6366 6324..6366 adaptor:NGB00847.1 contig_11791 22415 1..43 adaptor:NGB00847.1 contig_11792 58260 1..29 adaptor:NGB01096.1 contig_1187 39501 39462..39501 adaptor:NGB01029.1 contig_12059 10094 1..72 adaptor:NGB00360.1 contig_12130 13210 13164..13210 adaptor:NGB00360.1 contig_12164 17561 17539..17561 adaptor:NGB01096.1 contig_12169 14178 139..196 adaptor:NGB00360.1 contig_12183 15822 61..112 adaptor:NGB00360.1 contig_12266 11704 11640..11704 adaptor:NGB00360.1 contig_12300 9550 9360..9550 adaptor:NGB01088.1 contig_12324 49997 49891..49997 adaptor:NGB00847.1 contig_12423 45971 45860..45918 adaptor:NGB00360.1 contig_12441 15141 1..42 adaptor:NGB00847.1 contig_12514 14655 1..69 adaptor:NGB00847.1 contig_12515 5355 5326..5355 adaptor:NGB01088.1 contig_12535 22496 22458..22496 adaptor:NGB01029.1 contig_12544 19615 19559..19615 adaptor:NGB00360.1 contig_12558 20026 20007..20026 adaptor:NGB01088.1 contig_12613 6880 6793..6880 adaptor:NGB00847.1 contig_12701 18439 18330..18382 adaptor:NGB00360.1 contig_12713 13341 13274..13341 adaptor:NGB00360.1 contig_12723 17913 1..38 adaptor:NGB01088.1 contig_12730 55277 55249..55277 adaptor:NGB01096.1 contig_12739 6792 1..48 adaptor:NGB00360.1 contig_12787 30950 1..19 adaptor:NGB01096.1 contig_1279 18699 18670..18699 adaptor:NGB01088.1 contig_12815 5168 5091..5168 adaptor:NGB00847.1 contig_12846 20753 1..70 adaptor:NGB00360.1 contig_1288 34784 1..31 adaptor:NGB01096.1 contig_12888 12204 1..23 adaptor:NGB01096.1 contig_12919 10315 1..71 adaptor:NGB00360.1 contig_13031 8972 8938..8972 adaptor:NGB01093.1 contig_13088 6275 1..22 adaptor:NGB01088.1 contig_13140 36197 1..48 adaptor:NGB00360.1 contig_13233 16414 16355..16414 adaptor:NGB00847.1 contig_1330 33261 1..44 adaptor:NGB00847.1 contig_13319 19747 1..20 adaptor:NGB01096.1 contig_13367 36004 35868..35931 adaptor:NGB00847.1 contig_13395 5338 1..79 adaptor:NGB00360.1 contig_1341 30756 30734..30756 adaptor:NGB01088.1 contig_13481 9637 9600..9637 
adaptor:NGB00360.1 contig_13506 5704 5662..5704 adaptor:NGB00360.1 contig_13548 5814 79..121 adaptor:NGB00360.1 contig_13567 21576 1..47 adaptor:NGB00847.1 contig_13669 8336 1..24 adaptor:NGB01088.1 contig_13718 23500 1..25 adaptor:NGB01096.1 contig_13783 18720 1..41 adaptor:NGB00847.1 contig_13830 32395 32367..32395 adaptor:NGB01096.1 contig_13845 15572 15493..15572 adaptor:NGB00360.1 contig_13854 10932 1..48 adaptor:NGB00360.1 contig_13943 37701 37674..37701 adaptor:NGB01096.1 contig_13957 7159 1..30 adaptor:NGB01096.1 contig_14014 29735 29672..29735 adaptor:NGB00360.1 contig_14027 21418 21340..21418 adaptor:NGB00360.1 contig_14032 47642 1..53 adaptor:NGB00847.1 contig_14047 26936 1..28 adaptor:NGB01088.1 contig_14048 45832 1..22 adaptor:NGB01088.1 contig_14061 11471 1..179 adaptor:NGB01096.1 contig_14113 17661 1..67 adaptor:NGB00360.1 contig_14173 17601 1..41 adaptor:NGB00847.1 contig_1418 31840 1..248 adaptor:NGB00847.1 contig_14194 7456 7294..7456 adaptor:NGB01096.1 contig_14210 8814 1971..2025 adaptor:NGB00360.1 contig_14223 12513 12489..12513 adaptor:NGB01096.1 contig_14317 21472 21410..21472 adaptor:NGB00360.1 contig_14424 6040 5973..6040 adaptor:NGB00360.1 contig_14425 6404 6379..6404 adaptor:NGB01096.1 contig_14426 31457 31398..31457 adaptor:NGB00847.1 contig_14458 6814 6623..6814 adaptor:NGB01088.1 contig_14524 9488 9431..9488 adaptor:NGB00847.1 contig_14584 20433 1..96 adaptor:NGB00847.1 contig_1459 32979 1..32 adaptor:NGB01096.1 contig_14601 19077 1..28 adaptor:NGB01096.1 contig_14641 21747 1..45 adaptor:NGB00847.1 contig_14664 48155 48118..48155 adaptor:NGB00360.1 contig_14711 11854 11827..11854 adaptor:NGB01096.1 contig_14736 21360 1..37 adaptor:NGB01029.1 contig_14749 12830 1..33 adaptor:NGB01093.1 contig_14966 9962 9891..9962 adaptor:NGB00360.1 contig_14999 5248 1..41 adaptor:NGB00360.1 contig_15010 17976 1..43 adaptor:NGB00360.1 contig_15011 26484 26462..26484 adaptor:NGB01096.1 contig_15017 9331 9291..9331 adaptor:NGB00360.1 contig_1503 63533 
1..33 adaptor:NGB01096.1 contig_15032 32240 32157..32240 adaptor:NGB00847.1 contig_15060 15050 15010..15050 adaptor:NGB00847.1 contig_15065 13062 12996..13062 adaptor:NGB00360.1 contig_15070 29943 1..29 adaptor:NGB01096.1 contig_15132 20431 1..71 adaptor:NGB00847.1 contig_15169 7086 7051..7086 adaptor:NGB00846.1 contig_15174 19921 1..23 adaptor:NGB01096.1 contig_15194 16100 16039..16100 adaptor:NGB00847.1 contig_15212 9272 1..50 adaptor:NGB00847.1 contig_15215 15591 1..58 adaptor:NGB00360.1 contig_15271 37699 37647..37699 adaptor:NGB00847.1 contig_15276 11087 11031..11087 adaptor:NGB00847.1 contig_15309 10118 1..42 adaptor:NGB00847.1 contig_15320 7963 7901..7963 adaptor:NGB00847.1 contig_15334 5683 1..36 adaptor:NGB00846.1 contig_15364 17306 76..139 adaptor:NGB00847.1 contig_15374 28301 28263..28301 adaptor:NGB00360.1 contig_15377 10470 10428..10470 adaptor:NGB00360.1 contig_15398 24069 23999..24069 adaptor:NGB00847.1 contig_15500 9289 9271..9289 adaptor:NGB01096.1 contig_15507 25565 1..22 adaptor:NGB01088.1 contig_15523 5782 5762..5782 adaptor:NGB01088.1 contig_15529 10225 10143..10225 adaptor:NGB00360.1 contig_15569 9645 9612..9645 adaptor:NGB01090.1 contig_15596 7163 1..42 adaptor:NGB00360.1 contig_15605 18521 1..31 adaptor:NGB01096.1 contig_15672 8446 1..213 adaptor:NGB01088.1 contig_15686 22141 58..90 adaptor:NGB00847.1 contig_15708 18098 17996..18098 adaptor:NGB00847.1 contig_15736 18284 18252..18284 adaptor:NGB01096.1 contig_15777 17192 1..45 adaptor:NGB00360.1 contig_15812 8602 1..77 adaptor:NGB00360.1 contig_15959 10936 10913..10936 adaptor:NGB01096.1 contig_15972 11324 1..71 adaptor:NGB00360.1 contig_15974 24312 24243..24312 adaptor:NGB00847.1 contig_16057 8838 8775..8838 adaptor:NGB00847.1 contig_16088 7608 1..71 adaptor:NGB00360.1 contig_16142 10392 1..53 adaptor:NGB00847.1 contig_1617 14870 255..310 adaptor:NGB00360.1 contig_16183 9226 9205..9226 adaptor:NGB01088.1 contig_16188 62666 62586..62666 adaptor:NGB00847.1 contig_16370 7868 1..42 
adaptor:NGB00847.1 contig_16416 19512 1..21 adaptor:NGB01088.1 contig_1645 25016 24951..25016 adaptor:NGB00360.1 contig_16510 31845 31776..31845 adaptor:NGB00847.1 contig_16529 17342 1..45 adaptor:NGB00360.1 contig_16558 9338 9097..9338 adaptor:NGB00360.1 contig_16573 6590 6521..6590 adaptor:NGB00847.1 contig_16608 7397 7324..7397 adaptor:NGB00847.1 contig_16631 11055 1..50 adaptor:NGB00360.1 contig_16641 5482 1..190 adaptor:NGB01088.1 contig_1667 35244 35200..35244 adaptor:NGB01029.1 contig_16682 14500 1..71 adaptor:NGB00847.1 contig_16699 6216 6148..6216 adaptor:NGB00360.1 contig_16734 12674 12625..12674 adaptor:NGB00360.1 contig_16790 6341 1..51 adaptor:NGB00360.1 contig_16807 7512 1..36 adaptor:NGB01096.1 contig_16817 20743 1..155 adaptor:NGB01088.1 contig_16839 6969 1..69 adaptor:multiple contig_16870 10948 1..49 adaptor:NGB00847.1 contig_16880 5622 5549..5622 adaptor:NGB00360.1 contig_16889 9182 1..40 adaptor:NGB00360.1 contig_16911 6691 1..28 adaptor:NGB01088.1 contig_16921 9432 9358..9432 adaptor:NGB00360.1 contig_16951 14285 14262..14285 adaptor:NGB01088.1 contig_17021 12242 1..75 adaptor:NGB00360.1 contig_17092 22712 1..64 adaptor:NGB00360.1 contig_17147 7706 7685..7706 adaptor:NGB01096.1 contig_17195 15668 15643..15668 adaptor:NGB01096.1 contig_17214 7881 7819..7881 adaptor:NGB00847.1 contig_17299 7861 7830..7861 adaptor:NGB01088.1 contig_17344 8915 8765..8823 adaptor:NGB00360.1 contig_17361 8425 1..26 adaptor:NGB01096.1 contig_17422 11017 10964..11017 adaptor:NGB00360.1 contig_17471 5988 5964..5988 adaptor:NGB01096.1 contig_17505 10208 1..74 adaptor:NGB00360.1 contig_17506 6091 1..61 adaptor:NGB00360.1 contig_17520 6084 6028..6084 adaptor:NGB00360.1 contig_17538 5796 5766..5796 adaptor:NGB01096.1 contig_17558 7066 6837..7066 adaptor:NGB01080.1 contig_17561 15165 1..206 adaptor:NGB01083.1 contig_17594 6976 1..26 adaptor:NGB01088.1 contig_17655 14371 14177..14371 adaptor:NGB01088.1 contig_17671 17801 1..50 adaptor:NGB00847.1 contig_17680 5752 5693..5752 
adaptor:NGB00847.1 contig_17738 6456 1..44 adaptor:NGB00360.1 contig_17741 10917 10889..10917 adaptor:NGB01096.1 contig_17775 5928 1..79 adaptor:NGB00847.1 contig_17804 11597 11562..11597 adaptor:NGB00846.1 contig_17872 11319 11278..11319 adaptor:NGB00847.1 contig_17876 5647 5613..5647 adaptor:NGB01083.1 contig_17925 9923 1..22 adaptor:NGB01088.1 contig_17938 5246 1..23 adaptor:NGB01088.1 contig_18016 8044 1..29 adaptor:NGB01096.1 contig_18017 6668 6647..6668 adaptor:NGB01096.1 contig_18044 11330 11299..11330 adaptor:NGB01096.1 contig_18049 10560 1..88 adaptor:NGB00847.1 contig_18173 12243 1..159 adaptor:NGB01096.1 contig_18175 8788 8765..8788 adaptor:NGB01096.1 contig_18177 11418 11340..11418 adaptor:multiple contig_18182 11901 11832..11901 adaptor:NGB00847.1 contig_18201 6059 6038..6059 adaptor:NGB01096.1 contig_18222 11216 11136..11216 adaptor:NGB00847.1 contig_18228 8386 8361..8386 adaptor:NGB01088.1 contig_18321 5922 5897..5922 adaptor:NGB01096.1 contig_18370 5400 5085..5116 adaptor:NGB00747.1 contig_18453 5849 1..38 adaptor:NGB00360.1 contig_1846 23210 1..64 adaptor:NGB00360.1 contig_18479 5209 1..44 adaptor:NGB00360.1 contig_18486 5749 5726..5749 adaptor:NGB01088.1 contig_18488 5217 1..19 adaptor:NGB01088.1 contig_1969 65776 1..60 adaptor:NGB00360.1 contig_197 9215 1..83 adaptor:NGB00847.1 contig_1977 13765 1..35 adaptor:NGB01093.1 contig_1999 53427 53398..53427 adaptor:NGB01096.1 contig_2125 11803 11769..11803 adaptor:NGB01083.1 contig_2151 9544 1..37 adaptor:NGB01029.1 contig_2179 38972 1..67 adaptor:NGB00360.1 contig_2186 31110 30935..31110 adaptor:NGB01096.1 contig_2203 60314 60124..60187 adaptor:NGB00847.1 contig_2278 33271 1..36 adaptor:NGB01090.1 contig_2305 17957 1..58 adaptor:NGB00360.1 contig_2361 48816 48764..48816 adaptor:NGB00847.1 contig_242 49604 49535..49604 adaptor:NGB00360.1 contig_2429 76318 76242..76318 adaptor:NGB00847.1 contig_2430 70439 70373..70439 adaptor:NGB00847.1 contig_2459 63920 1..96 adaptor:NGB00847.1 contig_2485 31300 
31260..31300 adaptor:NGB00360.1 contig_2508 25152 25095..25152 adaptor:NGB00847.1 contig_2650 36583 1..58 adaptor:NGB00847.1 contig_2668 22089 22052..22089 adaptor:NGB01029.1 contig_2735 13614 1..19 adaptor:NGB01088.1 contig_2781 50403 1..70 adaptor:NGB00847.1 contig_2800 30768 22802..22846 adaptor:NGB00360.1 contig_2824 44109 1..38 adaptor:NGB00847.1 contig_2888 19121 1..89 adaptor:NGB00360.1 contig_2900 36871 1..32 adaptor:NGB01088.1 contig_2949 25959 25916..25959 adaptor:NGB00360.1 contig_2970 20833 1..46 adaptor:NGB00360.1 contig_2986 16429 1..43 adaptor:NGB00360.1 contig_3069 38956 38904..38956 adaptor:NGB00847.1 contig_3106 9135 1..87 adaptor:NGB00847.1 contig_3124 70101 70072..70101 adaptor:NGB01088.1 contig_3129 30402 30379..30402 adaptor:NGB01088.1 contig_3147 10611 10586..10611 adaptor:NGB01096.1 contig_3190 117726 117687..117726 adaptor:NGB01029.1 contig_3243 44291 44273..44291 adaptor:NGB01096.1 contig_3276 57911 1..42 adaptor:NGB00360.1 contig_341 67008 1..22 adaptor:NGB01096.1 contig_3542 16855 1..60 adaptor:NGB00847.1 contig_3595 29288 1..79 adaptor:NGB00847.1 contig_3712 73078 1..78 adaptor:NGB00847.1 contig_3840 40472 40414..40472 adaptor:NGB00360.1 contig_3868 33875 33819..33875 adaptor:NGB00360.1 contig_3903 40080 40010..40080 adaptor:NGB00847.1 contig_3996 44010 43970..44010 adaptor:NGB00360.1 contig_4001 26085 1..73 adaptor:NGB00847.1 contig_4014 30676 30590..30676 adaptor:NGB00360.1 contig_4019 49543 1..76 adaptor:NGB00360.1 contig_4036 58848 58696..58848 adaptor:NGB00846.1 contig_4084 41308 41210..41308 adaptor:NGB00360.1 contig_4095 24801 1..70 adaptor:NGB00847.1 contig_4098 27393 1..189 adaptor:NGB01096.1 contig_410 57740 57678..57740 adaptor:NGB00360.1 contig_4172 20870 9717..9749 adaptor:NGB01096.1 contig_4318 55870 55805..55870 adaptor:NGB00360.1 contig_432 58593 58569..58593 adaptor:NGB01088.1 contig_4323 87370 87304..87370 adaptor:NGB00847.1 contig_4365 27401 27350..27401 adaptor:NGB00847.1 contig_4516 14480 1..98 adaptor:NGB00847.1 
contig_452 34031 1..23 adaptor:NGB01096.1 contig_4530 63069 63006..63069 adaptor:NGB00360.1 contig_4651 67570 67518..67570 adaptor:NGB00847.1 contig_4679 20970 1..38 adaptor:NGB00360.1 contig_4686 7411 1..24 adaptor:NGB01096.1 contig_4743 37926 1..79 adaptor:NGB00360.1 contig_4765 11248 11167..11248 adaptor:NGB00360.1 contig_4801 91339 1..50 adaptor:NGB00360.1 contig_4812 37300 37121..37300 adaptor:NGB01093.1 contig_4820 80899 80862..80899 adaptor:NGB00360.1 contig_4904 9220 1..52 adaptor:NGB00847.1 contig_4916 29759 29718..29759 adaptor:NGB00847.1 contig_4924 19015 1..49 adaptor:NGB00847.1 contig_4939 23620 23574..23620 adaptor:NGB01029.1 contig_4956 40890 1..24 adaptor:NGB01088.1 contig_4994 71509 71447..71509 adaptor:NGB00847.1 contig_501 34157 34116..34157 adaptor:NGB00847.1 contig_5036 13162 1..77 adaptor:NGB00360.1 contig_5052 64212 1..170 adaptor:NGB01096.1 contig_5063 35265 35243..35265 adaptor:NGB01096.1 contig_5090 27510 27441..27510 adaptor:NGB00847.1 contig_5157 5988 5805..5988 adaptor:NGB00847.1 contig_5168 6086 6051..6086 adaptor:NGB00846.1 contig_5176 9131 1..41 adaptor:NGB00360.1 contig_5243 44178 1..88 adaptor:NGB00847.1 contig_5270 39229 39177..39229 adaptor:NGB00847.1 contig_5452 30446 1..36 adaptor:NGB00846.1 contig_5576 58918 1..34 adaptor:NGB01096.1 contig_5582 108611 1..87 adaptor:NGB00847.1 contig_5590 55235 55210..55235 adaptor:NGB01088.1 contig_5700 8246 1..82 adaptor:NGB00847.1 contig_5815 99837 1..63 adaptor:NGB00847.1 contig_5820 11616 1..202 adaptor:NGB00847.1 contig_5878 55755 1..26 adaptor:NGB01096.1 contig_59 12390 1..24 adaptor:NGB01096.1 contig_5959 11737 11532..11737 adaptor:NGB01096.1 contig_6065 11492 1..32 adaptor:NGB01088.1 contig_6067 19311 1..39 adaptor:NGB01029.1 contig_6092 14700 1..37 adaptor:NGB01029.1 contig_6194 32760 1..19 adaptor:NGB01088.1 contig_620 10761 1..206 adaptor:NGB01029.1 contig_6259 83001 1..50 adaptor:NGB00360.1 contig_6321 29279 29260..29279 adaptor:NGB01096.1 contig_6408 14690 1..74 adaptor:NGB00360.1 
contig_6455 68530 68497..68530 adaptor:NGB01090.1 contig_6513 12061 11986..12061 adaptor:NGB00847.1 contig_6542 45321 1..41 adaptor:NGB00360.1 contig_6569 19579 19500..19579 adaptor:NGB00847.1 contig_6628 13125 13107..13125 adaptor:NGB01096.1 contig_6673 6733 6699..6733 adaptor:NGB01088.1 contig_6676 13298 13265..13298 adaptor:NGB01088.1 contig_6692 17411 1..43 adaptor:NGB00847.1 contig_6703 57771 1..63 adaptor:NGB00360.1 contig_6785 8258 8237..8258 adaptor:NGB01088.1 contig_6908 53004 52732..52792 adaptor:NGB00847.1 contig_6940 18777 18580..18777 adaptor:NGB00360.1 contig_6941 42032 41980..42032 adaptor:NGB00847.1 contig_6945 53258 1..71 adaptor:NGB00360.1 contig_6986 49101 1..21 adaptor:NGB01088.1 contig_701 57358 1..28 adaptor:NGB01096.1 contig_7017 41786 1..88 adaptor:NGB00360.1 contig_7035 53503 53477..53503 adaptor:NGB01096.1 contig_7046 12860 12812..12860 adaptor:NGB00360.1 contig_7081 27746 1..78 adaptor:NGB00847.1 contig_7082 26783 1..73 adaptor:NGB00847.1 contig_7083 44465 1..70 adaptor:NGB00847.1 contig_7117 33739 33661..33739 adaptor:NGB00360.1 contig_7197 5439 5361..5439 adaptor:NGB00360.1 contig_720 34826 34755..34826 adaptor:NGB00360.1 contig_7210 16719 1..30 adaptor:NGB01096.1 contig_7225 51589 51483..51519 adaptor:NGB01090.1 contig_7228 37410 1..64 adaptor:NGB00360.1 contig_7296 6652 1..80 adaptor:NGB00847.1 contig_7317 11682 1..30 adaptor:NGB01088.1 contig_7323 47612 47560..47612 adaptor:NGB00847.1 contig_7353 50534 50506..50534 adaptor:NGB01096.1 contig_7478 44000 43977..44000 adaptor:NGB01088.1 contig_7510 11029 1..22 adaptor:NGB01096.1 contig_7540 12614 12566..12614 adaptor:NGB00360.1 contig_7587 74260 74065..74260 adaptor:NGB00847.1 contig_7607 14652 1..31 adaptor:NGB01088.1 contig_7612 27455 27299..27354 adaptor:NGB00360.1 contig_7705 39772 1..49 adaptor:NGB00360.1 contig_7729 22305 1..172 adaptor:NGB00360.1 contig_7747 11568 11502..11568 adaptor:NGB00847.1 contig_7750 52785 52748..52785 adaptor:NGB01029.1 contig_7800 20628 20588..20628 
adaptor:NGB00360.1 contig_7851 53514 53439..53514 adaptor:NGB00360.1 contig_7989 51399 1..97 adaptor:NGB00847.1 contig_7992 9120 9035..9120 adaptor:NGB00360.1 contig_7995 103073 103034..103073 adaptor:NGB00360.1 contig_8000 16924 1..85 adaptor:NGB00847.1 contig_8071 73728 73657..73728 adaptor:NGB00360.1 contig_809 20474 20399..20474 adaptor:NGB00360.1 contig_8139 33627 1..25 adaptor:NGB01088.1 contig_8165 17003 16958..17003 adaptor:NGB00847.1 contig_8207 30300 30275..30300 adaptor:NGB01096.1 contig_821 111683 111656..111683 adaptor:NGB01096.1 contig_8236 30705 1..70 adaptor:NGB00360.1 contig_8261 49091 1..181 adaptor:NGB00847.1 contig_8265 28139 27940..28139 adaptor:NGB00360.1 contig_8307 32654 32591..32654 adaptor:NGB00360.1 contig_8340 12953 12925..12953 adaptor:NGB01096.1 contig_8389 19738 1..75 adaptor:NGB00847.1 contig_8399 35159 1..147 adaptor:NGB01096.1 contig_8569 19455 1..38 adaptor:multiple contig_8735 42362 42335..42362 adaptor:NGB01088.1 contig_8737 22308 1..70 adaptor:NGB00360.1 contig_8790 14216 14198..14216 adaptor:NGB01096.1 contig_8797 6889 1..95 adaptor:NGB00847.1 contig_8815 39194 1..80 adaptor:NGB00360.1 contig_886 10028 1..76 adaptor:NGB00360.1 contig_8861 12192 12145..12192 adaptor:NGB00360.1 contig_8909 11109 11042..11109 adaptor:NGB00360.1 contig_8932 8331 8281..8331 adaptor:NGB00847.1 contig_8975 8730 8671..8730 adaptor:NGB00847.1 contig_8992 12682 12661..12682 adaptor:NGB01088.1 contig_8994 7982 7950..7982 adaptor:NGB01096.1 contig_9017 8069 7896..8069 adaptor:NGB00360.1 contig_9045 35343 535..598 adaptor:NGB00847.1 contig_9082 10766 1..28 adaptor:NGB01096.1 contig_9271 17773 17750..17773 adaptor:NGB01096.1 contig_9273 12180 1..180 adaptor:NGB01096.1 contig_9287 6067 1..77 adaptor:NGB00847.1 contig_9474 33382 33060..33111 adaptor:NGB00360.1 contig_9495 19348 19274..19348 adaptor:NGB00847.1 contig_9540 30855 30836..30855 adaptor:NGB01088.1 contig_9591 10604 1..41 adaptor:NGB00847.1 contig_9628 15083 1..34 adaptor:NGB01096.1 contig_9677 5510 
5486..5510 adaptor:NGB01088.1
contig_9693 9823 1..84 adaptor:NGB00847.1
contig_9825 54363 54309..54363 adaptor:NGB00847.1
contig_9863 14033 14013..14033 adaptor:NGB01088.1
contig_9993 35388 1..26 adaptor:NGB01096.1

From xvazquezc at gmail.com Mon Oct 23 16:02:47 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Tue, 24 Oct 2017 09:02:47 +1100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and split the contigs at those places as suggested in the report.

For the annotation part, not 100% sure but I'd say start with the "Merge/resolve legacy annotations" steps but maybe Carson or Daniel have a different suggestion
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this contamination report.
>
> Please how do I use maker to effect the correction
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052
AUSTRALIA

From cjfields at illinois.edu Mon Oct 23 17:21:06 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 23 Oct 2017 23:21:06 +0000
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID: <8B4331B5-9D10-478A-91A5-80AF702CD9CD@illinois.edu>

It looks like the adapter is primarily at the ends, which is easy to remove. However, I agree, removing these and redoing the assembly may improve the assembly quality.

chris

From: maker-devel on behalf of Xabier Vázquez-Campos
Date: Monday, October 23, 2017 at 5:03 PM
To: Emmanuel Nnadi
Cc: Maker Mailing List, "Ence, daniel"
Subject: Re: [maker-devel] Contamination report from NCBI

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and split the contigs at those places as suggested in the report.

For the annotation part, not 100% sure but I'd say start with the "Merge/resolve legacy annotations" steps but maybe Carson or Daniel have a different suggestion
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

Hello

Good day.

Please I submitted my sequence to NCBI and they sent back this contamination report.
Please how do I use maker to effect the correction

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052
AUSTRALIA

From mmokrejs at gmail.com Tue Oct 24 04:23:38 2017
From: mmokrejs at gmail.com (Martin MOKREJŠ)
Date: Tue, 24 Oct 2017 12:23:38 +0200
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,
use trimmomatic or cutadapt to remove the adapters and check the output file for unremoved cases. Once they are all removed redo the assembly.
Martin

Emmanuel Nnadi wrote:
> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this contamination report.
>
> Please how do I use maker to effect the correction

--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/

From eennadi at gmail.com Tue Oct 24 04:44:20 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Tue, 24 Oct 2017 11:44:20 +0100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Thanks!

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

On Oct 24, 2017 11:23 AM, "Martin MOKREJŠ" wrote:

> Hi Emmanuel,
> use trimmomatic or cutadapt to remove the adapters and check the output file for unremoved cases. Once they are all removed redo the assembly.
> Martin
>
> Emmanuel Nnadi wrote:
> > Hello
> >
> > Good day.
> >
> > Please I submitted my sequence to NCBI and they sent back this contamination report.
> >
> > Please how do I use maker to effect the correction
>
> --
> Martin Mokrejs, Ph.D.
> Adapter/artefact removal from datasets based on the following technologies:
> 454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
> http://www.bioinformatics.cz/software/supported-protocols/
>

From qwzhang0601 at gmail.com Tue Oct 24 10:54:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 24 Oct 2017 12:54:13 -0400
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Dear Carson:

Thank you again for your suggestions. I just got the new genome assembly of NMR and have started the gene annotation. I understand your ideas about this. But can I simply use the old genome transcripts as transcript evidence, and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.

For transcripts I have the following choices. I think the first choice is more reliable and better, right?
(1) There are about 60,000 RefSeq transcripts from NCBI. So I downloaded those sequences in FASTA format.
(2) We have the raw RNA-seq data from 11 tissues; we could assemble each sample with Trinity to get the transcripts. But I think most of the RNA-seq should already have been submitted to NCBI.
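
[Illustrative note, not part of the original message.] The first-round setup described above (transcripts supplied as EST evidence with est2genome=1) might look roughly like the following maker_opts.ctl fragment. This is only a sketch: the option names genome, est and est2genome are MAKER control-file options mentioned in or implied by this thread, but both file names are hypothetical placeholders, and every other option is assumed to be left at whatever the generated control file contains.

```
#-----maker_opts.ctl fragment (first round; file names below are hypothetical)
genome=new_assembly.fasta      #the new genome assembly (placeholder name)
est=refseq_transcripts.fasta   #RefSeq transcripts used as EST evidence (placeholder name)
est2genome=1                   #infer gene models directly from the transcript alignments
```

The models produced by such a run are typically what gets used to train SNAP before est2genome is switched back to 0 for later rounds.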
BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. I wonder which type of data is best for training SNAP? For Augustus, we will use BUSCO to train it.

Many thanks.

Best
Quanwei

2017-09-29 12:36 GMT-04:00 Carson Holt:

> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
>
> --Carson
>
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of the NMR genome, which had been assembled and annotated a few years ago. We can download the gene annotation from NCBI.
> >
> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome was not very well assembled, some gene annotations may be false positives.
> > I hope those false-positive genes in the previous annotation won't mislead Maker2 in the current annotation.
> >
> > Do you have any suggestions? Thanks
> >
> > Best
> > Quanwei
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

From carsonhh at gmail.com Tue Oct 24 16:26:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Oct 2017 16:26:00 -0600
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To:
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Yes. If you use est2genome it will just align the model, and then find the longest ORF. So it is a quick way to just align old models to the new assembly. Alternatively you can just do de novo annotation.

--Carson

> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you again for your suggestions. I just got the new genome assembly of NMR and have started the gene annotation. I understand your ideas about this. But can I simply use the old genome transcripts as transcript evidence, and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.
>
> For transcripts I have the following choices. I think the first choice is more reliable and better, right?
> (1) There are about 60,000 RefSeq transcripts from NCBI. So I downloaded those sequences in FASTA format.
> (2) We have the raw RNA-seq data from 11 tissues; we could assemble each sample with Trinity to get the transcripts. But I think most of the RNA-seq should already have been submitted to NCBI.
>
> BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. I wonder which type of data is best for training SNAP?
> For Augustus, we will use BUSCO to train it.
>
> Many thanks.
>
> Best
> Quanwei
>
>
>
>
> 2017-09-29 12:36 GMT-04:00 Carson Holt:
> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
>
> --Carson
>
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of the NMR genome, which had been assembled and annotated a few years ago. We can download the gene annotation from NCBI.
> >
> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome was not very well assembled, some gene annotations may be false positives. I hope those false-positive genes in the previous annotation won't mislead Maker2 in the current annotation.
> >
> > Do you have any suggestions?
> > Thanks
> >
> > Best
> > Quanwei
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

From daren.card at gmail.com Wed Oct 25 06:17:13 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Wed, 25 Oct 2017 07:17:13 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> <97656D7C-3613-4B0B-9D99-0441AC28ABCC@gmail.com> <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
Message-ID: <0406D4C3-9C43-4198-B2EA-241C6C504425@gmail.com>

Hi Carson (and CCed MAKER list for the record),

Thanks for troubleshooting my issue further. Good to hear that the run should ultimately work, but strange it isn't for me. I'll keep playing with it and will hopefully get it sorted out by running through the list you suggested.

Thanks again for the help,
Daren

> On Oct 24, 2017, at 11:27 AM, Carson Holt wrote:
>
> I cannot seem to replicate this. I ran with MAKER 2.31.8 and 2.31.9, both with and without the GFF3 file (total of 4 runs). It succeeded without issues in every case.
>
> The only things I can think to try are:
> 1. Reinstall BLAST+. Even though you have 2.6.0, just try it anyways. Also install rmblast 2.6.0 for use with RepeatMasker (requires that you install from source).
> 2. Make sure you run ./configure inside RepeatMasker to let it know about the new rmblast installation.
> 3. Change the location of blast and related scripts in maker_exe.ctl, otherwise MAKER won't know to use your new installation.
> 4.
> Delete the mpi_blastdb directory under MAKER's output directory to force it to rebuild all BLAST indexes.
> 5. Delete any file with a ".db" extension in the maker output directory to force it to rebuild all GFF3 indexes.
> 6. Update BioPerl to the current CPAN version.
>
> Also here is a link to the results I got for your contig (version 2.31.8 using the repeat masking GFF3 file) --> http://weatherby.genetics.utah.edu/data/scaffold-1.tgz
>
> --Carson
>
>
>
>> On Oct 17, 2017, at 6:46 AM, Daren C. Card wrote:
>>
>> Hi Carson,
>>
>> Thanks for offering to take a further look at this. I've uploaded all the files that I think you'd need to run MAKER on your systems, but let me know if you need anything else. My username is "guest_5038".
>>
>> Repeat annotations GFF is from RepeatModeler, with simple repeats filtered away. Transcript evidence was from Trinity assembly of several RNAseq libraries. Several sets of protein evidence from related species. Also have augustus HMM trained based on the genome assembly using BUSCO with retraining turned on.
>>
>> The command I've used is below, and here are the software versions I'm working with:
>>
>> Maker - 2.31.8
>> BLAST - 2.6.0
>> Augustus - 3.2.3
>> RepeatMasker - 4.0.6
>>
>> mpiexec -n 12 maker -base CroVir_rnd1_chr1 round1_maker_opts.chr1.ctl maker_bopts.ctl maker_exe.ctl
>>
>> Thanks again!
>> Daren
>>
>>
>>> On Oct 13, 2017, at 10:37 AM, Carson Holt wrote:
>>>
>>> So you have an input GFF3 file? Could you send it to me along with the problem contig. If you want you can upload the maker control files and evidence sets, and I can just recreate the run for the contig.
>>>
>>> Upload here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> --Carson
>>>
>>>
>>>
>>>> On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote:
>>>>
>>>> Hi Carson,
>>>>
>>>> Thanks for the help. Issue is still lingering. I've tried my full "ideal"
>>>> run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so it doesn't seem to be a BLAST issue, or it is one that won't be easy to overcome.
>>>>
>>>> Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing, which leads me to believe it would finish okay.
>>>>
>>>> I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me.
>>>>
>>>> What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don't see any obvious info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great.
>>>>
>>>> Hoping I can resolve this as maybe this is useful to others. Weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but can't imagine that really mattering.
>>>>
>>>> Thanks,
>>>> Daren
>>>>
>>>>
>>>>> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote:
>>>>>
>>>>> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST).
>>>>> >>>>> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ >>>>> >>>>> --Carson >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: >>>>>> >>>>>> Dear Carson, >>>>>> >>>>>> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. >>>>>> >>>>>> Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. >>>>>> >>>>>> Thank you, >>>>>> Daren Card >>>>>> >>>>>> >>>>>> ################################################ >>>>>> doing repeat masking >>>>>> re reading repeat masker report. >>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >>>>>> doing blastx repeats >>>>>> re reading blast report.
>>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >>>>>> deleted:2 hits >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> doing blastx repeats >>>>>> collecting blastx repeatmasking >>>>>> processing all repeats >>>>>> in cluster::shadow_cluster... >>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>>>> --> rank=NA, hostname=moonunit0 >>>>>> ERROR: Failed while processing all repeats >>>>>> ERROR: Chunk failed at level:3, tier_type:1 >>>>>> FAILED CONTIG:scaffold-1 >>>>>> >>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>> FAILED CONTIG:scaffold-1 >>>>>> >>>>>> examining contents of the fasta file and run log >>>>>> ################################################ >>>>>> >>>>>> >>>>>> >>>>>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>>>>>> >>>>>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds.
These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>>>>>> >>>>>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>>>>>> >>>>>>>> Any help would be greatly appreciated. >>>>>>>> Daren Card >>>>>>>> >>>>>>>> University of Texas Arlington >>>>>>>> >>>>>>>> ################################################### >>>>>>>> doing blastx repeats >>>>>>>> running blast search.
>>>>>>>> #--------- command -------------# >>>>>>>> Widget::blastx: >>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>>>>>> #-------------------------------# >>>>>>>> deleted:0 hits >>>>>>>> collecting blastx repeatmasking >>>>>>>> processing all repeats >>>>>>>> in cluster::shadow_cluster... >>>>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>>>>>> --> rank=3, hostname=moonunit0 >>>>>>>> ERROR: Failed while processing all repeats >>>>>>>> ERROR: Chunk failed at level:3, tier_type:1 >>>>>>>> FAILED CONTIG:scaffold-1 >>>>>>>> >>>>>>>> doing blastx repeats >>>>>>>> running blast search. 
>>>>>>>> #--------- command -------------# >>>>>>>> Widget::blastx: >>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>>>>>> #-------------------------------# >>>>>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>>>>> FAILED CONTIG:scaffold-1 >>>>>>>> >>>>>>>> deleted:0 hits >>>>>>>> deleted:0 hits >>>>>>>> ################################################### >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> maker-devel mailing list >>>>>>>> maker-devel at box290.bluehost.com >>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>> >>>>>> >>>>> >>>> >>> >> > From venyao at qq.com Wed Oct 25 01:25:25 2017 From: venyao at qq.com (=?ISO-8859-1?B?V2VuIFlhbw==?=) Date: Wed, 25 Oct 2017 15:25:25 +0800 Subject: [maker-devel] NNN in maker output transcript Message-ID: Dear guys, Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal? If not, what's going on? Is this a bug of maker or my configuration of maker is not correct? I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins. Thanks! Wen Yao -------------- next part -------------- An HTML attachment was scrubbed... 
From dandence at gmail.com Wed Oct 25 09:42:04 2017 From: dandence at gmail.com (Daniel Ence) Date: Wed, 25 Oct 2017 11:42:04 -0400 Subject: [maker-devel] NNN in maker output transcript In-Reply-To: References: Message-ID: <4913D7BA-CD9B-4B7F-83EF-B8072B4950A6@gmail.com> Hi Wen Yao, Do you mean that some of the transcript sequences contain "N" characters or that an entire transcript sequence is "NNN"? > On Oct 25, 2017, at 3:25 AM, Wen Yao wrote: > > Dear guys, > > Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal? > If not, what's going on? Is this a bug of maker or my configuration of maker is not correct? > I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins. > > Thanks! > > Wen Yao From carsonhh at gmail.com Wed Oct 25 09:42:34 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Oct 2017 09:42:34 -0600 Subject: [maker-devel] NNN in maker output transcript In-Reply-To: References: Message-ID: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com> The gene predictor generates the model. I don't think snap will generate a model that contains an N. Augustus might be able to predict across a single codon (I'm not sure there). The N means that the nucleotide is unknown (i.e. it can be A, T, C or G). An NNN codon produces the amino acid X (which is the unknown amino acid code). So it is possible that for something as short as one or two codons the predictor thinks it's ok to assume that it will produce a valid codon and uses it to complete the reading frame. Alternatively if you are using est2genome=1 or est_gff then what you are seeing is just the result of an alignment which can align to a couple of N's.
You should not use est2genome=1 for anything but training. Also est_gff or pred_gff will not be filtered if you supplied a feature location that includes an N. --Carson > On Oct 25, 2017, at 1:25 AM, Wen Yao wrote: > > Dear guys, > > Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal? > If not, what's going on? Is this a bug of maker or my configuration of maker is not correct? > I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins. > > Thanks! > > Wen Yao From carsonhh at gmail.com Wed Oct 25 09:43:37 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Oct 2017 09:43:37 -0600 Subject: [maker-devel] NNN in maker output transcript In-Reply-To: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com> References: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com> Message-ID: Also you can check the source of the model by looking at the name, i.e. does it have augustus, snap, or est2genome in the name? --Carson > On Oct 25, 2017, at 9:42 AM, Carson Holt wrote: > > The gene predictor generates the model. I don't think snap will generate a model that contains an N. Augustus might be able to predict across a single codon (I'm not sure there). The N means that the nucleotide is unknown (i.e. it can be A, T, C or G). An NNN codon produces the amino acid X (which is the unknown amino acid code). So it is possible that for something as short as one or two codons the predictor thinks it's ok to assume that it will produce a valid codon and uses it to complete the reading frame. Alternatively if you are using est2genome=1 or est_gff then what you are seeing is just the result of an alignment which can align to a couple of N's.
You should not use est2genome=1 for anything but training. Also est_gff or pred_gff will not be filtered if you supplied a feature location that includes an N. > > --Carson > > > > >> On Oct 25, 2017, at 1:25 AM, Wen Yao wrote: >> >> Dear guys, >> >> Recently, I run maker to annotate a genome. I found that the transcript fasta file output by Maker contains "NNN". Is this normal? >> If not, what's going on? Is this a bug of maker or my configuration of maker is not correct? >> I told maker to use snap and augustus for de novo prediction and use exonerate to align ESTs and proteins. >> >> Thanks! >> >> Wen Yao > From eennadi at gmail.com Thu Oct 26 15:34:33 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Thu, 26 Oct 2017 22:34:33 +0100 Subject: [maker-devel] How to remove contigs from GFF file Message-ID: Hello, I need to remove sequences from my GFF file. Can someone help me with a command line for such removal? ERROR: valid [SEQ_FEAT.FeatureBeginsOrEndsInGap] Feature begins or ends in gap starting at 17625 FEATURE: Gene: CR513_57782 <46071> [lcl|contig_14719:17653-17724] [lcl|contig_14719: delta, dna len= 17790] ERROR: valid [SEQ_INST.ShortSeq] Sequence only 2 residues BIOSEQ: gnl|aceprd|CR513_62412: raw, aa len= 2 Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
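For the contig-removal question above, a minimal sketch of one way to drop every line belonging to chosen contigs from a GFF3 file. It is not a MAKER script; the function name drop_seqids is hypothetical. It compares the first column exactly, so that removing contig_1 does not also remove contig_14719, which a plain substring match would do.

```python
def drop_seqids(gff_lines, unwanted):
    """Yield GFF3 lines whose seqid (column 1) is not in `unwanted`."""
    unwanted = set(unwanted)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            # Assumption: an embedded FASTA section is handled separately,
            # so stop here rather than copy sequence for removed contigs.
            break
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) > 1 and parts[1] in unwanted:
                continue  # drop the region pragma of a removed contig
        if line.startswith("#") or not line.strip():
            yield line  # keep other comments, pragmas and blank lines
            continue
        if line.split("\t", 1)[0] not in unwanted:
            yield line
```

Usage would be along the lines of writing "".join(drop_seqids(open("in.gff"), {"contig_14719"})) to a new file (file names are examples only). Note that this removes whole contigs; it does not repair individual features that begin or end in a gap.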
From bmoore at genetics.utah.edu Fri Oct 27 07:17:41 2017 From: bmoore at genetics.utah.edu (Marvin B Moore) Date: Fri, 27 Oct 2017 13:17:41 +0000 Subject: [maker-devel] Backlash running through my sequence In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu> References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu> Message-ID: <98FAE3F3-7C52-4EDA-8FBB-5F43DB7D54C9@umail.utah.edu> Those look suspiciously like the remnants of end-of-line control characters. Since Windows, Mac OS X and Linux all use slightly different control characters to mark end-of-line I'd look at the upstream path of where your files come from and how they've been processed by you or others upstream of MAKER (were they generated or processed on a MS or Mac server). One bizarre example we've seen is that files that simply pass through an MS Outlook server as an e-mail attachment have had their end-of-line characters converted to MS format. Good luck... Barry On Oct 17, 2017, at 1:11 PM, Fields, Christopher J > wrote: I agree with Carson, though my guess is any fasta converters will either fail on these characters as non-IUPAC, or will silently remove them. Running them through a converter may not solve all the issues though, as the backslash also appears in the FASTA headers at the end of the line: cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\' >contig_134\ >contig_149\ >contig_158\ >contig_222\ >contig_316\ >contig_582\ >contig_634\ >contig_700\ >contig_741\ ... I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)? chris From: maker-devel > on behalf of Carson Holt > Date: Monday, October 16, 2017 at 11:22 AM To: Emmanuel Nnadi > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Backlash running through my sequence I would not just remove them. The fact they are there calls into question how they got there in the first place.
If you generated this file yourself, you may want to instead use fasta_tool. --Carson On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi > wrote: Hi all, I am trying to run annotation on some of my sequences but noticed that I have a backslash that runs through the sequence. Please, how do I remove them? I attached the sequence. Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications From bmoore at genetics.utah.edu Fri Oct 27 07:24:44 2017 From: bmoore at genetics.utah.edu (Marvin B Moore) Date: Fri, 27 Oct 2017 13:24:44 +0000 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> Message-ID: Also, you could probably build these overlap sets on the command line by subsetting the MAKER GFF3 file and then using BedTools intersect for overlap queries. Barry On Oct 11, 2017, at 10:19 PM, Matt Simenc > wrote: Very good, thank you! Matt On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt > wrote: Also look at GAL for building GFF3 feature queries --> https://github.com/The-Sequence-Ontology/GAL --Carson On Oct 11, 2017, at 9:18 AM, Michael Campbell > wrote: Hi Matt, I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file.
Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST. Hope this helps, Mike On Oct 11, 2017, at 10:53 AM, Matt Simenc > wrote: Hey MAKER people, I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. QI summary has: Fraction of exons that overlap an EST alignment; Fraction of exons that overlap EST or Protein alignments. Please correct me if I'm wrong, because I am interpreting the first to be the fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? Thanks!!!
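The two-rerun approach described above ends with two GFF3 outputs, one scored against protein evidence and one against EST evidence. A minimal sketch of turning those into the three Venn counts follows. It is not a MAKER script: the function names are hypothetical, it assumes AED is stored as an _AED=... attribute on mRNA lines, and it assumes a gene counts as supported when at least one of its transcripts has AED below 1.

```python
import re

# Anchoring on ";" or line start keeps "_eAED=" from being picked up.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")
PARENT_RE = re.compile(r"(?:^|;)Parent=([^;]+)")

def supported_genes(gff_lines):
    """Return the set of gene IDs having an mRNA with _AED < 1."""
    genes = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        aed = AED_RE.search(cols[8])
        parent = PARENT_RE.search(cols[8])
        if aed and parent and float(aed.group(1)) < 1.0:
            genes.add(parent.group(1))
    return genes

def venn_counts(protein_run_lines, est_run_lines):
    """Count genes supported by EST only, protein only, or both."""
    prot = supported_genes(protein_run_lines)
    est = supported_genes(est_run_lines)
    return {
        "est_only": len(est - prot),
        "both": len(est & prot),
        "protein_only": len(prot - est),
    }
```

Genes with AED of 1 in both reruns fall outside all three counts, which corresponds to models with neither kind of support.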
Matt Simenc From dandence at gmail.com Fri Oct 27 08:51:21 2017 From: dandence at gmail.com (Daniel Ence) Date: Fri, 27 Oct 2017 10:51:21 -0400 Subject: [maker-devel] How to remove contigs from GFF file In-Reply-To: References: Message-ID: Hi Emmanuel, can you send the command that produced the error? If you need to remove certain scaffolds or contigs from a gff3 file, you can use grep to filter out certain scaffolds like this: "grep -v 'scaffold_name' gff3_file". ~Daniel > On Oct 26, 2017, at 5:34 PM, Emmanuel Nnadi wrote: > > Hello, > > I need to remove sequences from my GFF file can someone help me with command line for such removal > > ERROR: valid [SEQ_FEAT.FeatureBeginsOrEndsInGap] Feature begins or ends in gap starting at 17625 FEATURE: Gene: CR513_57782 <46071> [lcl|contig_14719:17653-17724] [lcl|contig_14719: delta, dna len= 17790] > ERROR: valid [SEQ_INST.ShortSeq] Sequence only 2 residues BIOSEQ: gnl|aceprd|CR513_62412: raw, aa len= 2 > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications From carsonhh at gmail.com Fri Oct 27 16:00:17 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Oct 2017 16:00:17 -0600 Subject: [maker-devel] "ALRM" isn't numeric in exit - MAKER warning message In-Reply-To: References: Message-ID: <399AB5BD-2FC5-45F4-9AC8-1665CCFEA0D1@gmail.com> Hi Marivi, The only time MAKER uses the ALRM signal is during exit. Sometimes MPI_Finalize can freeze (it has to do with the fact it is being called from Perl). So we set an alarm just in case. Then if it takes too long we assume it is frozen and let things exit in a less than graceful way rather than let it block forever (it is already finished after all). The complaint you get may be because your system doesn't support the alarm signal or forks.pm (which tries to intercept signals) is having an issue. Or it may just be ugliness related to parts of the process being killed with other parts still being active (it is an ungraceful exit after all). Or it may be another source of the ALRM altogether (but I assume it is the MAKER ALRM given that it happens right after MAKER says it is finished). Thanks, Carson > On Oct 27, 2017, at 1:03 PM, Marivi Colle wrote: > > Hi Carson, > > After running MAKER, I checked my std output and here's the message at the end of the file. I was wondering what this warning message means? > > > Start_time: 1508465182 > End_time: 1508950543 > Elapsed: 485361 > > > Maker is now finished!!!
> > Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184. > Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184. > Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184. > Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184 > > > Thank you. > Marivi > > > -- > Marivi G. Colle > Research Associate > Department of Horticulture > Michigan State University > 1066 Bogue St., East Lansing > Michigan 48824-1325, USA From patrick.tranvan at unil.ch Sat Oct 28 08:14:59 2017 From: patrick.tranvan at unil.ch (Patrick Tran Van) Date: Sat, 28 Oct 2017 14:14:59 +0000 Subject: [maker-devel] Advice on my pipeline In-Reply-To: <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com> References: <6b029690bace4d3fbae77c0bb1bddce8@prdexch02.ad.unil.ch> <1498470630221.84642@unil.ch> <696C51C6-5606-4ECB-A8B8-9C077182FFFA@gmail.com> <1498908228256.16549@unil.ch> <58E904BF-9AB8-4AC7-B10B-C902F414E03D@gmail.com> <1505986013492.52354@unil.ch>, <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com> Message-ID: <1509200133044.96929@unil.ch> Hi Carson, If I want to look for alternative splicing variants, can I just add the option alt_splice=1 only at the last round of maker, or do I have to set it from the beginning (and perform the 4 rounds with this option)? Cheers, Patrick ________________________________ From: Carson Holt Sent: Friday, September 22, 2017 10:08 PM To: Patrick Tran Van Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Advice on my pipeline The gff3 passthrough options are there to help users get old data into MAKER when they have lost access to the original files.
But for iterative running of the pipeline, it is more effective just to rerun in place so MAKER can access the raw alignment reports. The raw reports from the alignments have more detail than what is stored in the GFF3. Details that are lost when trying to use the GFF3 as input. --Carson On Sep 21, 2017, at 3:26 AM, Patrick Tran Van > wrote: Hi Carson, I have a doubt for the round 2, so in a previous reply you said: " Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input. MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory). " Does it mean that I don't need to modify the section : #-----Re-annotation Using MAKER Derived GFF3 ? If I let everything by default such as : altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no It will not look again for repeat and protein + transcriptome alignment ? Patrick Tran Van Groups Chapuisat, Robinson-Rechavi & Schwander Department of Ecology and Evolution University of Lausanne Le Biophore CH-1015 Lausanne Switzerland Office 3206 ________________________________ From: Carson Holt > Sent: Monday, July 3, 2017 10:50 PM To: Patrick Tran Van Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Advice on my pipeline maker2zff is just for SNAP training and not for gene filtering (please do not use it for filtering, it does not do what you think). So the final annotation set after maker with correct_est_fusion is 16,850. To decide which set is better, look at them in a browser (gene counts are not useful for gauging results). A well annotated genome will have evidence clusters that closely match the final models.
A poorly annotated genome will have evidence clusters that are split or merged by the models. The correct_est_fusion option does two things. It trims long overlapping UTR fragments, and it stops evidence clusters from being merged on BLASTP evidence alone (so gene predictors will get unmerged hint regions if clusters are split). You may also find that using jaccard_clip with Trinity has reduced sensitivity for the transcript data (you may lose things that were there before, but now have better specificity, i.e. fewer false positives). Make sure you provided protein data from at least two related species to help maintain sensitivity lost from the transcript data. You can also add rejected gene models back in after the fact by using iprscan to identify unsupported models with identifiable protein domains --> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286374/ Thanks, Carson On Jul 1, 2017, at 5:21 AM, Patrick Tran Van > wrote: So I have assembled my transcriptome with Trinity using the jaccard clip option and I have run maker with and without corrected_est_fusion. I have then used SNAP to train/filter it with: maker2zff specie.all.gff Here are my results: Number of genes after maker -> Number of genes after maker2zff - Without corrected_est_fusion: 21621 -> 13875 - With corrected_est_fusion: 16850 -> 9098 1) If I understand well how corrected_est_fusion works, because it prevents gene merging, shouldn't it be the inverse? Normally I should find more genes with corrected_est_fusion, right? 2) I think I should find something like 13000-14000 genes for my species. Should I go with the "Without corrected_est_fusion" for the 2nd iteration of maker?
Thanks for your help Patrick Tran Van Groups Chapuisat, Robinson-Rechavi & Schwander Department of Ecology and Evolution University of Lausanne Le Biophore CH-1015 Lausanne Switzerland Office 3206 ________________________________ From: Carson Holt > Sent: Monday, June 26, 2017 11:38 PM To: Patrick Tran Van Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Advice on my pipeline Sorry the option is --> correct_est_fusion It is in the maker_opts.ctl file. I would use both SNAP and Augustus on a few large contigs then review the results manually. If one of them is not behaving well, then drop it. If both behave well (i.e. correlate well with evidence alignments) then keep them both. --Carson On Jun 26, 2017, at 3:48 AM, Patrick Tran Van > wrote: Thanks for your answer. 1) Do you think that adding an Augustus training in addition to SNAP at steps 3 and 5 will add more confidence (instead of adding Augustus only for the final round)? Because I am using autoAug for this and it took a while to compute. 2) I don't see this option : 'avoid_est_fusion=1' . I have tried to add it but I got this error: WARNING: Invalid option 'avoid_est_fusion' in control file maker_opts.ctl (I am using v 2.31.8 ) Patrick Tran Van Groups Chapuisat, Robinson-Rechavi & Schwander Department of Ecology and Evolution University of Lausanne Le Biophore CH-1015 Lausanne Switzerland Office 3206 ________________________________ From: Carson Holt > Sent: Monday, June 5, 2017 8:29 PM To: Patrick Tran Van Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Advice on my pipeline Your plan sounds good. A couple of related notes. Insect genomes tend to have high gene density, so gene merging will be the primary difficulty. You can avoid merging of mRNA-seq evidence by using options like jaccard_clip in Trinity. Then use avoid_est_fusion=1 inside of MAKER. Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input.
MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory). --Carson On Jun 2, 2017, at 3:56 AM, Patrick Tran Van > wrote: Hello, This is my first time running Maker for an insect genome annotation. I have found various resources and tried to make a consensus. I am looking for your thoughts and advice about my pipeline, whether I can improve something or am doing useless things: What I have: - RNA evidence: transcriptome - Protein evidence: swissprot/uniprot + busco protein set of insect - Cegma and busco results of my genome 1) Train SNAP with CEGMA 2) Run (run A) maker with repeat masking with transcript, protein, the new SNAP file (from step 1) and augustus file (from busco). 3) Create SNAP model from run A. 4) Run (run B) with the new SNAP (done at step 3) with options turned off (est2genome=0) and (protein2genome=0) data, provide gff file (maker_gff=run_A.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1). 5) Create SNAP model from run B. 6) Run (run C) with the new SNAP (done at step 5) with options turned off (est2genome=0) and (protein2genome=0) data, provide gff file (maker_gff=run_B.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1). 7) Create SNAP model from run C AND Create Augustus gene model from run C 8) Run (run D) with the new SNAP (done at step 7) + AUGUSTUS file (step 7) with options turned off (est2genome=0) and (protein2genome=0) data, provide gff file (maker_gff=run_C.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1). + Use keep_preds=1 Does it seem coherent?
Cheers, Patrick Tran Van Groups Chapuisat, Robinson-Rechavi & Schwander Department of Ecology and Evolution University of Lausanne Le Biophore CH-1015 Lausanne Switzerland Office 3206 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dandence at gmail.com Mon Oct 2 07:17:57 2017 From: dandence at gmail.com (Daniel Ence) Date: Mon, 2 Oct 2017 09:17:57 -0400 Subject: [maker-devel] Error with Maker_functional_gff In-Reply-To: References: Message-ID: Hi Emmanuel, I think this script is expecting the file ?uniprot_sprot.fasta? downloaded from the uniprot download page at http://www.uniprot.org/downloads#uniprotkblink The fasta headers in this file are different from the fasta header that the file you used has: >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1 Let us know if that helps, Daniel > On Oct 2, 2017, at 1:03 AM, Emmanuel Nnadi wrote: > > Hello, > I intend to rename genes for Genebank submission > > I downloaded swissprot.fa from NCBI and used blast MAKER generated file to swissprot. > > the output of BLAST RESULT looks like this > snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6 > > I attempted to run maker_funtional_gff using the swissprot.fa downloaded and the blastp result > > I got the following result > > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906. > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906. 
> Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor > > > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599. > Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599. > Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase > > What can I do? > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Oct 2 07:30:43 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 2 Oct 2017 09:30:43 -0400 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> Message-ID: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> Hi Chris, This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. 
I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1. Thanks, Mike > On Sep 29, 2017, at 1:20 PM, Willett, Christopher S wrote: > > Hello- > > We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this? > > Here is what we are seeing if more details would be helpful. With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes? > > The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above. 
> > Thanks for your help, > > Best, > > Chris Willett > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > Research Associate Professor > Department of Biology > CB#3280 Coker Hall > University of North Carolina, Chapel Hill > Chapel Hill, NC, 27599-3280 > > Office: 2252 Genome Science Building > phone: > 919-843-8663 > fax: > 919-962-1625 > > http://labs.bio.unc.edu/Willett/ > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Oct 2 13:19:51 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 2 Oct 2017 15:19:51 -0400 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> Message-ID: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com> Hi Chris, Yeah By default MAKER shouldn?t keep any annotation with an AED of 1. I?ve ccd the dev list on this to see if anyone else has any idea why you might get AED 1 genes with keep_preds=0. Could you send me the maker_opts.ctl file for the run. There may be something informative in there. Thanks, Mike > On Oct 2, 2017, at 2:32 PM, Willett, Christopher S wrote: > > Hi Mike- > > I was looking at the lists of mRNAs and I think what is happening is that there are still genes retained in our initial output from MAKER that have an AED=1 that are then getting trimmed out of the filtered file. If I am setting the AED threshold equal to 1 in the control file for the MAKER run is that less than one or less than or equal to one for retention? 
Should these AED=1 genes be making it into the gene and mRNA pools if we have the keep predictions parameter set to 0? > > Thanks for your help, > > Best, > > Chris > > > >> On Oct 2, 2017, at 9:30 AM, Michael Campbell > wrote: >> >> Hi Chris, >> >> This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1. >> >> Thanks, >> Mike >> >> >>> On Sep 29, 2017, at 1:20 PM, Willett, Christopher S > wrote: >>> >>> Hello- >>> >>> We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this? >>> >>> Here is what we are seeing if more details would be helpful. With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. 
So I was curious what else was going on in the filter script beyond AED that would trim out genes?
>>>
>>> The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above.
>>>
>>> Thanks for your help,
>>>
>>> Best,
>>>
>>> Chris Willett
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Research Associate Professor
>>> Department of Biology
>>> CB#3280 Coker Hall
>>> University of North Carolina, Chapel Hill
>>> Chapel Hill, NC, 27599-3280
>>>
>>> Office: 2252 Genome Science Building
>>> phone: 919-843-8663
>>> fax: 919-962-1625
>>>
>>> http://labs.bio.unc.edu/Willett/
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From michael.s.campbell1 at gmail.com Mon Oct 2 13:35:55 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 2 Oct 2017 15:35:55 -0400
Subject: [maker-devel] question on gene numbers with quality_filter.pl
In-Reply-To:
References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com>
Message-ID: <4C4E3DE7-CE28-4DF7-B234-E88701CAD172@gmail.com>

Hi Chris,

It's this line here:

model_gff=/proj/willetlb/users/cwillett/MAKER_analyses/dovetail_ann/SDv1.0_est-forward-SDv2.1.gff

Anything passed to model_gff is treated as sacred by MAKER and will be kept regardless of AED. If you pass it in as pred_gff= then it will be subject to the AED filters.
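One way to see how many models were carried through regardless of AED is to list the mRNA records whose _AED attribute is 1. The following is a minimal sketch, assuming a tab-separated GFF3 with the _AED attribute in column 9, as in the record posted to the list in this thread. The function names are made up for illustration and are not part of MAKER.

```python
# Sketch: list mRNA features in a MAKER GFF3 whose _AED is at or above a threshold.
# Function names are hypothetical; only the GFF3 layout and _AED tag come from the thread.

def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return attributes


def mrnas_with_aed_at_least(gff_lines, threshold=1.0):
    """Return the IDs of mRNA features whose _AED is >= threshold."""
    ids = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "mRNA":
            continue  # not a feature line, or not a transcript
        attributes = parse_attributes(fields[8])
        if "_AED" in attributes and float(attributes["_AED"]) >= threshold:
            ids.append(attributes.get("ID"))
    return ids


# The mRNA record quoted in this thread, with its columns joined by tabs.
example = (
    "Chromosome_6\tmaker\tmRNA\t556000\t557215\t.\t+\t.\t"
    "ID=maker-Chromosome_6-exonerate_est2genome-gene-5.3-mRNA-1;"
    "Parent=maker-Chromosome_6-exonerate_est2genome-gene-5.3;"
    "Name=TCALIF_02833-PA;_AED=1.00;_eAED=1.00;"
    "_QI=15|0|0|0|1|1|2|75|338;score=100;Alias=TCALIF_02833-PA"
)

print(mrnas_with_aed_at_least([example]))
```

To scan a whole file, pass an open file handle instead of the one-element list.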
I hope this helps, Mike > On Oct 2, 2017, at 3:28 PM, Willett, Christopher S wrote: > > From daren.card at gmail.com Wed Oct 4 09:53:42 2017 From: daren.card at gmail.com (Daren C. Card) Date: Wed, 4 Oct 2017 10:53:42 -0500 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only Message-ID: Hi all, I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). Any help would be greatly appreciated. Daren Card University of Texas Arlington ################################################### doing blastx repeats running blast search. 
#--------- command -------------# Widget::blastx: /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=3, hostname=moonunit0 ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:scaffold-1 doing blastx repeats running blast search. #--------- command -------------# Widget::blastx: /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner #-------------------------------# ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold-1 deleted:0 hits deleted:0 hits ################################################### From carsonhh at gmail.com Wed Oct 4 10:03:52 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 10:03:52 -0600 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: References: Message-ID: 
<2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. ?Carson > On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: > > Hi all, > > I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? > > I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). > > Any help would be greatly appreciated. 
> Daren Card > > University of Texas Arlington > > ################################################### > doing blastx repeats > running blast search. > #--------- command -------------# > Widget::blastx: > /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner > #-------------------------------# > deleted:0 hits > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. > --> rank=3, hostname=moonunit0 > ERROR: Failed while processing all repeats > ERROR: Chunk failed at level:3, tier_type:1 > FAILED CONTIG:scaffold-1 > > doing blastx repeats > running blast search. 
> #--------- command -------------# > Widget::blastx: > /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner > #-------------------------------# > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:scaffold-1 > > deleted:0 hits > deleted:0 hits > ################################################### > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Wed Oct 4 16:31:09 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 4 Oct 2017 18:31:09 -0400 Subject: [maker-devel] About eAED Message-ID: Hello: I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... 
From xvazquezc at gmail.com Wed Oct 4 16:35:41 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 5 Oct 2017 09:35:41 +1100
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID:

Carson commented on this here
https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

On 5 October 2017 at 09:31, Quanwei Zhang wrote:

> Hello:
>
> I ran the maker2 pipeline and got the default gene sets (with AED<1). But
> I found there are several hundred genes with eAED 1.
>
> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can
> be the reason of the great difference between AED and eAED. For this gene
> it has a very low AED score, is it still a reliable gene model if its eAED
> equals 1?
>
> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00
> QI:75|0|0|1|0|0|2|111|35
>
> Thanks
>
> Best
> Quanwei
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Wed Oct 4 16:38:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 16:38:00 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References:
Message-ID: <77155DA5-6454-4B25-BCF6-DE6B077BA548@gmail.com>

eAED is an extended AED calculation that does some inference about the evidence (i.e. it checks reading frame and not just overlap, and may infer support for an exon if splice sites are confirmed, etc.). If eAED is 1, that means that while there is evidence supporting the model, the evidence is more likely to be spurious, so it may be a false model.
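Models like the one in the question, with a low AED but an eAED of 1, can be pulled out of the protein or transcript FASTA headers for manual review. A minimal sketch follows, assuming headers laid out like the example quoted in this thread. The function names and the 0.5 AED cutoff are arbitrary choices for illustration, not MAKER defaults.

```python
import re

# Matches headers such as:
# >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:...
# \bAED: requires a word boundary, so it cannot match inside "eAED:".
HEADER = re.compile(
    r"^>(?P<name>\S+).*?\bAED:(?P<aed>\d+(?:\.\d+)?)\s+eAED:(?P<eaed>\d+(?:\.\d+)?)"
)


def read_scores(header):
    """Return (name, AED, eAED) from a FASTA header, or None if it does not match."""
    match = HEADER.match(header)
    if match is None:
        return None
    return match["name"], float(match["aed"]), float(match["eaed"])


def low_aed_but_eaed_one(headers, aed_cutoff=0.5):
    """Names of models with AED below the cutoff whose eAED is 1."""
    flagged = []
    for header in headers:
        scores = read_scores(header)
        if scores and scores[1] < aed_cutoff and scores[2] >= 1.0:
            flagged.append(scores[0])
    return flagged


example = (
    ">maker-Contig2656-snap-gene-269.6-mRNA-1 protein "
    "AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35"
)
print(low_aed_but_eaed_one([example]))
```

The list it returns is only a set of candidates to inspect against the evidence alignments; whether to drop them is a judgment call.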
?Carson > On Oct 4, 2017, at 4:31 PM, Quanwei Zhang wrote: > > Hello: > > I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. > > Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? > > >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Oct 4 16:39:52 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 4 Oct 2017 16:39:52 -0600 Subject: [maker-devel] About eAED In-Reply-To: References: Message-ID: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> This one is an even better explanation than the answer I just gave. Thank you. ?Carson > On Oct 4, 2017, at 4:35 PM, Xabier V?zquez-Campos wrote: > > Carson commented on this here > https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ > > On 5 October 2017 at 09:31, Quanwei Zhang > wrote: > Hello: > > I ran the maker2 pipeline and got the default gene sets (with AED<1). But I found there are several hundred genes with eAED 1. > > Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can be the reason of the great difference between AED and eAED. For this gene it has a very low AED score, is it still a reliable gene model if its eAED equals 1? 
> > >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00 QI:75|0|0|1|0|0|2|111|35 > > Thanks > > Best > Quanwei > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Xabier V?zquez-Campos, PhD > Research Associate > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Sun Oct 1 23:03:01 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Mon, 2 Oct 2017 06:03:01 +0100 Subject: [maker-devel] Error with Maker_functional_gff Message-ID: Hello, I intend to rename genes for Genebank submission I downloaded swissprot.fa from NCBI and used blast MAKER generated file to swissprot. the output of BLAST RESULT looks like this snap_masked-contig_8151-processed-gene-0.8-mRNA-1 P10978.1 49.315 73 37 0 43 115 874 946 2.61e-14 71.6 I attempted to run maker_funtional_gff using the swissprot.fa downloaded and the blastp result I got the following result Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 2897906. Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 2897906. 
Can't parse details from FASTA header: >P11684.1 RecName: Full=Uteroglobin; AltName: Full=Clara cell phospholipid-binding protein; Short=CCPBP; AltName: Full=Clara cells 10 kDa secretory protein; Short=CC10; AltName: Full=Secretoglobin family 1A member 1; AltName: Full=Urinary protein 1; Short=UP-1; Short=UP1; Short=Urine protein 1; Flags: Precursor Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 139, <$IN> line 1608599. Use of uninitialized value $id in hash element at /Users/emmannaemeka/Desktop/Gpm/maker/bin/maker_functional_gff line 141, <$IN> line 1608599. Can't parse details from FASTA header: >Q9HZU2.1 RecName: Full=Precorrin-8X methylmutase; AltName: Full=HBA synthase; AltName: Full=Precorrin isomerase What can I do? Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From willett4 at email.unc.edu Mon Oct 2 09:04:38 2017 From: willett4 at email.unc.edu (Willett, Christopher S) Date: Mon, 2 Oct 2017 15:04:38 +0000 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> Message-ID: Hi Mike- Thanks for getting back to me. I was using the grep -cP '\tgene\t? syntax to count the numbers and it seems to be giving me the same numbers I got before when I was counting either the transcripts or the genes in the fasta output files from our original run. I will have to look at the files a bit more to see if I can find some examples of genes that fit what you are suggesting. 
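To check whether genes or transcripts are being counted, it can help to tally every feature type in column 3 of the GFF3 in one pass, then compare the gene and mRNA totals before and after filtering. A minimal sketch follows, assuming a standard nine-column, tab-separated GFF3. The toy records are invented for illustration and do not come from the run discussed here.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally the feature type (column 3) of every feature line in a GFF3."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not annotation
        if line.startswith("#"):
            continue  # comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


# Hypothetical records: one gene with two transcripts.
toy_gff = [
    "##gff-version 3",
    "contig_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "contig_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1",
    "contig_1\tmaker\tmRNA\t100\t700\t.\t+\t.\tID=gene1-mRNA-2;Parent=gene1",
    "##FASTA",
    ">contig_1",
    "ACGTACGT",
]

counts = count_feature_types(toy_gff)
print(counts["gene"], counts["mRNA"])
```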
Best, Chris On Oct 2, 2017, at 9:30 AM, Michael Campbell > wrote: Hi Chris, This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1. Thanks, Mike On Sep 29, 2017, at 1:20 PM, Willett, Christopher S > wrote: Hello- We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this? Here is what we are seeing if more details would be helpful. With a run with or without the keep_pred turned our final MAKER run gives ~21500 predicted genes with or 15200 without the keep predictions turned on. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness the standard set (-s setting) is retaining ~14800 genes and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes? The genes sets look pretty good overall and seem like reasonable numbers so we were debating which set to use as our final set. 
I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above. Thanks for your help, Best, Chris Willett ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Research Associate Professor Department of Biology CB#3280 Coker Hall University of North Carolina, Chapel Hill Chapel Hill, NC, 27599-3280 Office: 2252 Genome Science Building phone: 919-843-8663 fax: 919-962-1625 http://labs.bio.unc.edu/Willett/ _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From willett4 at email.unc.edu Mon Oct 2 13:28:19 2017 From: willett4 at email.unc.edu (Willett, Christopher S) Date: Mon, 2 Oct 2017 19:28:19 +0000 Subject: [maker-devel] question on gene numbers with quality_filter.pl In-Reply-To: <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com> References: <16C1890A-2042-4BE1-93CE-8A8DC0C18151@ad.unc.edu> <30C718DD-D3E5-4659-B83D-B9520DD20E34@gmail.com> <4C24415C-8A2A-499F-A55A-0026F7D1329F@ad.unc.edu> <0A5A51F2-C551-493B-943B-7F5F81C294BF@gmail.com> Message-ID: Hi Mike- Here is the control file for the last run of MAKER with keep_preds=0 and here is an example of one mRNA retained from the gff file: Chromosome_6 maker mRNA 556000 557215 . + . ID=maker-Chromosome_6-exonerate_est2genome-gene-5.3-mRNA-1;Parent=maker-Chromosome_6-exonerate_est2genome-gene-5.3;Name=TCALIF_02833-PA;_AED=1.00;_eAED=1.00;_QI=15|0|0|0|1|1|2|75|338;score=100;Alias=TCALIF_02833-PA Thanks, Chris On Oct 2, 2017, at 3:19 PM, Michael Campbell > wrote: Hi Chris, Yeah By default MAKER shouldn?t keep any annotation with an AED of 1. I?ve ccd the dev list on this to see if anyone else has any idea why you might get AED 1 genes with keep_preds=0. Could you send me the maker_opts.ctl file for the run. 
There may be something informative in there.

Thanks,
Mike

On Oct 2, 2017, at 2:32 PM, Willett, Christopher S wrote:

Hi Mike-

I was looking at the lists of mRNAs and I think what is happening is that there are still genes retained in our initial output from MAKER that have an AED=1 that are then getting trimmed out of the filtered file. If I am setting the AED threshold equal to 1 in the control file for the MAKER run, is that less than one or less than or equal to one for retention? Should these AED=1 genes be making it into the gene and mRNA pools if we have the keep predictions parameter set to 0?

Thanks for your help,
Best,
Chris

On Oct 2, 2017, at 9:30 AM, Michael Campbell wrote:

Hi Chris,

This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1 then quality filter should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1.

Thanks,
Mike

On Sep 29, 2017, at 1:20 PM, Willett, Christopher S wrote:

Hello-

We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER and we had some questions about which set of genes to use. Our latest runs were using Pfam domains to define default vs standard set using the quality_filter.pl script and I had a question about stringency of the filters for this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this?

Here is what we are seeing if more details would be helpful. Our final MAKER run gives ~21500 predicted genes with the keep predictions option turned on, or 15200 without it. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness, the standard set (-s setting) is retaining ~14800 genes, and if I filter the 15200 gff file with the default parameters that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes? The gene sets look pretty good overall and seem like reasonable numbers, so we were debating which set to use as our final set.

I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention but that seems a bit independent from the question above.

Thanks for your help,
Best,
Chris Willett

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Research Associate Professor
Department of Biology
CB#3280 Coker Hall
University of North Carolina, Chapel Hill
Chapel Hill, NC, 27599-3280
Office: 2252 Genome Science Building
phone: 919-843-8663
fax: 919-962-1625
http://labs.bio.unc.edu/Willett/

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl_full8
Type: application/octet-stream
Size: 5617 bytes
Desc: maker_opts.ctl_full8

From qwzhang0601 at gmail.com Wed Oct 4 20:35:55 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 4 Oct 2017 22:35:55 -0400
Subject: [maker-devel] About eAED
In-Reply-To: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
Message-ID:

Thank you all.
Most of the time, the AED is equal to or lower than eAED, but there are some genes whose eAED is smaller than AED. I feel the eAED is more stringent than AED. Would you give me an example of a condition under which eAED can be smaller than AED?

The default maker2 gene set includes all genes with AED less than 1. Do you think eAED is a better choice to filter gene models than AED?

Best
Quanwei

2017-10-04 18:39 GMT-04:00 Carson Holt :

> This one is an even better explanation than the answer I just gave. Thank
> you.
>
> --Carson
>
> On Oct 4, 2017, at 4:35 PM, Xabier Vázquez-Campos wrote:
>
> Carson commented on this here
> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ
>
> On 5 October 2017 at 09:31, Quanwei Zhang wrote:
>
>> Hello:
>>
>> I ran the maker2 pipeline and got the default gene sets (with AED<1). But
>> I found there are several hundred genes with eAED 1.
>>
>> Below is an example, the gene has AED 0.05 and eAED 1. I wonder what can
>> be the reason of the great difference between AED and eAED. For this gene
>> it has a very low AED score, is it still a reliable gene model if its eAED
>> equals 1?
>>
>> >maker-Contig2656-snap-gene-269.6-mRNA-1 protein AED:0.05 eAED:1.00
>> QI:75|0|0|1|0|0|2|111|35
>>
>> Thanks
>>
>> Best
>> Quanwei
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> --
> Xabier Vázquez-Campos, *PhD*
> *Research Associate*
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
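For anyone who wants to see how often AED and eAED disagree in their own run, here is a minimal sketch that pulls both scores out of FASTA headers laid out like the one quoted above. It is not part of MAKER: the function names, the 0.5 gap cutoff, and the assumption that every header carries `AED:` followed by `eAED:` are illustrative only.

```python
import re

# Header layout assumed from the example quoted in this thread:
# >NAME protein AED:0.05 eAED:1.00 QI:...
HEADER_RE = re.compile(r"^>(\S+).*?\bAED:([0-9.]+)\s+eAED:([0-9.]+)")


def parse_scores(header):
    """Return (name, AED, eAED) for one FASTA header, or None if it does not match."""
    m = HEADER_RE.match(header)
    if m is None:
        return None
    return m.group(1), float(m.group(2)), float(m.group(3))


def score_gaps(fasta_lines, min_gap=0.5):
    """List (name, AED, eAED) for records whose two scores differ by at least min_gap."""
    hits = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parsed = parse_scores(line)
        if parsed is None:
            continue  # header without AED/eAED fields
        name, aed, eaed = parsed
        if abs(eaed - aed) >= min_gap:
            hits.append((name, aed, eaed))
    return hits
```

Passing an open file handle for the proteins FASTA from a run (the file name will depend on your own output) to `score_gaps` returns the models worth a manual look.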

From carsonhh at gmail.com Wed Oct 4 20:38:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 20:38:25 -0600
Subject: [maker-devel] About eAED
In-Reply-To:
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com>
Message-ID: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>

The previous linked comment explains in detail --> https://groups.google.com/forum/#!msg/maker-devel/wtmNRtRa-ko/iC4KTuIitGEJ

Basically the middle support of exon is inferred from edge support even though no overlap exists (so eAED infers support and AED does not).

--Carson

From carsonhh at gmail.com Wed Oct 4 20:43:28 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 4 Oct 2017 20:43:28 -0600
Subject: [maker-devel] About eAED
In-Reply-To: <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>
Message-ID:

eAED can be better for edge cases, but neither is perfect. Low AED generally correlates with better models. But a high AED does not mean the model doesn't exist, it just means you should spend a little more time deciding if you really believe it or not.

--Carson
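One way to act on the advice about high-AED models is to list the transcripts above some AED cutoff so they can be checked by hand. The sketch below assumes tab-separated GFF3 with `_AED` and `_eAED` attributes on mRNA features, as in the TCALIF_02833 line posted on this list; the function names and the 0.5 cutoff are placeholders, not MAKER defaults.

```python
def mrna_scores(gff3_lines):
    """Yield (ID, AED, eAED) for each mRNA feature that carries both attributes."""
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # comment, directive, or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 is a semicolon-separated list of key=value pairs.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "_AED" in attrs and "_eAED" in attrs:
            yield attrs.get("ID"), float(attrs["_AED"]), float(attrs["_eAED"])


def review_list(gff3_lines, cutoff=0.5):
    """Return mRNAs with AED >= cutoff, worst (highest AED) first."""
    flagged = [rec for rec in mrna_scores(gff3_lines) if rec[1] >= cutoff]
    return sorted(flagged, key=lambda rec: rec[1], reverse=True)
```

Because it walks mRNA features rather than gene features, the same loop also makes it easy to count transcripts and genes separately when comparing filtered and unfiltered sets.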

From qwzhang0601 at gmail.com Wed Oct 4 21:25:24 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 4 Oct 2017 23:25:24 -0400
Subject: [maker-devel] About eAED
In-Reply-To:
References: <606958D2-D9BB-477D-ACE8-E096A9AD9666@gmail.com> <5DEAC021-9925-4B41-9332-AB48685D7304@gmail.com>
Message-ID:

Thanks for your explanation.

Best
Quanwei

From dandence at gmail.com Thu Oct 5 08:00:21 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 5 Oct 2017 10:00:21 -0400
Subject: [maker-devel] Error with Maker_functional_gff
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,

I can't tell whether it will work from the blast lines that you sent. It will depend on the full headers in the fasta lines, which you'll run after all the blasts are complete.

Assembly isn't really my expertise or the topic of this mailing list, but assembling your contigs into scaffolds would probably help your annotations by connecting some parts of genes that are broken across contigs, and will definitely help downstream analysis if you need to know which genes are located next to each other.
How much improvement you can get by scaffolding depends on the type of sequence data you have. Each scaffolder makes assumptions and has requirements, and some assemblers like velvet and SOAPdenovo have scaffolding built into their algorithms. I'd recommend starting with a review like this one: http://www.sciencedirect.com/science/article/pii/S1672022912000095

~Daniel

> On Oct 2, 2017, at 10:47 AM, Emmanuel Nnadi wrote:
>
> Hello Daniel,
>
> Thanks for the tip, I was able to download uniprot_swiss.fa I am currently running the blast now
>
> it looks like this
>
> MUCPR_041061-RA sp|P10978|POLX_TOBAC 49.315 73 37 0 43 115 874 946 2.95e-14 71.6
> MUCPR_026643-RA sp|Q00451|PRF1_SOLLC 86.207 87 11 1 243 328 257 343 3.65e-32 126
>
> Is it ok?
>
> I wish to ask, I did not assemble my contigs into scaffold before annotating would it affect the end result?
>
> I wish to assemble my sequence into scaffold can you advice on the best software to use?
>
> I attempted using SSPACE: a new stand-alone scaffolding tool for small and large genomes
> but am having problem with the library. Funny enough the software does not have support to solve problems
>
> Thanks
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From daren.card at gmail.com Fri Oct 6 06:23:36 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Fri, 6 Oct 2017 07:23:36 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com>
Message-ID: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com>

Dear Carson,

Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.

Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.

Thank you,
Daren Card

################################################
doing repeat masking
re reading repeat masker report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
doing blastx repeats
re reading blast report.
/home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner
deleted:2 hits
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=NA, hostname=moonunit0
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:scaffold-1

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold-1

examining contents of the fasta file and run log
################################################

> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote:
>
> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago.
>
> --Carson
>
>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote:
>>
>> Hi all,
>>
>> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size?
>>
>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library).
>>
>> Any help would be greatly appreciated.
>> Daren Card
>>
>> University of Texas Arlington
>>
>> ###################################################
>> doing blastx repeats
>> running blast search.
>> #--------- command -------------#
>> Widget::blastx:
>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
>> #-------------------------------#
>> deleted:0 hits
>> collecting blastx repeatmasking
>> processing all repeats
>> in cluster::shadow_cluster...
>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>> --> rank=3, hostname=moonunit0
>> ERROR: Failed while processing all repeats
>> ERROR: Chunk failed at level:3, tier_type:1
>> FAILED CONTIG:scaffold-1
>>
>> doing blastx repeats
>> running blast search.
>> #--------- command -------------#
>> Widget::blastx:
>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
>> #-------------------------------#
>> ERROR: Chunk failed at level:2, tier_type:0
>> FAILED CONTIG:scaffold-1
>>
>> deleted:0 hits
>> deleted:0 hits
>> ###################################################
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From eennadi at gmail.com Sat Oct 7 15:34:46 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Sat, 7 Oct 2017 22:34:46 +0100
Subject: [maker-devel] jbrowse not working
Message-ID:

Please,
I ran the command line

maker2jbrowse muc1_genome_snap2.all.gff

The command created some folders. However, at the end it read
No reference sequences defined in configuration, nothing to do.

Please what does it mean? How can I view it in jbrowse.

Thanks

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
URL: From carsonhh at gmail.com Sun Oct 8 18:37:12 2017 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 8 Oct 2017 18:37:12 -0600 Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only In-Reply-To: <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> Message-ID: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here ?> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ ?Carson > On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: > > Dear Carson, > > Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. > > Unfortunately, I?m still getting the same error what appears to be at roughly the same spot (~child 226). I?ve copied the stderr below. I checked my GFF file and I don?t see any issues with coordinates. I?m going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. 
> > Thank you, > Daren Card > > > ################################################ > doing repeat masking > re reading repeat masker report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out > doing blastx repeats > re reading blast report. > /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner > deleted:2 hits > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > doing blastx repeats > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. > --> rank=NA, hostname=moonunit0 > ERROR: Failed while processing all repeats > ERROR: Chunk failed at level:3, tier_type:1 > FAILED CONTIG:scaffold-1 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:scaffold-1 > > examining contents of the fasta file and run log > ################################################ > > > >> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >> >> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >> >> ?Carson >> >> >>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>> >>> Hi all, >>> >>> I?ve been having an issue with MAKER (v. 
2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>> >>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>> >>> Any help would be greatly appreciated. >>> Daren Card >>> >>> University of Texas Arlington >>> >>> ################################################### >>> doing blastx repeats >>> running blast search. 
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>> #-------------------------------# >>> deleted:0 hits >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>> --> rank=3, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> doing blastx repeats >>> running blast search. 
>>> #--------- command -------------# >>> Widget::blastx: >>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>> #-------------------------------# >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> deleted:0 hits >>> deleted:0 hits >>> ################################################### >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Oct 9 18:35:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 9 Oct 2017 18:35:49 -0600 Subject: [maker-devel] jbrowse not working In-Reply-To: References: Message-ID: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it?s possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you run it. Thanks, Carson > On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote: > > Please, > I ran the command line > > maker2jbrowse muc1_genome_snap2.all.gff > > The command created some folders. However, at the end it read > No reference sequences defined in configuration, nothing to do. 
>
> Please what does it mean? How can I view it in jbrowse.
>
> Thanks
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From eennadi at gmail.com Mon Oct 9 22:42:35 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Tue, 10 Oct 2017 05:42:35 +0100
Subject: [maker-devel] jbrowse not working
In-Reply-To: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com>
References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com>
Message-ID:

Hi Carson,

Thanks for the reply.

I generated the GFF with this command:

gff3_merge -d dpp_contig.maker.output/dpp_contig_master_datastore_index.log

I had to rerun maker2jbrowse with the following command:

maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb

It shows the following warning, and I don't understand what it means:

WARNING: No matching features found for mRNA

I was able to set up JBrowse on my local host successfully. I had to move the jbrowse folder to my local host. JBrowse is up and running; however, I have about 18488 contigs and only 31 contigs are showing. How can I make all my contigs show in JBrowse?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt wrote:

> Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of
> the file?
That can happen if you use the -n option with gff3_merge.
> Alternatively it's possible one of the individual contig gff3 used to build
> the merged gff3 is truncated. If that is the case then gff3_merge should
> have thrown some sort of error or warning when you run it.
>
> Thanks,
> Carson
>
>
> On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi wrote:
>
> Please,
> I ran the command line
>
> maker2jbrowse muc1_genome_snap2.all.gff
>
> The command created some folders. However, at the end it read
> No reference sequences defined in configuration, nothing to do.
>
> Please what does it mean? How can I view it in jbrowse.
>
> Thanks
>
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From jacques.dainat at nbis.se Tue Oct 10 03:24:34 2017
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Tue, 10 Oct 2017 11:24:34 +0200
Subject: [maker-devel] MAKER annotation submission (EMBLmyGFF3)
Message-ID: <967873FE-D61F-4233-A004-C877A60A2AC1@nbis.se>

Hi MAKER users,

I would like to take advantage of this mailing list to share a tool that I hope will be useful to MAKER users.

Once we are happy with our annotation, one of the next steps is to submit it to the public archives through one of the three INSDC databases (EMBL-EBI / NCBI / DDBJ). We developed EMBLmyGFF3, which converts any kind of GFF3 annotation to the EMBL flat file format for submission to the European Nucleotide Archive (ENA), which is part of EMBL-EBI. It works well with the MAKER annotation output, among others.

We hope the tool will ease the submission process of your annotations.
You will find it here: https://github.com/NBISweden/EMBLmyGFF3

A typical use case looks like this (where ERSXXXXXXX and PRJXXXXXXX are the accession number and the project ID provided by EMBL-EBI prior to any submission):

./EMBLmyGFF3.py maker.gff3 maker.fa --data_class STD --topology linear --molecule_type 'genomic DNA' --table 1 --species 'Drosophila melanogaster (fly)' --taxonomy INV --accession ERSXXXXXXX --project_id PRJXXXXXXX --rg MYGROUP -o result.embl

Best regards,

Jacques Dainat, PhD
---------------------------------------
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
---------------------------------------
Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics

From mcsimenc at gmail.com Wed Oct 11 08:53:36 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Wed, 11 Oct 2017 07:53:36 -0700
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
Message-ID:

Hey MAKER people,

I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation, where the left side shows the number of genes with EST support only, the right side shows the number of genes with protein support only, and the intersection shows the number of genes with both EST and protein support.

QI summary has:

Fraction of exons that overlap an EST alignment
Fraction of exons that overlap EST or Protein alignments

Please correct me if I'm wrong, because I am interpreting the first to be the fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information.

Does anyone have a way to do this, or a script to parse the MAKER GFF3 to get this?

Thanks!!!
Matt Simenc
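For a first pass straight from the GFF3, the two QI fractions quoted above can be tallied per transcript. What follows is only a minimal Python sketch. It assumes that the _QI attribute on mRNA lines holds the usual nine pipe-separated fields and that the two fractions listed above are the third and fourth of them (my reading of the MAKER documentation, not something confirmed in this thread). The names parse_attributes, classify_qi and tally are illustrative, not part of MAKER. It can isolate transcripts whose exon overlap comes from protein alone, but, as the question points out, it cannot separate EST-only from EST-plus-protein.

```python
from collections import Counter


def parse_attributes(field):
    """Split a GFF3 column-9 string into a dict of key -> value."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def classify_qi(qi):
    """Classify one _QI string by its exon-overlap fractions.

    Assumed layout (0-based): field 2 = fraction of exons overlapping an
    EST alignment, field 3 = fraction overlapping EST or protein alignments.
    """
    parts = qi.split("|")
    est = float(parts[2])
    est_or_protein = float(parts[3])
    if est_or_protein == 0:
        return "no_overlap"
    if est == 0:
        # Some exon overlap exists and none of it is EST, so it is protein.
        return "protein_only"
    # EST overlap is present; protein overlap cannot be ruled in or out.
    return "est_maybe_protein"


def tally(lines):
    """Count mRNA features per class; stops at the embedded ##FASTA section."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        qi = parse_attributes(cols[8]).get("_QI")
        if qi:
            counts[classify_qi(qi)] += 1
    return counts
```

To use it, pass an open file handle for the merged GFF3 to tally() and print the resulting counts. The counts are per transcript, so genes with several transcripts would still need to be grouped by their Parent attribute before drawing a per-gene Venn diagram.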
From michael.s.campbell1 at gmail.com Wed Oct 11 09:18:54 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 11 Oct 2017 11:18:54 -0400
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
In-Reply-To:
References:
Message-ID: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com>

Hi Matt,

I have a hacky way that I've done it. It requires running MAKER two more times, but they are quicker runs.

To identify the genes that have protein support, I pass all of the annotations back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output, any gene with an AED less than 1 has protein support.

Then I do the same thing with the est2genome lines from the big GFF3 file and pass them in as est_gff. The output of that run gives you the genes with EST support. The genes with an AED of less than 1 in both sets have support from both protein and EST.

Hope this helps,
Mike

> On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote:
>
> Hey MAKER people,
>
> I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support.
>
> QI summary has:
>
> Fraction of exons that overlap an EST alignment
> Fraction of exons that overlap EST or Protein alignments
>
> Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment.
If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information.
>
> Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this?
>
> Thanks!!!
> Matt Simenc

From carsonhh at gmail.com Wed Oct 11 09:22:54 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 11 Oct 2017 09:22:54 -0600
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
In-Reply-To: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com>
References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com>
Message-ID: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com>

Also look at GAL for building GFF3 feature queries -> https://github.com/The-Sequence-Ontology/GAL

--Carson

> On Oct 11, 2017, at 9:18 AM, Michael Campbell wrote:
>
> Hi Matt,
>
> I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs.
>
> To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.
> > Hope this helps, > Mike > >> On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: >> >> Hey MAKER people, >> >> I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support. >> >> QI summary has: >> >> Fraction of exons that overlap an EST alignment >> Fraction of exons that overlap EST or Protein alignments >> >> Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. >> >> Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this? >> >> Thanks!!! >> Matt Simenc >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Wed Oct 11 22:19:04 2017 From: mcsimenc at gmail.com (Matt Simenc) Date: Wed, 11 Oct 2017 21:19:04 -0700 Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only? In-Reply-To: <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com> Message-ID: Very good, thank you! 
Matt On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt wrote: > Also look at GAL for building GFF3 feature queries ?> > https://github.com/The-Sequence-Ontology/GAL > > ?Carson > > > > > On Oct 11, 2017, at 9:18 AM, Michael Campbell < > michael.s.campbell1 at gmail.com> wrote: > > Hi Matt, > > I have a hacky way that I?ve done it. It requires running MAKER two more > times but they are quicker runs. > > To identify the genes that have protein support I pass all of the > annotation back to MAKER using the model_gff option in the maker_opts.ctl > file. Then I pull out all of the protein2genome features from the big MAKER > GFF3 file and pass them in using the protein_gff option. I turn off all > repeat masking and run MAKER. It runs fast because it doesn?t have to run > any gene finders, align evidence, or repeatmask. In the output any gene > with an AED less than 1 has protein support. Then I do the same thing with > est2genome lines from the big GFF3 file and put them in as est_gff. The > output of that one gives you genes with EST support. Then the genes with an > AED of less than one in both sets have support from protein and EST. > > Hope this helps, > Mike > > On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote: > > Hey MAKER people, > > I would like to make a Venn diagram showing the kinds of evidence > supporting gene models in my MAKER annotation where the left side shows > number of genes with EST support only, the right side shows number of genes > with protein support only, and the intersection shows number of genes with > EST and protein support. > > QI summary has: > > Fraction of exons that overlap an EST alignment > Fraction of exons that overlap EST or Protein alignments > > Please correct me if I'm wrong, because I am interpreting the first to be > fraction of exons that overlap an EST alignment and possibly also a protein > alignment. 
If that is the case then we can't calculate the number of genes > that overlap only EST or (EST and protein) from the QI information. > > Anyone have a way to do this or have a script to parse the MAKER GFF3 to > get this? > > Thanks!!! > Matt Simenc > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 17:33:05 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:33:05 -0400 Subject: [maker-devel] GMOD hackathon before PAG San Diego in January Message-ID: Hi all, This January before PAG on the Wednesday and Thursday before PAG (January 10-11) in San Diego we are planning a GMOD hackathon. We expect that participants will be interested in solving problems/creating solutions related to Tripal, JBrowse, Apollo, and Galaxy but if you're interested in another GMOD project, by all means, let us know! We expect this hackathon to overlap with the Tripal hackathon that is on January 11 (I'm pretty sure; right Stephen?) If you are interested in attending this hackathon, please let me know so I can be sure we have an appropriately sized space. And if you're coming for the pre-PAG hackathon, consider staying for PAG, since there is always a lot of GMOD-related content at the meeting! Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
From daren.card at gmail.com Thu Oct 12 20:22:54 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Thu, 12 Oct 2017 21:22:54 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com>
Message-ID: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com>

Hi Carson,

Thanks for the help. The issue is still lingering. I've tried my full "ideal" run using both legacy BLAST 2.2.26 and BLAST+ 2.6 and get the same error, so it doesn't seem to be a BLAST issue, or it is one that won't be easy to overcome.

Using BLAST v. 2.6, I tried some more runs, either turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. It seems to run fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing, which leads me to believe it would finish okay.

I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me.

What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don't see any obvious info among the output files about how the sequence or the GFF is partitioned up for the annotation, so any help you can provide would be great.

Hoping I can resolve this, as maybe this is useful to others. Weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue.
They were less contiguous, but I can't imagine that really mattering.

Thanks,
Daren

> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote:
>
> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST).
>
> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here -> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/
>
> --Carson
>
>
>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote:
>>
>> Dear Carson,
>>
>> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.
>>
>> Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
>>
>> Thank you,
>> Daren Card
>>
>>
>> ################################################
>> doing repeat masking
>> re reading repeat masker report.
>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >> doing blastx repeats >> re reading blast report. >> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >> deleted:2 hits >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> doing blastx repeats >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >> --> rank=NA, hostname=moonunit0 >> ERROR: Failed while processing all repeats >> ERROR: Chunk failed at level:3, tier_type:1 >> FAILED CONTIG:scaffold-1 >> >> ERROR: Chunk failed at level:2, tier_type:0 >> FAILED CONTIG:scaffold-1 >> >> examining contents of the fasta file and run log >> ################################################ >> >> >> >>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>> >>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>> >>> ?Carson >>> >>> >>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>> >>>> Hi all, >>>> >>>> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. 
I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>> >>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>> >>>> Any help would be greatly appreciated. >>>> Daren Card >>>> >>>> University of Texas Arlington >>>> >>>> ################################################### >>>> doing blastx repeats >>>> running blast search. 
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>> #-------------------------------# >>>> deleted:0 hits >>>> collecting blastx repeatmasking >>>> processing all repeats >>>> in cluster::shadow_cluster... >>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>> --> rank=3, hostname=moonunit0 >>>> ERROR: Failed while processing all repeats >>>> ERROR: Chunk failed at level:3, tier_type:1 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> doing blastx repeats >>>> running blast search. 
>>>> #--------- command -------------# >>>> Widget::blastx: >>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>> #-------------------------------# >>>> ERROR: Chunk failed at level:2, tier_type:0 >>>> FAILED CONTIG:scaffold-1 >>>> >>>> deleted:0 hits >>>> deleted:0 hits >>>> ################################################### >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > From robert.zimmermann at univie.ac.at Wed Oct 11 13:42:14 2017 From: robert.zimmermann at univie.ac.at (Bob Zimmermann) Date: Wed, 11 Oct 2017 21:42:14 +0200 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions Message-ID: Hello, I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. My gene prediction section of the maker_opts.ctl file looks like this: ... augustus_species=all_combined #Augustus gene prediction species model ... 
pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
...

It seems that even if pred_gff is set, augustus will still be run for ab initio predictions with no hints whenever an augustus_species setting is present. I was curious whether there is any way around this, partly because custom ab initio predictions could improve my annotation and also because the ab initio step can take a long time.

Thanks for your help!

Bob

From xvazquezc at gmail.com Thu Oct 12 00:09:32 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 12 Oct 2017 17:09:32 +1100
Subject: [maker-devel] choosing the right gene model
Message-ID:

Hi there,

I was visualising the annotations and realised that in some cases what seems to be a single gene is split according to one of the gene models, even though the other two, est2genome and protein2genome, suggest that this isn't the case. The opposite also happens.

For some reason, the "out of place" model is always (or almost always) the one from GeneMark.

How much weight do the RNAseq and protein data carry in this decision (if any)? How exactly is the final gene selected?

Cheers,
Xabi

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: split-gene.png
Type: image/png
Size: 66389 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: merged-gene.png
Type: image/png
Size: 63815 bytes
Desc: not available
URL:

From jan.nagel at fabi.up.ac.za Thu Oct 12 01:37:07 2017
From: jan.nagel at fabi.up.ac.za (Jan FABI)
Date: Thu, 12 Oct 2017 09:37:07 +0200
Subject: [maker-devel] Maker problem
Message-ID:

Dear Maker team,

I am experiencing a problem while running MAKER and cannot find a solution to it online. I am running MAKER on a new genome, using BRAKER-trained models for Augustus and GeneMark. The run was successful and performed as expected, except for one contig where an error was encountered. The error occurs during the Augustus step and seems to have something to do with intron models. I have made sure that the input fasta contains no characters other than ATCGN and no "windows"/non-UNIX carriage returns. I include the relevant portion of the log below. Could you help me determine the cause of this error?

setting up GFF3 output and fasta chunks
preparing ab-inits
running augustus.
#--------- command -------------#
Widget::augustus:
/home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus
#-------------------------------#
Sampling error in intron model. state=37 base=26570
/home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list.
Sampling error in intron model. state=37 base=26570
/home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR Tried to sample from empty list.
ERROR: Augustus failed --> rank=NA, hostname=xxx-VirtualBox ERROR: Failed while preparing ab-inits ERROR: Chunk failed at level:0, tier_type:2 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 -- Regards Jan Nagel ---------------------------------------------------------------------- PhD Genetics student Department of Genetics Forestry and Agricultural Biotechnology Institute (FABI) FABI 1, Room 1-55 University of Pretoria 74 Lunnon Rd. Hillcrest 0002 Gauteng Province South Africa Email : jan.nagel at fabi.up.ac.za Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961 -- This message and attachments are subject to a disclaimer. Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Oct 12 17:40:33 2017 From: scott at scottcain.net (Scott Cain) Date: Thu, 12 Oct 2017 19:40:33 -0400 Subject: [maker-devel] Call for presentations at GMOD workshop at PAG Message-ID: Hi all, This January in San Diego is the annual Plant and Animal Genomes (PAG) meeting (http://www.intlpag.org). As in previous PAGs, there will be several opportunities to present content related to GMOD projects. If you are interested in attending PAG and giving a talk at the GMOD workshop on Wednesday, January 17, please let me know. Your talk can either be about new developments/functionality in existing GMOD software, about how your organization is using the suite of GMOD software to good effect, or about technologies that you think the GMOD community would be interested in hearing about. Please email me directly with a title, an abstract or a vague idea of what you'd like to talk about. 
Also, if you'd really like to come but are having a hard time coming up with travel funds, please let me know; I might be able to help you with that too (up to a limit of one person anyway).

Cheers,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From carsonhh at gmail.com Fri Oct 13 09:37:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Oct 2017 09:37:25 -0600
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com>
Message-ID:

So you have an input GFF3 file? Could you send it to me along with the problem contig? If you want, you can upload the maker control files and evidence sets, and I can just recreate the run for the contig.

Upload here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

--Carson

> On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote:
>
> Hi Carson,
>
> Thanks for the help. Issue is still lingering. I've tried my full "ideal" run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so doesn't seem to be a BLAST issue. Or is one that won't be easy to overcome.
>
> Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. Seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing which leads me to believe it would finish okay.
> > I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had <10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me. > > What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don?t see any obvious way info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great. > > Hoping I can resolve this as maybe this is useful to others. Weird that I?m getting this error, as I?ve annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but can?t imagine that really mattering. > > Thanks, > Daren > > >> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote: >> >> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST). >> >> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If Updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here ?> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/ >> >> ?Carson >> >> >> >> >> >>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote: >>> >>> Dear Carson, >>> >>> Thanks so much for the quick reply. 
I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH. >>> >>> Unfortunately, I?m still getting the same error what appears to be at roughly the same spot (~child 226). I?ve copied the stderr below. I checked my GFF file and I don?t see any issues with coordinates. I?m going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into. >>> >>> Thank you, >>> Daren Card >>> >>> >>> ################################################ >>> doing repeat masking >>> re reading repeat masker report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out >>> doing blastx repeats >>> re reading blast report. >>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner >>> deleted:2 hits >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> doing blastx repeats >>> collecting blastx repeatmasking >>> processing all repeats >>> in cluster::shadow_cluster... >>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. 
>>> --> rank=NA, hostname=moonunit0 >>> ERROR: Failed while processing all repeats >>> ERROR: Chunk failed at level:3, tier_type:1 >>> FAILED CONTIG:scaffold-1 >>> >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:scaffold-1 >>> >>> examining contents of the fasta file and run log >>> ################################################ >>> >>> >>> >>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote: >>>> >>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago. >>>> >>>> ?Carson >>>> >>>> >>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote: >>>>> >>>>> Hi all, >>>>> >>>>> I?ve been having an issue with MAKER (v. 2.31.8) that I haven?t been able to overcome, and no former questions have really addressed or helped fix the problem. I?ve run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds. These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size? >>>>> >>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can?t find any additional info in the program-specific logs to help figure this out. 
MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I?m providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library). >>>>> >>>>> Any help would be greatly appreciated. >>>>> Daren Card >>>>> >>>>> University of Texas Arlington >>>>> >>>>> ################################################### >>>>> doing blastx repeats >>>>> running blast search. >>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >>>>> #-------------------------------# >>>>> deleted:0 hits >>>>> collecting blastx repeatmasking >>>>> processing all repeats >>>>> in cluster::shadow_cluster... >>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>>> --> rank=3, hostname=moonunit0 >>>>> ERROR: Failed while processing all repeats >>>>> ERROR: Chunk failed at level:3, tier_type:1 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> doing blastx repeats >>>>> running blast search. 
>>>>> #--------- command -------------# >>>>> Widget::blastx: >>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >>>>> #-------------------------------# >>>>> ERROR: Chunk failed at level:2, tier_type:0 >>>>> FAILED CONTIG:scaffold-1 >>>>> >>>>> deleted:0 hits >>>>> deleted:0 hits >>>>> ################################################### >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 09:42:41 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:42:41 -0600 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions In-Reply-To: References: Message-ID: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> Hi Bob, pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. 
If it is different in some way, then you can leave it in. If you run under MPI (it?s ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time. Thanks, Carson > On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote: > > Hello, > > I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. > > My gene prediction section of the maker_opts.ctl file looks like this: > ... > augustus_species=all_combined #Augustus gene prediction species model > ... > pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no > ? > > It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long. > > Thanks for your help! > > Bob > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Oct 13 09:50:26 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:50:26 -0600 Subject: [maker-devel] Maker problem In-Reply-To: References: Message-ID: If you look in the folder of the failed contig under the .../theVoid directory there will be a file called query.masked.fasta. Copy that file somewhere. 
Then because maker gave you the command that failed, you can run it all by itself outside of MAKER Example ?> /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off query.masked.fasta If it still fails, you now have a test file and command you can send to Mario Stanke (mario.stanke at uni-greifswald.de ). He made Augustus. It may be a bug he has already fixed (current Augustus version is 3.3) or there may be something in the species file causing the error that he can point out. ?Carson > On Oct 12, 2017, at 1:37 AM, Jan FABI wrote: > > Dear Maker team > > I am experiencing a problem while running maker and cannot find a solution to it online. > > I am running maker on a new genome, using BRAKER trained models for Augustus and GeneMark. This was successful and performed as expected, except for one contig where an error was encountered. > > This error occurs during Augustus and seems to have something to do with intron models. I have made sure that the input fasta does not contain characters other than ATCGN or contains "windows"/non-UNIX carriage returns. > > I include the relevant portion of the log below. Could you help me determine the cause of this error. > > > > setting up GFF3 output and fasta chunks > preparing ab-inits > running augustus. > #--------- command -------------# > Widget::augustus: > /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus --species=Np_2017_braker --UTR=off /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0 > /tmp/maker_bQo5Oc/NODE_1040_length_26483_cov_27%2E125137.abinit_masked.0.Np_2017_braker.augustus > #-------------------------------# > Sampling error in intron model. state=37 base=26570 > > /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR > Tried to sample from empty list. > > Sampling error in intron model. state=37 base=26570 > > /home/xxx/Desktop/programs/augustus-3.2.3/bin/augustus: ERROR > Tried to sample from empty list. 
> > ERROR: Augustus failed > --> rank=NA, hostname=xxx-VirtualBox > ERROR: Failed while preparing ab-inits > ERROR: Chunk failed at level:0, tier_type:2 > FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:NODE_1040_length_26483_cov_27.125137 > > -- > Regards > Jan Nagel > ---------------------------------------------------------------------- > PhD Genetics student > Department of Genetics > Forestry and Agricultural Biotechnology Institute (FABI) > FABI 1, Room 1-55 > University of Pretoria > 74 Lunnon Rd. Hillcrest > 0002 > Gauteng Province > South Africa > > Email : jan.nagel at fabi.up.ac.za > > Website: http://www.fabinet.up.ac.za/index.php/people-profile?profile=961 > This message and attachments are subject to a disclaimer. > Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details. > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 09:56:43 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 09:56:43 -0600 Subject: [maker-devel] choosing the right gene model In-Reply-To: References: Message-ID: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> Both transcript and protein evidence will go into the AED calculation for overlap support. So in both cases the chosen model had better overlap (protein evidence will not count toward the eAED overlap calculation if it is out of frame with the model it is supposed to be supporting). The larger merged model generates a clutering affect on it?s evidence, so it?s evidence set for AED calculation is slightly different than the SNAP and Augustus model would generate. In both cases, I think GeneMark is hurting more than it is helping. 
You may want to just drop it from the analysis (unless it's a fungus; I often find GeneMark can have that effect). --Carson > On Oct 12, 2017, at 12:09 AM, Xabier Vázquez-Campos wrote: > > Hi there, > > I was visualising the annotations and I realised that in some cases, what seems to be a gene is split according to one of the gene models, even though the other two, est2genome and prot2genome, suggest that it isn't the case. > > > > Although the opposite also happens. > > > > For some reason, the "out of place" model is always (or almost always) the one from Genemark. > > How much weight do the RNAseq and protein data carry in this decision (if any)? > How exactly is the final gene selected? > > Cheers, > Xabi > > -- > Xabier Vázquez-Campos, PhD > Research Associate > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Oct 13 10:56:30 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Oct 2017 10:56:30 -0600 Subject: [maker-devel] jbrowse not working In-Reply-To: References: <83AFE420-D54D-4CE8-833F-DE6CCC34A229@gmail.com> Message-ID: <2D6E11BC-6853-458D-AEB1-12EF74D041A3@gmail.com> The master_datastore_index.log file has a list of failed and finished contigs. You can grep the file contents for FAILED or DIED to see if any contigs are not finished. Finished contigs will be listed as FINISHED in the file. Also note that if you have errors with the jbrowse build, you have to start over (i.e. wipe out the old build). Rerunning the command over a failed build will try to insert again, which can generate its own errors.
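The check on master_datastore_index.log can also be scripted. The sketch below is an illustration only: it assumes each log line has three tab-separated columns (contig name, datastore directory, status) and that the last status recorded for a contig is its current state. The function name `summarize_datastore_index` and the sample paths are hypothetical.

```python
from collections import Counter


def summarize_datastore_index(lines):
    """Return (status counts, sorted contigs whose last status is not FINISHED).

    Assumes tab-separated lines of the form: contig, directory, status.
    """
    last_status = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, status = fields[0], fields[2]
        last_status[contig] = status  # later entries overwrite earlier ones
    counts = Counter(last_status.values())
    unfinished = sorted(c for c, s in last_status.items() if s != "FINISHED")
    return counts, unfinished


# Made-up example lines; real input would come from the log file itself.
sample_log = [
    "contig_1\tdatastore/AA/BB/contig_1/\tSTARTED",
    "contig_1\tdatastore/AA/BB/contig_1/\tFINISHED",
    "contig_2\tdatastore/CC/DD/contig_2/\tSTARTED",
    "contig_2\tdatastore/CC/DD/contig_2/\tFAILED",
]
counts, unfinished = summarize_datastore_index(sample_log)
print(dict(counts))
print(unfinished)
```

Comparing the number of FINISHED contigs against the number of sequences in the assembly gives a quick idea of whether a merged GFF3 can be complete.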
If gff3_merge was run without the -n option then you need to see if one of the GFF3 files being used is truncated (possibly dew to an IO error - not uncommon on NFS storage). You will need to see if you can identify which contig file is truncated and rerun it. ?Carson > On Oct 9, 2017, at 10:42 PM, Emmanuel Nnadi wrote: > > Hi Carson > Thanks for the reply > > I generated the off with this command gff3_merge ?d dpp_contig.maker.output/dpp_contig_master_datastore_index.log > > I had to rerun browse with the following command > > maker2jbrowse /Users/emmannaemeka/desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2.functional_blast.gff\maker2jbrowse -d /Users/emmannaemeka/Desktop/Gpm/maker/data/mucuna/muc1_genome_snap2.maker.output/muc1_genome_snap2_master_datastore_index.log \-out /Library/WebServer/Documents/JBrowse-1.12.1/muc/muc_jb > > Although its showing > > WARNING: No matching features found for mRNA I don't know what it means > > I don't understand what it means > > > Successfully, I was able to setup the jbrowse local host. I had to move the jbrowse folder to my local host > > > The jbrowse is up and running however, I have about 18488 contigs only 31 contigs are showing, how can i make all my contigs to show on jbrowse? > > > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > On Tue, Oct 10, 2017 at 1:35 AM, Carson Holt > wrote: > Is muc1_genome_snap2.all.gff missing embedded fasta entries at the end of the file? That can happen if you use the -n option with gff3_merge. Alternatively it?s possible one of the individual contig gff3 used to build the merged gff3 is truncated. If that is the case then gff3_merge should have thrown some sort of error or warning when you run it. 
> > Thanks, > Carson > > > > >> On Oct 7, 2017, at 3:34 PM, Emmanuel Nnadi > wrote: >> >> Please, >> I ran the command line >> >> maker2jbrowse muc1_genome_snap2.all.gff >> >> The command created some folders. However, at the end it read >> No reference sequences defined in configuration, nothing to do. >> >> Please what does it mean? How can I view it in jbrowse. >> >> Thanks >> >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Fri Oct 13 14:26:40 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Sat, 14 Oct 2017 07:26:40 +1100 Subject: [maker-devel] choosing the right gene model In-Reply-To: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> References: <821CB4FC-5571-41B1-AB2F-5FDD691C49D9@gmail.com> Message-ID: Actually, it's a fungal genome. Although not very typical, almost half of it are repeats. Worth mention that Genemark generates a lot of predictions that overlap LTRs and other complex repeats, something that neither SNAP or Augustus do. Have you seen this before? On 14 Oct. 2017 02:56, "Carson Holt" wrote: > Both transcript and protein evidence will go into the AED calculation for > overlap support. So in both cases the chosen model had better overlap > (protein evidence will not count toward the eAED overlap calculation if it > is out of frame with the model it is supposed to be supporting). The larger > merged model generates a clutering affect on it?s evidence, so it?s > evidence set for AED calculation is slightly different than the SNAP and > Augustus model would generate. In both cases, I think GeneMark is hurting > more than it is helping. 
You may want to just drop it from the analysis > (unless it?s a fungi, I often find GeneMark can have that affect). > > ?Carson > > > On Oct 12, 2017, at 12:09 AM, Xabier V?zquez-Campos > wrote: > > Hi there, > > I was visualising the annotations and I realised that in some cases, what > it seems to be a gene is splitted according to one of the gene models, > despite that the other 2, est2genome and prot2genome suggest that it isn't > the case. > > > > Although the opposite also happens. > > > ? > For some reason, the "out of place" model is always (or almost) the one > from Genemark. > > How much weight does carry the RNAseq and protein data on this decision > (if any)? > How exactly is the final gene selected? > > Cheers, > Xabi > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From z2.stewart at qut.edu.au Sat Oct 14 23:02:08 2017 From: z2.stewart at qut.edu.au (ZACHARY STEWART) Date: Sun, 15 Oct 2017 05:02:08 +0000 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance Message-ID: Hello MAKER team, I am hoping I could have a bit of your time if that isn't a problem. I am currently performing the advanced repeat library construction as described on the MAKER wiki, and everything appears to work as expected until I reach "2.1.5 Building examplars". At this point I encounter a problem previously documented in the Google group (title: advanced repeat masking library constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta" and "lLTRs_Seq_For_BLAST.fasta" are empty. 
I was hoping you could clarify what you meant by simplifying the sequence names. The genomic contig names are in a format such as ">001676F" and I modified the MITE library to have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]" which I have not tried changing since I suspect the name is important for later reassociation. If you could point me in the right direction that would be very appreciated. Regards, Zac. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Sun Oct 15 15:32:10 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Sun, 15 Oct 2017 22:32:10 +0100 Subject: [maker-devel] Backlash running through my sequence Message-ID: Hi all, I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them I attached the sequence Thanks Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sample_1.fasta Type: application/octet-stream Size: 3884915 bytes Desc: not available URL: From xvazquezc at gmail.com Mon Oct 16 01:26:56 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Mon, 16 Oct 2017 18:26:56 +1100 Subject: [maker-devel] Advanced repeat library construction - CRL step 4 assistance In-Reply-To: References: Message-ID: Hi Zac, The contig names you indicate shouldn't give any problems. And if you changed the names of MITE.lib right after creation and before using it downstream, it shouldn't be an issue. Have you confirmed if the prior blastx output has any results? 
Also, be sure you use the same version of makeblastdb and blastx/blastn. I remember reading before running the protocol for first time that in some cases, switching versions could give problems. And be careful if you copy/paste from the wiki page, there are a few typos and dashes instead of minus characters in the command line option flags, all of which will result in errors Xabi On 15 October 2017 at 16:02, ZACHARY STEWART wrote: > Hello MAKER team, > > > I am hoping I could have a bit of your time if that isn't a problem. I am > currently performing the advanced repeat library construction as described > on the MAKER wiki, and everything appears to work as expected until I reach > "2.1.5 Building examplars". At this point I encounter a problem previously > documented in the Google group (title: advanced repeat masking library > constructions & rna-seq assembly choices) where the "Inner_Seq_For_BLAST.fasta" > and "lLTRs_Seq_For_BLAST.fasta" are empty. I was hoping you could clarify > what you meant by simplifying the sequence names. The genomic contig names > are in a format such as ">001676F" and I modified the MITE library to > have names like ">mite1, >mite2" etc. The passed_outinner_sequence.fasta > has sequence names such as ">000021F_(dbseq-nr_766)_[918983,922225]" > which I have not tried changing since I suspect the name is important for > later reassociation. If you could point me in the right direction that > would be very appreciated. > > > Regards, > > Zac. > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Xabier V?zquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yuejiaxing at gmail.com Mon Oct 16 02:54:42 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 16 Oct 2017 10:54:42 +0200 Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13 In-Reply-To: References: Message-ID: Dear maker developers, I am trying to install maker-3.01.1-beta but encountered the warning message about uninitialized value (see the warning message below) although still finished the installation. [jxyue at paralog src]$ ./Build install Building MAKER Use of uninitialized value $line in chomp at /home/jxyue/Projects/LRSDAY/bu ild/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3082. Use of uninitialized value $line in substitution (s///) at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/ cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3083. Installing MAKER... Building MAKER ... Also, when I ran this installation for the actual work, it reported errors about cannot find my specified snaphmm model for the annotation, despite that I have specified "snaphmm=$LRSDAY_HOME/data/S288C.gene.hmm" in the "maker_opts.ctl" file and this configuration information has been successfully recognized by maker. running snap. #--------- command -------------# Widget::snap: /home/jxyue/Projects/LRSDAY/build/SNAP/snap /home/jxyue/Projects/LRSDAY/data/S288C.gene.hmm /tmp/maker_m8TVEQ/chrI.abinit_masked.0 > /tmp/maker_m8TVEQ/chrI.abinit_masked.0.S288C%2Egene%2Ehmm.snap #-------------------------------# # (my comment: up to now everything looks fine) .... running snap. 
#--------- command -------------# Widget::snap: /home/jxyue/Projects/LRSDAY/build/SNAP/snap -plus -xdef /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.xdef.snap S288C.gene.hmm /tmp /maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap.fasta > /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap #-------------------------------# ZOE ERROR (from /home/jxyue/Projects/LRSDAY/build/SNAP/snap): error opening file (/home/jxyue/Projects/LRSDAY/build/SNAP/Zoe/HMM/S288C.gene.hmm) ZOE library version 2017-03-01 ERROR: Snap failed --> rank=NA, hostname=paralog.itc.unipi.it ERROR: Failed while annotating transcripts ERROR: Chunk failed at level:1, tier_type:4 FAILED CONTIG:chrI ERROR: Chunk failed at level:6, tier_type:0 FAILED CONTIG:chrI examining contents of the fasta file and run log # (my comment: here the error occurred. As you can see, snap somehow forgot about the path to my specified hmm file and instead looks for this file in its default installation location) It is worth noting that the parallel installation and run with maker-3.00.0-beta finish smoothly without any problem. So I suspect both the installation warning and the executing error are caused by the changes during the version update from 3.00.0-beta to 3.01.1-beta. Could you check about this issue? Thanks in advance! Finally, is it possible to also provide access to older version of maker (e.g. 3.00.0-beta in this particular case) when the user finish the registration in the maker download page? This will help users to roll back to older version when needed. Also this helps for the version control when other developers develop annotation pipelines that use maker as a dependency package. Thanks for the consideration! Best, Jia-Xing -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Facult? de M?decine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Universit? 
Côte d'Azur (UCA) 28 Avenue de Valombrose 06107 NICE Cedex 2 France Personal website: http://www.iamphioxus.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Oct 16 10:20:32 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 16 Oct 2017 10:20:32 -0600 Subject: [maker-devel] Backlash running through my sequence In-Reply-To: References: Message-ID: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> I would not just remove them. The fact they are there calls into question how they got there in the first place. If you generated this file yourself, you may want to use fasta_tool instead. --Carson > On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote: > > Hi all, > I am trying to run annotation on some of my sequences but noticed that I have backslashes that run through the sequence. Please, how do I remove them? > I attached the sequence > > Thanks > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Oct 17 13:11:39 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 17 Oct 2017 19:11:39 +0000 Subject: [maker-devel] Backlash running through my sequence In-Reply-To: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> Message-ID: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu> I agree with Carson, though my guess is that any FASTA converter will either fail on these characters as non-IUPAC or silently remove them.
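Before deciding how to handle the file, it can help to see exactly where the backslashes sit. The following is a rough sketch rather than a fix: it only reports, it does not rewrite anything, and the function name `find_backslashes` and the sample records are invented for illustration.

```python
def find_backslashes(lines):
    """Return (header lines containing a backslash, count of affected sequence lines)."""
    header_hits = []
    sequence_hits = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if "\\" not in line:
            continue
        if line.startswith(">"):
            header_hits.append(line)  # FASTA header carrying a backslash
        else:
            sequence_hits += 1  # sequence line carrying a backslash
    return header_hits, sequence_hits


# Made-up records; real input would be the lines of the FASTA file.
sample_fasta = [
    ">contig_134\\",
    "ACGTACGT\\",
    "ACGTACGT",
    ">contig_2",
    "GGCCGGCC",
]
headers, sequence_lines = find_backslashes(sample_fasta)
print(headers)
print(sequence_lines)
```

If nearly every line is affected, that points to an export or editing step upstream rather than isolated corruption, which is worth tracking down before annotating.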
Running them through a converter may not solve all the issues though, as the backslash also appears at the end of the line in the FASTA headers:

cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\'
>contig_134\
>contig_149\
>contig_158\
>contig_222\
>contig_316\
>contig_582\
>contig_634\
>contig_700\
>contig_741\
...

I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)?

chris

From: maker-devel on behalf of Carson Holt
Date: Monday, October 16, 2017 at 11:22 AM
To: Emmanuel Nnadi
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Backlash running through my sequence

I would not just remove them. The fact they are there calls into question how they got there in the first place. If you generated this file yourself, you may want to instead use fasta_tool.

--Carson

On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:

Hi all,
I am trying to running annotation on some of my sequences but noticed that i have backslash that runs through the sequence. Please how do I remove them
I attached the sequence

Thanks

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From carsonhh at gmail.com Tue Oct 17 13:33:26 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Oct 2017 13:33:26 -0600
Subject: [maker-devel] maker-devel Digest, Vol 113, Issue 13
In-Reply-To:
References:
Message-ID: <30F2FDFE-3B4E-4951-89D8-63C2FC772B63@gmail.com>

Thanks. The map_fasta_ids script was empty in the bin directory for some reason, so the installer threw an error because it could not find the #!/usr/bin/perl line. I have put it back in the bin directory where it was supposed to be and the issue goes away for the install.
For the second issue, I think I found it and have uploaded a new tarball to the website. Also here is a link to download the old 3.00-beta, although I would not recommend making it part of a pipeline because version 3 is still beta and still has bugs (you should use 2.31.9 instead for pipelines).

--> http://topaz.genetics.utah.edu/maker_downloads/static/maker-3.00.0-beta.tgz

--Carson

> On Oct 16, 2017, at 2:54 AM, Jia-Xing Yue wrote:
>
> Dear maker developers,
>
> I am trying to install maker-3.01.1-beta but encountered the warning message about uninitialized value (see the warning message below) although still finished the installation.
>
> [jxyue at paralog src]$ ./Build install
> Building MAKER
> Use of uninitialized value $line in chomp at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3082.
> Use of uninitialized value $line in substitution (s///) at /home/jxyue/Projects/LRSDAY/build/maker/src/../../../build/cpanm/perlmods/lib/perl5/Module/Build/Base.pm line 3083.
> Installing MAKER...
> Building MAKER
> ...
>
> Also, when I ran this installation for the actual work, it reported errors about cannot find my specified snaphmm model for the annotation, despite that I have specified "snaphmm=$LRSDAY_HOME/data/S288C.gene.hmm" in the "maker_opts.ctl" file and this configuration information has been successfully recognized by maker.
>
> running snap.
> #--------- command -------------#
> Widget::snap:
> /home/jxyue/Projects/LRSDAY/build/SNAP/snap /home/jxyue/Projects/LRSDAY/data/S288C.gene.hmm /tmp/maker_m8TVEQ/chrI.abinit_masked.0 > /tmp/maker_m8TVEQ/chrI.abinit_masked.0.S288C%2Egene%2Ehmm.snap
> #-------------------------------#
>
> # (my comment: up to now everything looks fine)
> ....
>
> running snap.
> #--------- command -------------#
> Widget::snap:
> /home/jxyue/Projects/LRSDAY/build/SNAP/snap -plus -xdef /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.xdef.snap S288C.gene.hmm /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap.fasta > /tmp/maker_m8TVEQ/0/85_0.4044-4985.S288C.gene.hmm.auto_annotator.snap
> #-------------------------------#
> ZOE ERROR (from /home/jxyue/Projects/LRSDAY/build/SNAP/snap): error opening file (/home/jxyue/Projects/LRSDAY/build/SNAP/Zoe/HMM/S288C.gene.hmm)
> ZOE library version 2017-03-01
> ERROR: Snap failed
> --> rank=NA, hostname=paralog.itc.unipi.it
> ERROR: Failed while annotating transcripts
> ERROR: Chunk failed at level:1, tier_type:4
> FAILED CONTIG:chrI
>
> ERROR: Chunk failed at level:6, tier_type:0
> FAILED CONTIG:chrI
>
> examining contents of the fasta file and run log
>
> # (my comment: here the error occurred. As you can see, snap somehow forgot about the path to my specified hmm file and instead looks for this file in its default installation location)
>
> It is worth noting that the parallel installation and run with maker-3.00.0-beta finish smoothly without any problem. So I suspect both the installation warning and the executing error are caused by the changes during the version update from 3.00.0-beta to 3.01.1-beta. Could you check about this issue? Thanks in advance!
>
> Finally, is it possible to also provide access to older version of maker (e.g. 3.00.0-beta in this particular case) when the user finish the registration in the maker download page? This will help users to roll back to older version when needed. Also this helps for the version control when other developers develop annotation pipelines that use maker as a dependency package. Thanks for the consideration!
>
>
> Best,
> Jia-Xing
>
> --
> Jia-Xing Yue
>
> Population Genomics and Complex Traits Group
> Tour Pasteur 8eme etage
> Faculté
de Médecine
> Institute for Research on Cancer and Aging, Nice (IRCAN)
> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
> 28 Avenue de Valombrose
> 06107 NICE Cedex 2
> France
>
> Personal website: http://www.iamphioxus.org/
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From patrick.tranvan at unil.ch Wed Oct 18 05:47:35 2017
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Wed, 18 Oct 2017 11:47:35 +0000
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com>, <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
Message-ID: <1508327278733.19140@unil.ch>

Hi Carson,

1) I think I have read one of your posts saying that running maker with MPI is faster than multiple instances, can you explain why?

2) I am trying to annotate a 1GB species but it's superslow.
I have filtered the transcriptome to speed up the process but do you have other suggestions to increase the speed?

Cheers,

Patrick Tran Van

From carsonhh at gmail.com Wed Oct 18 09:09:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Oct 2017 09:09:10 -0600
Subject: [maker-devel] MPI vs multiple instance for speed
In-Reply-To: <1508327278733.19140@unil.ch>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu> <1508327278733.19140@unil.ch>
Message-ID: <486FE3D5-0902-4B05-A3E1-96642C68E422@gmail.com>

MAKER can coordinate parallelization under MPI in a way it can't even with multiple simultaneous runs.
Because processes can communicate among themselves under MPI, MAKER can break larger contigs into chunks or even pull off individual steps and pass them on to another processor, then receive the results back from that processor. So multiple BLAST, RepeatMasker, Exonerate, and prediction processes can all run at the same time for the same contig. Then they all pass their results back to the parent process so it can produce output for that contig. MPI was chosen as the parallelization framework rather than threads because it works both within a single machine and across multiple machines, so you can scale up to hundreds of processes if needed.

--Carson

> On Oct 18, 2017, at 5:47 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> 1) I think I have read one of your posts saying that running maker with MPI is faster than multiple instances, can you explain why?
>
> 2) I am trying to annotate a 1GB species but it's superslow.
> I have filtered the transcriptome to speed up the process but do you have other suggestions to increase the speed?
>
> Cheers,
>
> Patrick Tran Van
>
>

From jhumann at wsu.edu Wed Oct 18 14:38:41 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Wed, 18 Oct 2017 20:38:41 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
Message-ID:

Hello,

I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:

ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically.
Invalid Character: 'K'
--> rank=NA, hostname=compute2

The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.

Thanks,
Jodi

Jodi Humann, Ph.D.
Main Bioinformatics Lab Project Coordinator
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
509-335-3206
jhumann at wsu.edu

From robert.zimmermann at univie.ac.at Thu Oct 19 09:25:08 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:25:08 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
Message-ID:

Hi Maker Developers,

I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It might make sense if they were more and shorter if the additional evidence was joining transcripts, but this would indicate that it is not the case.

Therefore I'm trying to understand the algorithm. From what I understand if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then it is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?

Thanks for your help!

Best,
Bob

--
Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

From robert.zimmermann at univie.ac.at Thu Oct 19 09:28:17 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Thu, 19 Oct 2017 17:28:17 +0200
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID:

Correction to the above numbers, the median lengths are 1414 and 1256.

> On 19 Oct 2017, at 17:25, Bob Zimmermann wrote:
>
> Hi Maker Developers,
>
> I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It might make sense if they were more and shorter if the additional evidence was joining transcripts, but this would indicate that it is not the case.
>
> Therefore I'm trying to understand the algorithm. From what I understand if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then it is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?
>
> Thanks for your help!
>
> Best,
> Bob
>
> --
>
> Department of Molecular Evolution and Development
> Universität Wien
> Althanstraße 14 (UZA I), Zimmer 2.019
> 1090 Vienna
> Austria
>
> +43 1 427757002
>

From carsonhh at gmail.com Thu Oct 19 09:44:07 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 09:44:07 -0600
Subject: [maker-devel] Fewer gene models output with a superset of EST evidence
In-Reply-To:
References:
Message-ID: <62F04A76-F3F1-4044-B4AD-129B15A9EEB2@gmail.com>

You should look at both in a browser to get a better idea of what's going on. What MAKER does is take the evidence given, cluster it (strand-specific clustering), then use the transcript evidence as intron hints to the predictors and protein alignments as exon hints (it will also use polished protein hints to generate intron hints in the absence of transcript intron hints). Finally it uses overlapping transcript evidence to generate UTR.

So look at it in a browser. See if the apparent overlap clusters are different in extent, and also look for mRNA-seq evidence being merged. If the cluster is falsely merging two loci because the mRNA-seq is merged, one of two things will happen: you will get multiple models, since the predictor can't make a single model work within the cluster using the hints, or you will get a model with a really long UTR that is blocking other models from existing in the region.

Also, depending on the mRNA-seq evidence coming in, you may be generating false models because of noise in the data. Essentially everything is transcribed at a basal level, so as you get more and more mRNA-seq, you generate more and more spurious alignments. So more evidence might generate fewer long alignments for true loci or by falsely merging genes while simultaneously adding a number of very short spurious results.

--Carson

> On Oct 19, 2017, at 9:28 AM, Bob Zimmermann wrote:
>
> Correction to the above numbers, the median lengths are 1414 and 1256.
>
>> On 19 Oct 2017, at 17:25, Bob Zimmermann wrote:
>>
>> Hi Maker Developers,
>>
>> I have been playing around with several data sets as input to annotate our newly reassembled genome. We have 3 RNA seq datasets which have been assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with a combined set (1619 vs 1450 on one chromosome) and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It might make sense if they were more and shorter if the additional evidence was joining transcripts, but this would indicate that it is not the case.
>>
>> Therefore I'm trying to understand the algorithm. From what I understand if it finds evidence for an ab initio prediction for which the internal splice junctions agree, then it is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?
>>
>> Thanks for your help!
>>
>> Best,
>> Bob
>>
>> --
>>
>> Department of Molecular Evolution and Development
>> Universität Wien
>> Althanstraße 14 (UZA I), Zimmer 2.019
>> 1090 Vienna
>> Austria
>>
>> +43 1 427757002
>>

From carsonhh at gmail.com Thu Oct 19 11:32:44 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 11:32:44 -0600
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Hi Jodi,

I didn't even know anyone else had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases).
But you should be able to add the -fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).

Somewhere inside the script there will be a line like this -->
$command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";

You can add -fix_nucleotides to that command so it always runs. fix_nucleotides is a command line flag. It's basically a warning for the user to let them know something is weird (i.e. it is possible they mixed up transcript/protein sequence files). And then it allows the user to tell MAKER they did not mix files up, rather the data is supposed to look that way and they are ok with MAKER altering the sequence by replacing the letters or dashes seen with N's.

Thanks,
Carson

> On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:
>
> Hello,
>
> I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:
>
> ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2
>
> The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.
>
> Thanks,
> Jodi
>
> Jodi Humann, Ph.D.
> Main Bioinformatics Lab Project Coordinator
> Department of Horticulture
> Washington State University
> PO Box 646414
> Pullman, WA 99164-6414
> 509-335-3206
> jhumann at wsu.edu

From carsonhh at gmail.com Thu Oct 19 12:46:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 19 Oct 2017 12:46:17 -0600
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID: <052F801C-3B37-4B0F-B40A-A905F5F2B1CE@gmail.com>

Yes. That is the current version.

--Carson

> On Oct 19, 2017, at 12:45 PM, Humann, Jodi Lynn wrote:
>
> Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current Maker version right?
>
> Jodi
>
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Thursday, October 19, 2017 10:33 AM
> To: Humann, Jodi Lynn
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] fix nucleotides option on MWAS
>
> Hi Jodi,
>
> I didn't even know anyone else had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases). But you should be able to add the -fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).
>
> Somewhere inside the script there will be a line like this -->
> $command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";
>
> You can add -fix_nucleotides to that command so it always runs. fix_nucleotides is a command line flag. It's basically a warning for the user to let them know something is weird (i.e. it is possible they mixed up transcript/protein sequence files).
And then it allows the user to tell MAKER they did not mix files up, rather the data is supposed to look that way and they are ok with MAKER altering the sequence by replacing the letters or dashes seen with N's.
>
> Thanks,
> Carson
>
>
> On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:
>
> Hello,
>
> I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:
>
> ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2
>
> The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.
>
> Thanks,
> Jodi
>
> Jodi Humann, Ph.D.
> Main Bioinformatics Lab Project Coordinator
> Department of Horticulture
> Washington State University
> PO Box 646414
> Pullman, WA 99164-6414
> 509-335-3206
> jhumann at wsu.edu
From jhumann at wsu.edu Thu Oct 19 12:45:43 2017
From: jhumann at wsu.edu (Humann, Jodi Lynn)
Date: Thu, 19 Oct 2017 18:45:43 +0000
Subject: [maker-devel] fix nucleotides option on MWAS
In-Reply-To:
References:
Message-ID:

Thanks for the info, Carson. We are running v2.31.9, and were able to get MWAS running, with some work. That is the current Maker version, right?

Jodi

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Thursday, October 19, 2017 10:33 AM
To: Humann, Jodi Lynn
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] fix nucleotides option on MWAS

Hi Jodi,

I didn't even know anyone else had an MWAS server running (I've actually pulled all of the Build options for MWAS out of current releases). But you should be able to add the -fix_nucleotides option to the command run by MWAS by editing the mwas_server script (.../maker/MWAS/bin/mwas_server).

Somewhere inside the script there will be a line like this -->
$command = "$FindBin::RealBin/../../bin/maker -qq -base $job_id";

You can add -fix_nucleotides to that command so it always runs. fix_nucleotides is a command line flag. It's basically a warning for the user to let them know something is weird (i.e. it is possible they mixed up transcript/protein sequence files). And then it allows the user to tell MAKER they did not mix files up, rather the data is supposed to look that way and they are ok with MAKER altering the sequence by replacing the letters or dashes seen with N's.

Thanks,
Carson

On Oct 18, 2017, at 2:38 PM, Humann, Jodi Lynn wrote:

Hello,

I was wondering if there was any way to enable the '-fix_nucleotides' option on the MWAS version we are running locally on our server? I have a genome sequence with a degenerate nucleotide and get the following error:

ERROR: The nucleotide sequence file '/local/www/maker/data/users/1/NZ_CP006580.1_EcP101.fasta' appears to contain protein sequence or unrecognized characters.
Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'K' --> rank=NA, hostname=compute2

The error message says the option can be used on the command line. Is that set on the actual command to run Maker (when using the command line version), or is it something that can be set in one of the control files? Any input would be greatly appreciated. I know I can fix my input file, but would prefer to just enable the option if I can.

Thanks,
Jodi

Jodi Humann, Ph.D.
Main Bioinformatics Lab Project Coordinator
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
509-335-3206
jhumann at wsu.edu

From eennadi at gmail.com Mon Oct 23 07:30:07 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Mon, 23 Oct 2017 14:30:07 +0100
Subject: [maker-devel] Contamination report from NCBI
Message-ID:

Hello,

Good day. I submitted my sequence to NCBI and they sent back this contamination report. Please, how do I use MAKER to make the correction?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
SUBID BioProject BioSample Organism
--------------------------------------------------------
SUB3124577 PRJNA414658 SAMN07821433 Mucuna pruriens

[] We ran your sequences through our Contamination Screen.
The screen found contigs that need to be trimmed and/or excluded. Please adjust the sequences appropriately and then resubmit your sequences. After you remove the contamination, trim any Ns at the ends of the sequence and remove any sequences that are shorter than 200 nt and not part of a multi-component scaffold.

Note that hits in eukaryotic genomes to mitochondrial sequences can be ignored when specific criteria are met. Those criteria are explained below.

Note that mismatches between the name of the adaptor/primer identified in the screen and the sequencing technology used to generate the sequencing data should not be used to discount the validity of the screen results as the adaptors/primers of many different sequencing platforms share sequence similarity.

[] Some of the sequences hit primers or adaptors used in Illumina or 454 or other sequencing strategies or platforms. Primers at the end of a sequence should be removed. However, if primers are present within sequences then you should strongly consider splitting the sequences at the primers because the primer sequence could have been the region of overlap, causing a misassembly.

Screened 26,016 sequences, 396,641,426 bp.
Note: 5,610 sequences with runs of Ns 10 bp or longer (or those longer than 20 MB) were split before screening.
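The trimming described above can be sketched in Python as follows. This is only an illustration, not an NCBI or MAKER tool: the function names are hypothetical, spans are assumed to be 1-based and inclusive, multiple spans in one row are assumed to be comma-separated, and the 200 nt cutoff is taken from the report text (the sketch ignores the multi-component scaffold exception).

```python
def parse_trim_row(row):
    """Split one row of the Trim list into (name, length, spans, source).

    Assumes whitespace-separated fields in the order
    name, length, span(s), apparent source, e.g.
    'contig_10200 20270 1..76 adaptor:NGB00847.1'.
    """
    name, length, spans, source = row.split()
    span_list = []
    for span in spans.split(","):  # comma separator is an assumption
        start, end = span.split("..")
        span_list.append((int(start), int(end)))
    return name, int(length), span_list, source


def remaining_pieces(seq, start, end, min_len=200):
    """Remove the 1-based inclusive span start..end from seq.

    Returns the pieces left on either side, with terminal Ns stripped
    and pieces shorter than min_len dropped.
    """
    left = seq[:start - 1]   # everything before the flagged span
    right = seq[end:]        # everything after the flagged span
    pieces = [piece.strip("Nn") for piece in (left, right)]
    return [piece for piece in pieces if len(piece) >= min_len]
```

A span at either end of a contig leaves a single piece, while an internal span splits the contig in two, which is what the report suggests considering for primers found within a sequence.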
428 sequences with locations to mask/trim
(31 split spans to exclude, 397 split spans with locations to mask/trim)

Trim:
Sequence name, length, span(s), apparent source
contig_10109 13138 13078..13138 adaptor:NGB00847.1
contig_10200 20270 1..76 adaptor:NGB00847.1
contig_10202 22517 1..44 adaptor:NGB00360.1
contig_10218 55661 55592..55661 adaptor:NGB00847.1
contig_10283 11575 1..79 adaptor:NGB00847.1
contig_1038 91134 91073..91134 adaptor:NGB00360.1
contig_104 10061 10005..10061 adaptor:NGB00360.1
contig_10405 24076 1..43 adaptor:NGB00847.1
contig_10425 16694 16639..16694 adaptor:NGB00360.1
contig_10447 37445 37233..37445 adaptor:NGB00360.1
contig_10466 19368 1..52 adaptor:NGB00847.1
contig_10576 12053 12003..12053 adaptor:NGB00360.1
contig_1059 34516 34457..34516 adaptor:NGB00847.1
contig_106 49997 1..45 adaptor:NGB00360.1
contig_10695 27664 1..38 adaptor:NGB01029.1
contig_10753 12481 12413..12481 adaptor:NGB00847.1
contig_10822 33522 33441..33522 adaptor:NGB00847.1
contig_1083 10637 1..23 adaptor:NGB01096.1
contig_10851 36752 36682..36752 adaptor:NGB00360.1
contig_10878 27925 27848..27925 adaptor:NGB00360.1
contig_10965 23597 1..57 adaptor:NGB00360.1
contig_10968 7413 1..40 adaptor:NGB00847.1
contig_1099 35847 1..70 adaptor:NGB00360.1
contig_11034 10224 10166..10224 adaptor:NGB00360.1
contig_11058 32994 1..23 adaptor:NGB01088.1
contig_11138 17426 1..73 adaptor:NGB00847.1
contig_11166 6306 6266..6306 adaptor:NGB00360.1
contig_11182 26558 1..30 adaptor:NGB01096.1
contig_11216 15160 1..59 adaptor:NGB00847.1
contig_11269 14732 14655..14732 adaptor:NGB00847.1
contig_11306 28246 28199..28246 adaptor:NGB00360.1
contig_1136 28186 1..73 adaptor:NGB00847.1
contig_1141 58119 58028..58119 adaptor:NGB00847.1
contig_11416 8561 8539..8561 adaptor:NGB01088.1
contig_11504 8890 8840..8890 adaptor:NGB00360.1
contig_1158 17422 17398..17422 adaptor:NGB01088.1
contig_11647 7021 1..69 adaptor:NGB00847.1
contig_11684 17442 17418..17442 adaptor:NGB01096.1
contig_11752 38337 38314..38337 adaptor:NGB01088.1
contig_11767 6366 6324..6366 adaptor:NGB00847.1
contig_11791 22415 1..43 adaptor:NGB00847.1
contig_11792 58260 1..29 adaptor:NGB01096.1
contig_1187 39501 39462..39501 adaptor:NGB01029.1
contig_12059 10094 1..72 adaptor:NGB00360.1
contig_12130 13210 13164..13210 adaptor:NGB00360.1
contig_12164 17561 17539..17561 adaptor:NGB01096.1
contig_12169 14178 139..196 adaptor:NGB00360.1
contig_12183 15822 61..112 adaptor:NGB00360.1
contig_12266 11704 11640..11704 adaptor:NGB00360.1
contig_12300 9550 9360..9550 adaptor:NGB01088.1
contig_12324 49997 49891..49997 adaptor:NGB00847.1
contig_12423 45971 45860..45918 adaptor:NGB00360.1
contig_12441 15141 1..42 adaptor:NGB00847.1
contig_12514 14655 1..69 adaptor:NGB00847.1
contig_12515 5355 5326..5355 adaptor:NGB01088.1
contig_12535 22496 22458..22496 adaptor:NGB01029.1
contig_12544 19615 19559..19615 adaptor:NGB00360.1
contig_12558 20026 20007..20026 adaptor:NGB01088.1
contig_12613 6880 6793..6880 adaptor:NGB00847.1
contig_12701 18439 18330..18382 adaptor:NGB00360.1
contig_12713 13341 13274..13341 adaptor:NGB00360.1
contig_12723 17913 1..38 adaptor:NGB01088.1
contig_12730 55277 55249..55277 adaptor:NGB01096.1
contig_12739 6792 1..48 adaptor:NGB00360.1
contig_12787 30950 1..19 adaptor:NGB01096.1
contig_1279 18699 18670..18699 adaptor:NGB01088.1
contig_12815 5168 5091..5168 adaptor:NGB00847.1
contig_12846 20753 1..70 adaptor:NGB00360.1
contig_1288 34784 1..31 adaptor:NGB01096.1
contig_12888 12204 1..23 adaptor:NGB01096.1
contig_12919 10315 1..71 adaptor:NGB00360.1
contig_13031 8972 8938..8972 adaptor:NGB01093.1
contig_13088 6275 1..22 adaptor:NGB01088.1
contig_13140 36197 1..48 adaptor:NGB00360.1
contig_13233 16414 16355..16414 adaptor:NGB00847.1
contig_1330 33261 1..44 adaptor:NGB00847.1
contig_13319 19747 1..20 adaptor:NGB01096.1
contig_13367 36004 35868..35931 adaptor:NGB00847.1
contig_13395 5338 1..79 adaptor:NGB00360.1
contig_1341 30756 30734..30756 adaptor:NGB01088.1
contig_13481 9637 9600..9637 adaptor:NGB00360.1
contig_13506 5704 5662..5704 adaptor:NGB00360.1
contig_13548 5814 79..121 adaptor:NGB00360.1
contig_13567 21576 1..47 adaptor:NGB00847.1
contig_13669 8336 1..24 adaptor:NGB01088.1
contig_13718 23500 1..25 adaptor:NGB01096.1
contig_13783 18720 1..41 adaptor:NGB00847.1
contig_13830 32395 32367..32395 adaptor:NGB01096.1
contig_13845 15572 15493..15572 adaptor:NGB00360.1
contig_13854 10932 1..48 adaptor:NGB00360.1
contig_13943 37701 37674..37701 adaptor:NGB01096.1
contig_13957 7159 1..30 adaptor:NGB01096.1
contig_14014 29735 29672..29735 adaptor:NGB00360.1
contig_14027 21418 21340..21418 adaptor:NGB00360.1
contig_14032 47642 1..53 adaptor:NGB00847.1
contig_14047 26936 1..28 adaptor:NGB01088.1
contig_14048 45832 1..22 adaptor:NGB01088.1
contig_14061 11471 1..179 adaptor:NGB01096.1
contig_14113 17661 1..67 adaptor:NGB00360.1
contig_14173 17601 1..41 adaptor:NGB00847.1
contig_1418 31840 1..248 adaptor:NGB00847.1
contig_14194 7456 7294..7456 adaptor:NGB01096.1
contig_14210 8814 1971..2025 adaptor:NGB00360.1
contig_14223 12513 12489..12513 adaptor:NGB01096.1
contig_14317 21472 21410..21472 adaptor:NGB00360.1
contig_14424 6040 5973..6040 adaptor:NGB00360.1
contig_14425 6404 6379..6404 adaptor:NGB01096.1
contig_14426 31457 31398..31457 adaptor:NGB00847.1
contig_14458 6814 6623..6814 adaptor:NGB01088.1
contig_14524 9488 9431..9488 adaptor:NGB00847.1
contig_14584 20433 1..96 adaptor:NGB00847.1
contig_1459 32979 1..32 adaptor:NGB01096.1
contig_14601 19077 1..28 adaptor:NGB01096.1
contig_14641 21747 1..45 adaptor:NGB00847.1
contig_14664 48155 48118..48155 adaptor:NGB00360.1
contig_14711 11854 11827..11854 adaptor:NGB01096.1
contig_14736 21360 1..37 adaptor:NGB01029.1
contig_14749 12830 1..33 adaptor:NGB01093.1
contig_14966 9962 9891..9962 adaptor:NGB00360.1
contig_14999 5248 1..41 adaptor:NGB00360.1
contig_15010 17976 1..43 adaptor:NGB00360.1
contig_15011 26484 26462..26484 adaptor:NGB01096.1
contig_15017 9331 9291..9331 adaptor:NGB00360.1
contig_1503 63533 1..33 adaptor:NGB01096.1
contig_15032 32240 32157..32240 adaptor:NGB00847.1
contig_15060 15050 15010..15050 adaptor:NGB00847.1
contig_15065 13062 12996..13062 adaptor:NGB00360.1
contig_15070 29943 1..29 adaptor:NGB01096.1
contig_15132 20431 1..71 adaptor:NGB00847.1
contig_15169 7086 7051..7086 adaptor:NGB00846.1
contig_15174 19921 1..23 adaptor:NGB01096.1
contig_15194 16100 16039..16100 adaptor:NGB00847.1
contig_15212 9272 1..50 adaptor:NGB00847.1
contig_15215 15591 1..58 adaptor:NGB00360.1
contig_15271 37699 37647..37699 adaptor:NGB00847.1
contig_15276 11087 11031..11087 adaptor:NGB00847.1
contig_15309 10118 1..42 adaptor:NGB00847.1
contig_15320 7963 7901..7963 adaptor:NGB00847.1
contig_15334 5683 1..36 adaptor:NGB00846.1
contig_15364 17306 76..139 adaptor:NGB00847.1
contig_15374 28301 28263..28301 adaptor:NGB00360.1
contig_15377 10470 10428..10470 adaptor:NGB00360.1
contig_15398 24069 23999..24069 adaptor:NGB00847.1
contig_15500 9289 9271..9289 adaptor:NGB01096.1
contig_15507 25565 1..22 adaptor:NGB01088.1
contig_15523 5782 5762..5782 adaptor:NGB01088.1
contig_15529 10225 10143..10225 adaptor:NGB00360.1
contig_15569 9645 9612..9645 adaptor:NGB01090.1
contig_15596 7163 1..42 adaptor:NGB00360.1
contig_15605 18521 1..31 adaptor:NGB01096.1
contig_15672 8446 1..213 adaptor:NGB01088.1
contig_15686 22141 58..90 adaptor:NGB00847.1
contig_15708 18098 17996..18098 adaptor:NGB00847.1
contig_15736 18284 18252..18284 adaptor:NGB01096.1
contig_15777 17192 1..45 adaptor:NGB00360.1
contig_15812 8602 1..77 adaptor:NGB00360.1
contig_15959 10936 10913..10936 adaptor:NGB01096.1
contig_15972 11324 1..71 adaptor:NGB00360.1
contig_15974 24312 24243..24312 adaptor:NGB00847.1
contig_16057 8838 8775..8838 adaptor:NGB00847.1
contig_16088 7608 1..71 adaptor:NGB00360.1
contig_16142 10392 1..53 adaptor:NGB00847.1
contig_1617 14870 255..310 adaptor:NGB00360.1
contig_16183 9226 9205..9226 adaptor:NGB01088.1
contig_16188 62666 62586..62666 adaptor:NGB00847.1
contig_16370 7868 1..42 adaptor:NGB00847.1
contig_16416 19512 1..21 adaptor:NGB01088.1
contig_1645 25016 24951..25016 adaptor:NGB00360.1
contig_16510 31845 31776..31845 adaptor:NGB00847.1
contig_16529 17342 1..45 adaptor:NGB00360.1
contig_16558 9338 9097..9338 adaptor:NGB00360.1
contig_16573 6590 6521..6590 adaptor:NGB00847.1
contig_16608 7397 7324..7397 adaptor:NGB00847.1
contig_16631 11055 1..50 adaptor:NGB00360.1
contig_16641 5482 1..190 adaptor:NGB01088.1
contig_1667 35244 35200..35244 adaptor:NGB01029.1
contig_16682 14500 1..71 adaptor:NGB00847.1
contig_16699 6216 6148..6216 adaptor:NGB00360.1
contig_16734 12674 12625..12674 adaptor:NGB00360.1
contig_16790 6341 1..51 adaptor:NGB00360.1
contig_16807 7512 1..36 adaptor:NGB01096.1
contig_16817 20743 1..155 adaptor:NGB01088.1
contig_16839 6969 1..69 adaptor:multiple
contig_16870 10948 1..49 adaptor:NGB00847.1
contig_16880 5622 5549..5622 adaptor:NGB00360.1
contig_16889 9182 1..40 adaptor:NGB00360.1
contig_16911 6691 1..28 adaptor:NGB01088.1
contig_16921 9432 9358..9432 adaptor:NGB00360.1
contig_16951 14285 14262..14285 adaptor:NGB01088.1
contig_17021 12242 1..75 adaptor:NGB00360.1
contig_17092 22712 1..64 adaptor:NGB00360.1
contig_17147 7706 7685..7706 adaptor:NGB01096.1
contig_17195 15668 15643..15668 adaptor:NGB01096.1
contig_17214 7881 7819..7881 adaptor:NGB00847.1
contig_17299 7861 7830..7861 adaptor:NGB01088.1
contig_17344 8915 8765..8823 adaptor:NGB00360.1
contig_17361 8425 1..26 adaptor:NGB01096.1
contig_17422 11017 10964..11017 adaptor:NGB00360.1
contig_17471 5988 5964..5988 adaptor:NGB01096.1
contig_17505 10208 1..74 adaptor:NGB00360.1
contig_17506 6091 1..61 adaptor:NGB00360.1
contig_17520 6084 6028..6084 adaptor:NGB00360.1
contig_17538 5796 5766..5796 adaptor:NGB01096.1
contig_17558 7066 6837..7066 adaptor:NGB01080.1
contig_17561 15165 1..206 adaptor:NGB01083.1
contig_17594 6976 1..26 adaptor:NGB01088.1
contig_17655 14371 14177..14371 adaptor:NGB01088.1
contig_17671 17801 1..50 adaptor:NGB00847.1
contig_17680 5752 5693..5752
adaptor:NGB00847.1 contig_17738 6456 1..44 adaptor:NGB00360.1 contig_17741 10917 10889..10917 adaptor:NGB01096.1 contig_17775 5928 1..79 adaptor:NGB00847.1 contig_17804 11597 11562..11597 adaptor:NGB00846.1 contig_17872 11319 11278..11319 adaptor:NGB00847.1 contig_17876 5647 5613..5647 adaptor:NGB01083.1 contig_17925 9923 1..22 adaptor:NGB01088.1 contig_17938 5246 1..23 adaptor:NGB01088.1 contig_18016 8044 1..29 adaptor:NGB01096.1 contig_18017 6668 6647..6668 adaptor:NGB01096.1 contig_18044 11330 11299..11330 adaptor:NGB01096.1 contig_18049 10560 1..88 adaptor:NGB00847.1 contig_18173 12243 1..159 adaptor:NGB01096.1 contig_18175 8788 8765..8788 adaptor:NGB01096.1 contig_18177 11418 11340..11418 adaptor:multiple contig_18182 11901 11832..11901 adaptor:NGB00847.1 contig_18201 6059 6038..6059 adaptor:NGB01096.1 contig_18222 11216 11136..11216 adaptor:NGB00847.1 contig_18228 8386 8361..8386 adaptor:NGB01088.1 contig_18321 5922 5897..5922 adaptor:NGB01096.1 contig_18370 5400 5085..5116 adaptor:NGB00747.1 contig_18453 5849 1..38 adaptor:NGB00360.1 contig_1846 23210 1..64 adaptor:NGB00360.1 contig_18479 5209 1..44 adaptor:NGB00360.1 contig_18486 5749 5726..5749 adaptor:NGB01088.1 contig_18488 5217 1..19 adaptor:NGB01088.1 contig_1969 65776 1..60 adaptor:NGB00360.1 contig_197 9215 1..83 adaptor:NGB00847.1 contig_1977 13765 1..35 adaptor:NGB01093.1 contig_1999 53427 53398..53427 adaptor:NGB01096.1 contig_2125 11803 11769..11803 adaptor:NGB01083.1 contig_2151 9544 1..37 adaptor:NGB01029.1 contig_2179 38972 1..67 adaptor:NGB00360.1 contig_2186 31110 30935..31110 adaptor:NGB01096.1 contig_2203 60314 60124..60187 adaptor:NGB00847.1 contig_2278 33271 1..36 adaptor:NGB01090.1 contig_2305 17957 1..58 adaptor:NGB00360.1 contig_2361 48816 48764..48816 adaptor:NGB00847.1 contig_242 49604 49535..49604 adaptor:NGB00360.1 contig_2429 76318 76242..76318 adaptor:NGB00847.1 contig_2430 70439 70373..70439 adaptor:NGB00847.1 contig_2459 63920 1..96 adaptor:NGB00847.1 contig_2485 31300 
31260..31300 adaptor:NGB00360.1 contig_2508 25152 25095..25152 adaptor:NGB00847.1 contig_2650 36583 1..58 adaptor:NGB00847.1 contig_2668 22089 22052..22089 adaptor:NGB01029.1 contig_2735 13614 1..19 adaptor:NGB01088.1 contig_2781 50403 1..70 adaptor:NGB00847.1 contig_2800 30768 22802..22846 adaptor:NGB00360.1 contig_2824 44109 1..38 adaptor:NGB00847.1 contig_2888 19121 1..89 adaptor:NGB00360.1 contig_2900 36871 1..32 adaptor:NGB01088.1 contig_2949 25959 25916..25959 adaptor:NGB00360.1 contig_2970 20833 1..46 adaptor:NGB00360.1 contig_2986 16429 1..43 adaptor:NGB00360.1 contig_3069 38956 38904..38956 adaptor:NGB00847.1 contig_3106 9135 1..87 adaptor:NGB00847.1 contig_3124 70101 70072..70101 adaptor:NGB01088.1 contig_3129 30402 30379..30402 adaptor:NGB01088.1 contig_3147 10611 10586..10611 adaptor:NGB01096.1 contig_3190 117726 117687..117726 adaptor:NGB01029.1 contig_3243 44291 44273..44291 adaptor:NGB01096.1 contig_3276 57911 1..42 adaptor:NGB00360.1 contig_341 67008 1..22 adaptor:NGB01096.1 contig_3542 16855 1..60 adaptor:NGB00847.1 contig_3595 29288 1..79 adaptor:NGB00847.1 contig_3712 73078 1..78 adaptor:NGB00847.1 contig_3840 40472 40414..40472 adaptor:NGB00360.1 contig_3868 33875 33819..33875 adaptor:NGB00360.1 contig_3903 40080 40010..40080 adaptor:NGB00847.1 contig_3996 44010 43970..44010 adaptor:NGB00360.1 contig_4001 26085 1..73 adaptor:NGB00847.1 contig_4014 30676 30590..30676 adaptor:NGB00360.1 contig_4019 49543 1..76 adaptor:NGB00360.1 contig_4036 58848 58696..58848 adaptor:NGB00846.1 contig_4084 41308 41210..41308 adaptor:NGB00360.1 contig_4095 24801 1..70 adaptor:NGB00847.1 contig_4098 27393 1..189 adaptor:NGB01096.1 contig_410 57740 57678..57740 adaptor:NGB00360.1 contig_4172 20870 9717..9749 adaptor:NGB01096.1 contig_4318 55870 55805..55870 adaptor:NGB00360.1 contig_432 58593 58569..58593 adaptor:NGB01088.1 contig_4323 87370 87304..87370 adaptor:NGB00847.1 contig_4365 27401 27350..27401 adaptor:NGB00847.1 contig_4516 14480 1..98 adaptor:NGB00847.1 
contig_452 34031 1..23 adaptor:NGB01096.1 contig_4530 63069 63006..63069 adaptor:NGB00360.1 contig_4651 67570 67518..67570 adaptor:NGB00847.1 contig_4679 20970 1..38 adaptor:NGB00360.1 contig_4686 7411 1..24 adaptor:NGB01096.1 contig_4743 37926 1..79 adaptor:NGB00360.1 contig_4765 11248 11167..11248 adaptor:NGB00360.1 contig_4801 91339 1..50 adaptor:NGB00360.1 contig_4812 37300 37121..37300 adaptor:NGB01093.1 contig_4820 80899 80862..80899 adaptor:NGB00360.1 contig_4904 9220 1..52 adaptor:NGB00847.1 contig_4916 29759 29718..29759 adaptor:NGB00847.1 contig_4924 19015 1..49 adaptor:NGB00847.1 contig_4939 23620 23574..23620 adaptor:NGB01029.1 contig_4956 40890 1..24 adaptor:NGB01088.1 contig_4994 71509 71447..71509 adaptor:NGB00847.1 contig_501 34157 34116..34157 adaptor:NGB00847.1 contig_5036 13162 1..77 adaptor:NGB00360.1 contig_5052 64212 1..170 adaptor:NGB01096.1 contig_5063 35265 35243..35265 adaptor:NGB01096.1 contig_5090 27510 27441..27510 adaptor:NGB00847.1 contig_5157 5988 5805..5988 adaptor:NGB00847.1 contig_5168 6086 6051..6086 adaptor:NGB00846.1 contig_5176 9131 1..41 adaptor:NGB00360.1 contig_5243 44178 1..88 adaptor:NGB00847.1 contig_5270 39229 39177..39229 adaptor:NGB00847.1 contig_5452 30446 1..36 adaptor:NGB00846.1 contig_5576 58918 1..34 adaptor:NGB01096.1 contig_5582 108611 1..87 adaptor:NGB00847.1 contig_5590 55235 55210..55235 adaptor:NGB01088.1 contig_5700 8246 1..82 adaptor:NGB00847.1 contig_5815 99837 1..63 adaptor:NGB00847.1 contig_5820 11616 1..202 adaptor:NGB00847.1 contig_5878 55755 1..26 adaptor:NGB01096.1 contig_59 12390 1..24 adaptor:NGB01096.1 contig_5959 11737 11532..11737 adaptor:NGB01096.1 contig_6065 11492 1..32 adaptor:NGB01088.1 contig_6067 19311 1..39 adaptor:NGB01029.1 contig_6092 14700 1..37 adaptor:NGB01029.1 contig_6194 32760 1..19 adaptor:NGB01088.1 contig_620 10761 1..206 adaptor:NGB01029.1 contig_6259 83001 1..50 adaptor:NGB00360.1 contig_6321 29279 29260..29279 adaptor:NGB01096.1 contig_6408 14690 1..74 adaptor:NGB00360.1 
contig_6455 68530 68497..68530 adaptor:NGB01090.1 contig_6513 12061 11986..12061 adaptor:NGB00847.1 contig_6542 45321 1..41 adaptor:NGB00360.1 contig_6569 19579 19500..19579 adaptor:NGB00847.1 contig_6628 13125 13107..13125 adaptor:NGB01096.1 contig_6673 6733 6699..6733 adaptor:NGB01088.1 contig_6676 13298 13265..13298 adaptor:NGB01088.1 contig_6692 17411 1..43 adaptor:NGB00847.1 contig_6703 57771 1..63 adaptor:NGB00360.1 contig_6785 8258 8237..8258 adaptor:NGB01088.1 contig_6908 53004 52732..52792 adaptor:NGB00847.1 contig_6940 18777 18580..18777 adaptor:NGB00360.1 contig_6941 42032 41980..42032 adaptor:NGB00847.1 contig_6945 53258 1..71 adaptor:NGB00360.1 contig_6986 49101 1..21 adaptor:NGB01088.1 contig_701 57358 1..28 adaptor:NGB01096.1 contig_7017 41786 1..88 adaptor:NGB00360.1 contig_7035 53503 53477..53503 adaptor:NGB01096.1 contig_7046 12860 12812..12860 adaptor:NGB00360.1 contig_7081 27746 1..78 adaptor:NGB00847.1 contig_7082 26783 1..73 adaptor:NGB00847.1 contig_7083 44465 1..70 adaptor:NGB00847.1 contig_7117 33739 33661..33739 adaptor:NGB00360.1 contig_7197 5439 5361..5439 adaptor:NGB00360.1 contig_720 34826 34755..34826 adaptor:NGB00360.1 contig_7210 16719 1..30 adaptor:NGB01096.1 contig_7225 51589 51483..51519 adaptor:NGB01090.1 contig_7228 37410 1..64 adaptor:NGB00360.1 contig_7296 6652 1..80 adaptor:NGB00847.1 contig_7317 11682 1..30 adaptor:NGB01088.1 contig_7323 47612 47560..47612 adaptor:NGB00847.1 contig_7353 50534 50506..50534 adaptor:NGB01096.1 contig_7478 44000 43977..44000 adaptor:NGB01088.1 contig_7510 11029 1..22 adaptor:NGB01096.1 contig_7540 12614 12566..12614 adaptor:NGB00360.1 contig_7587 74260 74065..74260 adaptor:NGB00847.1 contig_7607 14652 1..31 adaptor:NGB01088.1 contig_7612 27455 27299..27354 adaptor:NGB00360.1 contig_7705 39772 1..49 adaptor:NGB00360.1 contig_7729 22305 1..172 adaptor:NGB00360.1 contig_7747 11568 11502..11568 adaptor:NGB00847.1 contig_7750 52785 52748..52785 adaptor:NGB01029.1 contig_7800 20628 20588..20628 
adaptor:NGB00360.1 contig_7851 53514 53439..53514 adaptor:NGB00360.1 contig_7989 51399 1..97 adaptor:NGB00847.1 contig_7992 9120 9035..9120 adaptor:NGB00360.1 contig_7995 103073 103034..103073 adaptor:NGB00360.1 contig_8000 16924 1..85 adaptor:NGB00847.1 contig_8071 73728 73657..73728 adaptor:NGB00360.1 contig_809 20474 20399..20474 adaptor:NGB00360.1 contig_8139 33627 1..25 adaptor:NGB01088.1 contig_8165 17003 16958..17003 adaptor:NGB00847.1 contig_8207 30300 30275..30300 adaptor:NGB01096.1 contig_821 111683 111656..111683 adaptor:NGB01096.1 contig_8236 30705 1..70 adaptor:NGB00360.1 contig_8261 49091 1..181 adaptor:NGB00847.1 contig_8265 28139 27940..28139 adaptor:NGB00360.1 contig_8307 32654 32591..32654 adaptor:NGB00360.1 contig_8340 12953 12925..12953 adaptor:NGB01096.1 contig_8389 19738 1..75 adaptor:NGB00847.1 contig_8399 35159 1..147 adaptor:NGB01096.1 contig_8569 19455 1..38 adaptor:multiple contig_8735 42362 42335..42362 adaptor:NGB01088.1 contig_8737 22308 1..70 adaptor:NGB00360.1 contig_8790 14216 14198..14216 adaptor:NGB01096.1 contig_8797 6889 1..95 adaptor:NGB00847.1 contig_8815 39194 1..80 adaptor:NGB00360.1 contig_886 10028 1..76 adaptor:NGB00360.1 contig_8861 12192 12145..12192 adaptor:NGB00360.1 contig_8909 11109 11042..11109 adaptor:NGB00360.1 contig_8932 8331 8281..8331 adaptor:NGB00847.1 contig_8975 8730 8671..8730 adaptor:NGB00847.1 contig_8992 12682 12661..12682 adaptor:NGB01088.1 contig_8994 7982 7950..7982 adaptor:NGB01096.1 contig_9017 8069 7896..8069 adaptor:NGB00360.1 contig_9045 35343 535..598 adaptor:NGB00847.1 contig_9082 10766 1..28 adaptor:NGB01096.1 contig_9271 17773 17750..17773 adaptor:NGB01096.1 contig_9273 12180 1..180 adaptor:NGB01096.1 contig_9287 6067 1..77 adaptor:NGB00847.1 contig_9474 33382 33060..33111 adaptor:NGB00360.1 contig_9495 19348 19274..19348 adaptor:NGB00847.1 contig_9540 30855 30836..30855 adaptor:NGB01088.1 contig_9591 10604 1..41 adaptor:NGB00847.1 contig_9628 15083 1..34 adaptor:NGB01096.1 contig_9677 5510 
5486..5510 adaptor:NGB01088.1
contig_9693 9823 1..84 adaptor:NGB00847.1
contig_9825 54363 54309..54363 adaptor:NGB00847.1
contig_9863 14033 14013..14033 adaptor:NGB01088.1
contig_9993 35388 1..26 adaptor:NGB01096.1

From xvazquezc at gmail.com Mon Oct 23 16:02:47 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Tue, 24 Oct 2017 09:02:47 +1100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and splitting the contigs at those places, as suggested in the report.

For the annotation part, I'm not 100% sure, but I'd say start with the "Merge/resolve legacy annotations" steps. Maybe Carson or Daniel have a different suggestion:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this
> contamination report.
>
> Please how do I use maker to effect the correction
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052
AUSTRALIA

From cjfields at illinois.edu Mon Oct 23 17:21:06 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 23 Oct 2017 23:21:06 +0000
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID: <8B4331B5-9D10-478A-91A5-80AF702CD9CD@illinois.edu>

It looks like the adapter is primarily at the contig ends, which is easy to remove. However, I agree: removing these and redoing the assembly may improve the assembly quality.

chris

From: maker-devel on behalf of Xabier Vázquez-Campos
Date: Monday, October 23, 2017 at 5:03 PM
To: Emmanuel Nnadi
Cc: Maker Mailing List, "Ence, daniel"
Subject: Re: [maker-devel] Contamination report from NCBI

Hi there,

Did you perform quality and adapter trimming of your raw reads? That's actually an assembly issue. I would seriously encourage you to redo the assembly before continuing. If that isn't possible, start by removing those sequences and splitting the contigs at those places, as suggested in the report.

For the annotation part, I'm not 100% sure, but I'd say start with the "Merge/resolve legacy annotations" steps. Maybe Carson or Daniel have a different suggestion:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Merge.2FResolve_Legacy_Annotations

Cheers,
Xabi

On 24 October 2017 at 00:30, Emmanuel Nnadi wrote:

Hello

Good day.

Please I submitted my sequence to NCBI and they sent back this contamination report.
Please how do I use maker to effect the correction

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052
AUSTRALIA

From mmokrejs at gmail.com Tue Oct 24 04:23:38 2017
From: mmokrejs at gmail.com (Martin MOKREJŠ)
Date: Tue, 24 Oct 2017 12:23:38 +0200
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Hi Emmanuel,
use trimmomatic or cutadapt to remove the adapters and check the output file for unremoved cases. Once they are all removed, redo the assembly.
Martin

Emmanuel Nnadi wrote:
> Hello
>
> Good day.
>
> Please I submitted my sequence to NCBI and they sent back this contamination report.
>
> Please how do I use maker to effect the correction

--
Martin Mokrejs, Ph.D.
Adapter/artefact removal from datasets based on the following technologies:
454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
http://www.bioinformatics.cz/software/supported-protocols/

From eennadi at gmail.com Tue Oct 24 04:44:20 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Tue, 24 Oct 2017 11:44:20 +0100
Subject: [maker-devel] Contamination report from NCBI
In-Reply-To:
References:
Message-ID:

Thanks!

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

On Oct 24, 2017 11:23 AM, "Martin MOKREJŠ" wrote:

> Hi Emmanuel,
> use trimmomatic or cutadapt to remove the adapters and check the output
> file for unremoved cases. Once they are all removed, redo the assembly.
> Martin
>
> Emmanuel Nnadi wrote:
> > Hello
> >
> > Good day.
> >
> > Please I submitted my sequence to NCBI and they sent back this contamination report.
> >
> > Please how do I use maker to effect the correction
>
> --
> Martin Mokrejs, Ph.D.
> Adapter/artefact removal from datasets based on the following technologies:
> 454 / IonTorrent / Evrogen MINT / Clontech SMART / ..., Illumina
> http://www.bioinformatics.cz/software/supported-protocols/

From qwzhang0601 at gmail.com Tue Oct 24 10:54:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 24 Oct 2017 12:54:13 -0400
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Dear Carson:

Thank you again for your suggestions. I just got the new genome assembly of NMR and have started the gene annotation. I understand your ideas about this. But can I simply use the old genome transcripts as transcript evidence and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.

For transcripts I have the following choices. I think the first choice is more reliable and better, right?
(1) There are about 60,000 RefSeq transcripts from NCBI, so I downloaded those sequences in FASTA format.
(2) We have the raw RNA-seq data from 11 tissues; we can assemble each sample with Trinity and then get the transcripts. But I think most of the RNA-seq should have been submitted to NCBI.
BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. I wonder which type of data is best for training SNAP? For Augustus, we will use BUSCO to train it.

Many thanks.

Best
Quanwei

2017-09-29 12:36 GMT-04:00 Carson Holt:

> You can try using the est2genome=1 option to map the old models forward
> onto the new assembly as if they were ESTs (add a line that says
> est_forward=1 to the control file to maintain old naming and set est= to
> the old model transcript file). Then provide the final models as a pred_gff
> for a subsequent run (i.e. a traditional MAKER run where you are annotating
> the new assembly with transcript and protein evidence and ab initio
> predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will
> change. So by doing it this way, you will at least be able to move many
> models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are
> just letting old models compete against new annotations. They will be
> rejected if they have no evidence support, or can be kept if they score
> better than alternate models from SNAP/Augustus. That way you have the
> chance to integrate old models while at the same time rejecting some old
> models that have no evidence overlap.
>
> --Carson
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of the NMR genome, which had been assembled and annotated a few years ago. We can download the gene annotation from NCBI.
> >
> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome is not very well assembled, some gene annotations may be false positives.
> > I hope those false positive genes in the previous annotation won't mislead Maker2 for the current gene annotation.
> >
> > Do you have any suggestions? Thanks
> >
> > Best
> > Quanwei

From carsonhh at gmail.com Tue Oct 24 16:26:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Oct 2017 16:26:00 -0600
Subject: [maker-devel] gene annotation for a better genome
In-Reply-To:
References: <5AFEDD05-DF02-463F-A6EE-1619A9BB968D@gmail.com>
Message-ID:

Yes. If you use est2genome it will just align the model, and then find the longest ORF. So it is a quick way to just align old models to the new assembly. Alternatively you can just do de novo annotation.

--Carson

> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you again for your suggestions. I just got the new genome assembly of NMR and have started the gene annotation. I understand your ideas about this. But can I simply use the old genome transcripts as transcript evidence and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.
>
> For transcripts I have the following choices. I think the first choice is more reliable and better, right?
> (1) There are about 60,000 RefSeq transcripts from NCBI, so I downloaded those sequences in FASTA format.
> (2) We have the raw RNA-seq data from 11 tissues; we can assemble each sample with Trinity and then get the transcripts. But I think most of the RNA-seq should have been submitted to NCBI.
>
> BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. I wonder which type of data is best for training SNAP?
> For Augustus, we will use BUSCO to train it.
>
> Many thanks.
>
> Best
> Quanwei
>
> 2017-09-29 12:36 GMT-04:00 Carson Holt:
> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don't supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
>
> --Carson
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of the NMR genome, which had been assembled and annotated a few years ago. We can download the gene annotation from NCBI.
> >
> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome is not very well assembled, some gene annotations may be false positives. I hope those false positive genes in the previous annotation won't mislead Maker2 for the current gene annotation.
> >
> > Do you have any suggestions?
> > Thanks
> >
> > Best
> > Quanwei

From daren.card at gmail.com Wed Oct 25 06:17:13 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Wed, 25 Oct 2017 07:17:13 -0500
Subject: [maker-devel] MAKER RepeatRunner error on long scaffolds only
In-Reply-To: <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
References: <2460BB61-C918-40B5-ABF2-03193BF13CCC@gmail.com> <52A27F91-063E-45C5-BEE0-BED0BF4E861E@gmail.com> <228ECD18-7B0E-47EB-9F58-FA3C31421A52@gmail.com> <90B18E05-63DB-4458-BC9B-807972BE1414@gmail.com> <97656D7C-3613-4B0B-9D99-0441AC28ABCC@gmail.com> <49A07052-11CE-4D20-A8E1-2E036F04C45C@gmail.com>
Message-ID: <0406D4C3-9C43-4198-B2EA-241C6C504425@gmail.com>

Hi Carson (and CCed MAKER list for the record),

Thanks for troubleshooting my issue further. Good to hear that the run should ultimately work, but strange that it isn't working for me. I'll keep playing with it and will hopefully get it sorted out by running through the list you suggested.

Thanks again for the help,
Daren

> On Oct 24, 2017, at 11:27 AM, Carson Holt wrote:
>
> I cannot seem to replicate this. I ran with MAKER 2.31.8 and 2.31.9, both with and without the GFF3 file (total of 4 runs). It succeeded without issues in every case.
>
> The only things I can think to try are:
> 1. Reinstall BLAST+. Even though you have 2.6.0, just try it anyway. Also install rmblast 2.6.0 for use with RepeatMasker (requires that you install from source).
> 2. Make sure you run ./configure inside RepeatMasker to let it know about the new rmblast installation.
> 3. Change the location of blast and related scripts in maker_exe.ctl, otherwise MAKER won't know to use your new installation.
> 4.
> delete the mpi_blastdb directory under MAKER's output directory to force it to rebuild all BLAST indexes.
> 5. Delete any file with a ".db" extension in the MAKER output directory to force it to rebuild all GFF3 indexes.
> 6. Update BioPerl to the current CPAN version.
>
> Also here is a link to the results I got for your contig (version 2.31.8 using the repeat masking GFF3 file) --> http://weatherby.genetics.utah.edu/data/scaffold-1.tgz
>
> --Carson
>
>> On Oct 17, 2017, at 6:46 AM, Daren C. Card wrote:
>>
>> Hi Carson,
>>
>> Thanks for offering to take a further look at this. I've uploaded all the files that I think you'd need to run MAKER on your systems, but let me know if you need anything else. My username is "guest_5038".
>>
>> Repeat annotations GFF is from RepeatModeler, with simple repeats filtered away. Transcript evidence was from a Trinity assembly of several RNAseq libraries. Several sets of protein evidence from related species. Also have an Augustus HMM trained on the genome assembly using BUSCO with retraining turned on.
>>
>> The command I've used is below, and here are the software versions I'm working with:
>>
>> Maker - 2.31.8
>> BLAST - 2.6.0
>> Augustus - 3.2.3
>> RepeatMasker - 4.0.6
>>
>> mpiexec -n 12 maker -base CroVir_rnd1_chr1 round1_maker_opts.chr1.ctl maker_bopts.ctl maker_exe.ctl
>>
>> Thanks again!
>> Daren
>>
>>> On Oct 13, 2017, at 10:37 AM, Carson Holt wrote:
>>>
>>> So you have an input GFF3 file? Could you send it to me along with the problem contig? If you want you can upload the maker control files and evidence sets, and I can just recreate the run for the contig.
>>>
>>> Upload here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> --Carson
>>>
>>>> On Oct 12, 2017, at 8:22 PM, Daren C. Card wrote:
>>>>
>>>> Hi Carson,
>>>>
>>>> Thanks for the help. Issue is still lingering. I've tried my full "ideal"
>>>> run using both the BLAST legacy 2.2.26 and also 2.6 and get the same error, so it doesn't seem to be a BLAST issue, or it is one that won't be easy to overcome.
>>>>
>>>> Using BLAST v. 2.6, I tried some more runs turning off RepeatRunner or excluding the complex repeat GFF I'm trying to supply. It seems to be running fine without my GFF, which indicates to me that the issue is this file and not BLAST. Disclaimer: I didn't run the entire scaffold since it is quite large, but it went well past the point at which it was otherwise failing, which leads me to believe it would finish okay.
>>>>
>>>> I validated the GFF at http://genometools.org/cgi-bin/gff3validator.cgi. I had previously had fewer than 10 negative start coordinates for the repeat coordinates in the attributes field of the GFF, which I just set to 1 to give a clean GFF. This was what I used for the runs I described above, so whatever issue there is with this GFF is a mystery to me.
>>>>
>>>> What advice do you have for further troubleshooting to try to determine what part of the GFF is causing the issue? I don't see any obvious info about how the sequence or the GFF is partitioned up for the annotation among the output files produced, so any help you can provide would be great.
>>>>
>>>> Hoping I can resolve this as maybe this is useful to others. Weird that I'm getting this error, as I've annotated several other genomes in a similar manner and never had this issue. They were less contiguous, but I can't imagine that really mattering.
>>>>
>>>> Thanks,
>>>> Daren
>>>>
>>>>> On Oct 8, 2017, at 7:37 PM, Carson Holt wrote:
>>>>>
>>>>> MAKER will use whatever blast is indicated in maker_exe.ctl, so make sure the new installation is the one indicated there. RepeatRunner is not part of RepeatMasker, and is a separate step that is essentially just a modified BLASTX against a protein database. So the standard NCBI blast+ installation is what gets used for that (not RMBLAST).
>>>>>
>>>>> The error you get is because the BLAST report is truncated. At the top of a BLAST report there is a summary of results, and then below there are details about each result. What is happening is that there are results in the top summary that are not being found in the bottom detail section. If updating to BLAST+ 2.6 does not fix it for you, you may need to drop to legacy NCBI BLAST (i.e. the one that is not the BLAST+ rewrite). Here --> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.26/
>>>>>
>>>>> --Carson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Oct 6, 2017, at 6:23 AM, Daren C. Card wrote:
>>>>>>
>>>>>> Dear Carson,
>>>>>>
>>>>>> Thanks so much for the quick reply. I updated BLAST to v2.6 and reran the configure script for RepeatMasker. Looks like MAKER should natively work with the BLAST that is available in the $PATH.
>>>>>>
>>>>>> Unfortunately, I'm still getting the same error at what appears to be roughly the same spot (~child 226). I've copied the stderr below. I checked my GFF file and I don't see any issues with coordinates. I'm going to try running without a GFF of repeat annotations to see what that does, but in the meantime I wanted to send an update and see if there is anything else I should look into.
>>>>>>
>>>>>> Thank you,
>>>>>> Daren Card
>>>>>>
>>>>>>
>>>>>> ################################################
>>>>>> doing repeat masking
>>>>>> re reading repeat masker report.
>>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.simple.rb.out
>>>>>> doing blastx repeats
>>>>>> re reading blast report.
>>>>>> /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/68/scaffold-1.227.te_proteins%2Efasta.repeatrunner
>>>>>> deleted:2 hits
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> doing blastx repeats
>>>>>> collecting blastx repeatmasking
>>>>>> processing all repeats
>>>>>> in cluster::shadow_cluster...
>>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>>>> --> rank=NA, hostname=moonunit0
>>>>>> ERROR: Failed while processing all repeats
>>>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>>>> FAILED CONTIG:scaffold-1
>>>>>>
>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>> FAILED CONTIG:scaffold-1
>>>>>>
>>>>>> examining contents of the fasta file and run log
>>>>>> ################################################
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Oct 4, 2017, at 11:03 AM, Carson Holt wrote:
>>>>>>>
>>>>>>> The point where it dies is because there is no start/end coordinate for one of the alignments. The issue can either be with the GFF3 you gave it or is a truncated BLAST report. Recently there have been a number of weird BLAST+ issues related to truncated reports. Updating to 2.6+ seems to solve it for most people. There is also a 2.6 update for rmblast inside RepeatMasker. I submitted a bug report and example set to BLAST a few months ago.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>
>>>>>>>> On Oct 4, 2017, at 9:53 AM, Daren C. Card wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've been having an issue with MAKER (v. 2.31.8) that I haven't been able to overcome, and no former questions have really addressed or helped fix the problem. I've run MAKER on a vertebrate genome and it runs fine and finishes all but the 8 longest scaffolds.
These are all above 65Mb (others are below 5Mb) and most are around 20% Ns (one is 35%). The 9th longest sequence, which is just above 60Mb and 27% Ns, finished fine too, which is strange because it is the only really long scaffold to run to completion. The fact that MAKER works fine on all but a few scaffolds indicates to me that the issue is those scaffolds and not MAKER/my settings, but the only difference is the length of the sequences. Is there an upper limit on scaffold size?
>>>>>>>>
>>>>>>>> I originally ran whole genome as MPI, but have since tried to rerun individual scaffolds using a single core and still get issues. The error I get is below, but I can't find any additional info in the program-specific logs to help figure this out. MAKER actually runs a little bit longer after this error before stalling and trying again. Seems to have something to do with RepeatRunner. For repeats I'm providing a GFF of complex repeats obtained from custom RepeatMasker annotations (using rm_gff option) and letting MAKER handle simple repeats (model_org=simple) and protein-based annotation with RepeatRunner (with default library).
>>>>>>>>
>>>>>>>> Any help would be greatly appreciated.
>>>>>>>> Daren Card
>>>>>>>>
>>>>>>>> University of Texas Arlington
>>>>>>>>
>>>>>>>> ###################################################
>>>>>>>> doing blastx repeats
>>>>>>>> running blast search.
>>>>>>>> #--------- command -------------#
>>>>>>>> Widget::blastx:
>>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.6 -query /tmp/maker_xiChvf/1/scaffold-1.226 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.226.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
>>>>>>>> #-------------------------------#
>>>>>>>> deleted:0 hits
>>>>>>>> collecting blastx repeatmasking
>>>>>>>> processing all repeats
>>>>>>>> in cluster::shadow_cluster...
>>>>>>>> Died at /opt/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>>>>>> --> rank=3, hostname=moonunit0
>>>>>>>> ERROR: Failed while processing all repeats
>>>>>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>>>>>> FAILED CONTIG:scaffold-1
>>>>>>>>
>>>>>>>> doing blastx repeats
>>>>>>>> running blast search.
>>>>>>>> #--------- command -------------#
>>>>>>>> Widget::blastx:
>>>>>>>> /usr/bin/blastx -db /tmp/maker_xiChvf/te_proteins%2Efasta.mpi.10.3 -query /tmp/maker_xiChvf/3/scaffold-1.225 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/castoelab/Desktop/daren/cvv_annotation/chr1-8/CroVir_rnd1_chr1.maker.output/CroVir_rnd1_chr1_datastore/51/66/scaffold-1//theVoid.scaffold-1/67/scaffold-1.225.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
>>>>>>>> #-------------------------------#
>>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>>> FAILED CONTIG:scaffold-1
>>>>>>>>
>>>>>>>> deleted:0 hits
>>>>>>>> deleted:0 hits
>>>>>>>> ###################################################
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> maker-devel mailing list
>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

From venyao at qq.com Wed Oct 25 01:25:25 2017
From: venyao at qq.com (Wen Yao)
Date: Wed, 25 Oct 2017 15:25:25 +0800
Subject: [maker-devel] NNN in maker output transcript
Message-ID:

Dear guys,

Recently, I ran MAKER to annotate a genome. I found that the transcript fasta file output by MAKER contains "NNN". Is this normal?
If not, what's going on? Is this a bug in MAKER, or is my MAKER configuration incorrect?
I told MAKER to use snap and augustus for de novo prediction and to use exonerate to align ESTs and proteins.

Thanks!

Wen Yao
From dandence at gmail.com Wed Oct 25 09:42:04 2017
From: dandence at gmail.com (Daniel Ence)
Date: Wed, 25 Oct 2017 11:42:04 -0400
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To:
References:
Message-ID: <4913D7BA-CD9B-4B7F-83EF-B8072B4950A6@gmail.com>

Hi Wen Yao,

Do you mean that some of the transcript sequences contain "N" characters or that an entire transcript sequence is "NNN"?

From carsonhh at gmail.com Wed Oct 25 09:42:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Oct 2017 09:42:34 -0600
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To:
References:
Message-ID: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>

The gene predictor generates the model. I don't think snap will generate a model that contains an N. Augustus might be able to predict across a single codon (I'm not sure there). The N means that the nucleotide is unknown (i.e. it can be A, T, C or G). An NNN codon produces the amino acid X (which is the unknown amino acid code). So it is possible that for something as short as one or two codons the predictor thinks it's ok to assume that it will produce a valid codon and uses it to complete the reading frame. Alternatively if you are using est2genome=1 or est_gff then what you are seeing is just the result of an alignment which can align to a couple of N's.
You should not use est2genome=1 for anything but training. Also est_gff or pred_gff will not be filtered if you supplied a feature location that includes an N.

--Carson

From carsonhh at gmail.com Wed Oct 25 09:43:37 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Oct 2017 09:43:37 -0600
Subject: [maker-devel] NNN in maker output transcript
In-Reply-To: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>
References: <96D45DF3-83D0-4EF3-AE29-1B929A369B81@gmail.com>
Message-ID:

Also you can check the source of the model by looking at the name, i.e. does it have augustus, snap, or est2genome in the name?

--Carson

From eennadi at gmail.com Thu Oct 26 15:34:33 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Thu, 26 Oct 2017 22:34:33 +0100
Subject: [maker-devel] How to remove contigs from GFF file
Message-ID:

Hello,

I need to remove sequences from my GFF file. Can someone help me with a command line for such removal?

ERROR: valid [SEQ_FEAT.FeatureBeginsOrEndsInGap] Feature begins or ends in gap starting at 17625 FEATURE: Gene: CR513_57782 <46071> [lcl|contig_14719:17653-17724] [lcl|contig_14719: delta, dna len= 17790]
ERROR: valid [SEQ_INST.ShortSeq] Sequence only 2 residues BIOSEQ: gnl|aceprd|CR513_62412: raw, aa len= 2

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
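One possible way to drop every feature on a given contig is to filter the GFF3 on its first (seqid) column. The sketch below is only an illustration, not a MAKER tool: the function name drop_contigs and the sample records are made up for the example, and it assumes tab-separated GFF3 feature lines. Matching the whole seqid rather than a substring avoids also removing, say, contig_147190 when only contig_14719 is meant. An embedded ##FASTA section, if present, is passed through unfiltered.

```python
def drop_contigs(lines, unwanted):
    """Yield GFF3 lines that do not belong to any seqid in `unwanted`.

    Hypothetical helper for illustration: feature lines are matched on the
    exact value of column 1, and ##sequence-region directives for unwanted
    seqids are dropped as well. Other comment lines are kept unchanged.
    """
    unwanted = set(unwanted)
    for line in lines:
        if line.startswith("##sequence-region"):
            fields = line.split()
            if len(fields) > 1 and fields[1] in unwanted:
                continue
        elif not line.startswith("#"):
            seqid = line.split("\t", 1)[0]
            if seqid in unwanted:
                continue
        yield line


# Made-up records; only the contig name contig_14719 comes from the thread.
example = [
    "##gff-version 3\n",
    "contig_1\tmaker\tgene\t10\t90\t.\t+\t.\tID=g1\n",
    "contig_14719\tmaker\tgene\t17653\t17724\t.\t+\t.\tID=g2\n",
    "contig_147190\tmaker\tgene\t5\t50\t.\t-\t.\tID=g3\n",
]
kept = list(drop_contigs(example, {"contig_14719"}))
print("".join(kept), end="")
```

For a real file the same generator could be fed an open file handle and its output written to a new file, leaving the original GFF3 untouched.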
From bmoore at genetics.utah.edu Fri Oct 27 07:17:41 2017
From: bmoore at genetics.utah.edu (Marvin B Moore)
Date: Fri, 27 Oct 2017 13:17:41 +0000
Subject: [maker-devel] Backlash running through my sequence
In-Reply-To: <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
References: <09603A3A-9DC0-40DC-A111-9DC1FCDF80BB@gmail.com> <8FD23F25-92D4-4A9C-873B-BB559B2CCBF4@illinois.edu>
Message-ID: <98FAE3F3-7C52-4EDA-8FBB-5F43DB7D54C9@umail.utah.edu>

Those look suspiciously like the remnants of end-of-line control characters. Since Windows (CRLF), classic Mac OS (CR), and Linux/Mac OS X (LF) use different control characters to mark end-of-line, I'd look at the upstream path of where your files come from and how they've been processed by you or others upstream of MAKER (e.g. were they generated or processed on an MS or Mac server?). One bizarre example we've seen is that files that simply pass through an MS Outlook server as an e-mail attachment have had their end-of-line characters converted to MS format.

Good luck...

Barry

On Oct 17, 2017, at 1:11 PM, Fields, Christopher J wrote:

I agree with Carson, though my guess is any fasta converters will either fail on these characters as non-IUPAC, or will silently remove them. Running them through a converter may not solve all the issues though, as the backslash also appears in the FASTA headers at the end of the line:

cjfields-imac:MAKER cjfields$ grep '>' sample_1.fasta | grep '\\'
>contig_134\
>contig_149\
>contig_158\
>contig_222\
>contig_316\
>contig_582\
>contig_634\
>contig_700\
>contig_741\
...

I'm curious, was this edited using any particular program prior to MAKER (or was this an amalgam of different files)?

chris

From: maker-devel on behalf of Carson Holt
Date: Monday, October 16, 2017 at 11:22 AM
To: Emmanuel Nnadi
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Backlash running through my sequence

I would not just remove them. The fact they are there calls into question how they got there in the first place.
If you generated this file yourself, you may want to instead use fasta_tool.

--Carson

On Oct 15, 2017, at 3:32 PM, Emmanuel Nnadi wrote:

Hi all,

I am trying to run annotation on some of my sequences but noticed that I have a backslash that runs through the sequence. Please, how do I remove them?

I attached the sequence.

Thanks

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From bmoore at genetics.utah.edu Fri Oct 27 07:24:44 2017
From: bmoore at genetics.utah.edu (Marvin B Moore)
Date: Fri, 27 Oct 2017 13:24:44 +0000
Subject: [maker-devel] QI codes insufficient - how to get frac exons with EST only?
In-Reply-To:
References: <93934B45-909D-48FD-A840-B4F59F15AB53@gmail.com> <6A3091A3-5F0E-470D-89F3-4B6C16E50F4B@gmail.com>
Message-ID:

Also, you could probably build these overlap sets on the command line by subsetting the MAKER GFF3 file and then using BedTools intersect for overlap queries.

Barry

On Oct 11, 2017, at 10:19 PM, Matt Simenc wrote:

Very good, thank you!

Matt

On Wed, Oct 11, 2017 at 8:22 AM, Carson Holt wrote:

Also look at GAL for building GFF3 feature queries --> https://github.com/The-Sequence-Ontology/GAL

--Carson

On Oct 11, 2017, at 9:18 AM, Michael Campbell wrote:

Hi Matt,

I have a hacky way that I've done it. It requires running MAKER two more times but they are quicker runs. To identify the genes that have protein support I pass all of the annotation back to MAKER using the model_gff option in the maker_opts.ctl file.
Then I pull out all of the protein2genome features from the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeatmask. In the output any gene with an AED less than 1 has protein support. Then I do the same thing with est2genome lines from the big GFF3 file and put them in as est_gff. The output of that one gives you genes with EST support. Then the genes with an AED of less than one in both sets have support from protein and EST.

Hope this helps,
Mike

On Oct 11, 2017, at 10:53 AM, Matt Simenc wrote:

Hey MAKER people,

I would like to make a Venn diagram showing the kinds of evidence supporting gene models in my MAKER annotation where the left side shows number of genes with EST support only, the right side shows number of genes with protein support only, and the intersection shows number of genes with EST and protein support.

QI summary has:
Fraction of exons that overlap an EST alignment
Fraction of exons that overlap EST or Protein alignments

Please correct me if I'm wrong, because I am interpreting the first to be fraction of exons that overlap an EST alignment and possibly also a protein alignment. If that is the case then we can't calculate the number of genes that overlap only EST or (EST and protein) from the QI information. Anyone have a way to do this or have a script to parse the MAKER GFF3 to get this?

Thanks!!!
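The two-run approach described in this thread (one re-run with only protein2genome evidence, one with only est2genome evidence) leaves two GFF3 outputs, and the three Venn regions are then plain set operations on the genes whose AED is below 1. The sketch below is a minimal illustration under stated assumptions: it assumes the AED is stored as an _AED attribute on mRNA lines and that Parent points at the gene ID, and the helper names (supported_genes, venn_counts, mrna) and sample records are invented for the example.

```python
def supported_genes(gff3_lines, max_aed=1.0):
    """Return parent gene IDs of mRNA features whose _AED is below max_aed.

    Assumes nine tab-separated GFF3 columns and key=value attributes
    separated by ';' (with _AED and Parent present on mRNA lines).
    """
    genes = set()
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        if "_AED" in attrs and "Parent" in attrs:
            if float(attrs["_AED"]) < max_aed:
                genes.add(attrs["Parent"])
    return genes


def venn_counts(est_supported, protein_supported):
    """Split two gene-ID sets into the three regions of a two-set Venn diagram."""
    return {
        "est_only": len(est_supported - protein_supported),
        "protein_only": len(protein_supported - est_supported),
        "both": len(est_supported & protein_supported),
    }


def mrna(gene, aed):
    """Build a made-up mRNA record for the demonstration below."""
    attrs = f"ID={gene}-mRNA-1;Parent={gene};_AED={aed:.2f}"
    return f"c1\tmaker\tmRNA\t1\t100\t.\t+\t.\t{attrs}\n"


# Stand-ins for the outputs of the est_gff-only and protein_gff-only re-runs.
est_run = [mrna("geneA", 0.10), mrna("geneB", 1.00), mrna("geneC", 0.30)]
protein_run = [mrna("geneA", 0.20), mrna("geneB", 0.05), mrna("geneC", 1.00)]

counts = venn_counts(supported_genes(est_run), supported_genes(protein_run))
print(counts)
```

A gene counts as supported here if any of its transcripts has an AED below 1, which matches the gene-level reading of the advice above; a stricter per-transcript definition would need a different grouping.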
Matt Simenc

From dandence at gmail.com Fri Oct 27 08:51:21 2017
From: dandence at gmail.com (Daniel Ence)
Date: Fri, 27 Oct 2017 10:51:21 -0400
Subject: [maker-devel] How to remove contigs from GFF file
In-Reply-To:
References:
Message-ID:

Hi Emmanuel, can you send the command that produced the error? If you need to remove certain scaffolds or contigs from a GFF3 file, you can use grep to filter them out, like this: "grep -v 'scaffold_name' gff3_file".

~Daniel

From carsonhh at gmail.com Fri Oct 27 16:00:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Oct 2017 16:00:17 -0600
Subject: [maker-devel] "ALRM" isn't numeric in exit - MAKER warning message
In-Reply-To:
References:
Message-ID: <399AB5BD-2FC5-45F4-9AC8-1665CCFEA0D1@gmail.com>

Hi Marivi,

The only time MAKER uses the ALRM signal is during exit. Sometimes MPI_Finalize can freeze (it has to do with the fact it is being called from Perl). So we set an alarm just in case. Then if it takes too long we assume it is frozen and let things exit in a less than graceful way rather than let it block forever (it is already finished after all). The complaint you get may be because your system doesn't support the alarm signal or forks.pm (which tries to intercept signals) is having an issue. Or it may just be ugliness related to parts of the process being killed with other parts still being active (it is an ungraceful exit after all). Or it may be another source of the ALRM altogether (but I assume it is the MAKER ALRM given that it happens right after MAKER says it is finished).

Thanks,
Carson

> On Oct 27, 2017, at 1:03 PM, Marivi Colle wrote:
>
> Hi Carson,
>
> After running MAKER, I checked my std output and here's the message at the end of the file. I was wondering what this warning message means?
>
>
> Start_time: 1508465182
> End_time: 1508950543
> Elapsed: 485361
>
>
> Maker is now finished!!!
>
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184.
> Argument "ALRM" isn't numeric in exit at /opt/software/BioPerl/1.6.924--GCC-4.4.7/lib64/perl5/forks.pm line 2184
>
>
> Thank you.
> Marivi
>
>
> --
> Marivi G. Colle
> Research Associate
> Department of Horticulture
> Michigan State University
> 1066 Bogue St., East Lansing
> Michigan 48824-1325, USA

From patrick.tranvan at unil.ch Sat Oct 28 08:14:59 2017
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Sat, 28 Oct 2017 14:14:59 +0000
Subject: [maker-devel] Advice on my pipeline
In-Reply-To: <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com>
References: <6b029690bace4d3fbae77c0bb1bddce8@prdexch02.ad.unil.ch> <1498470630221.84642@unil.ch> <696C51C6-5606-4ECB-A8B8-9C077182FFFA@gmail.com> <1498908228256.16549@unil.ch> <58E904BF-9AB8-4AC7-B10B-C902F414E03D@gmail.com> <1505986013492.52354@unil.ch>, <651D4267-0FD7-4A92-B778-8976B47353BB@gmail.com>
Message-ID: <1509200133044.96929@unil.ch>

Hi Carson,

If I want to look for alternative splicing variants, can I just add the option alt_splice=1 only at the last round of MAKER, or do I have to set it from the beginning (and perform all 4 rounds with this option)?

Cheers,

Patrick

________________________________
From: Carson Holt
Sent: Friday, September 22, 2017 10:08 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

The gff3 passthrough options are there to help users get old data into MAKER when they have lost access to the original files.
But for iterative running of the pipeline, it is more effective just to rerun in place so MAKER can access the raw alignment reports. The raw reports from the alignments have more detail than what is stored in the GFF3, and those details are lost when trying to use the GFF3 as input.

--Carson

On Sep 21, 2017, at 3:26 AM, Patrick Tran Van wrote:

Hi Carson,

I have a doubt about round 2. In a previous reply you said:

"Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input. MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory)."

Does it mean that I don't need to modify the section "#-----Re-annotation Using MAKER Derived GFF3"?

If I leave everything at the defaults, such as:

altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no

Will it not look again for repeats and protein + transcriptome alignments?

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, July 3, 2017 10:50 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

maker2zff is just for SNAP training and not for gene filtering (please do not use it for filtering, it does not do what you think). So the final annotation set after maker with correct_est_fusion is 16,850. To decide which set is better, look at them in a browser (gene counts are not useful for gauging the result). A well annotated genome will have evidence clusters that closely match the final models.
A poorly annotated genome will have evidence clusters that are split or merged by the models. The correct_est_fusion option does two things. It trims long overlapping UTR fragments, and it stops evidence clusters from being merged on BLASTP evidence alone (so gene predictors will get unmerged hint regions if clusters are split). You may also find that using jaccard_clip with Trinity has reduced sensitivity for the transcript data (you may lose things that were there before, but now have better specificity, i.e. fewer false positives). Make sure you provided protein data from at least two related species to help maintain sensitivity lost from the transcript data. You can also add rejected gene models back in after the fact by using iprscan to identify unsupported models with identifiable protein domains --> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286374/

Thanks,
Carson

On Jul 1, 2017, at 5:21 AM, Patrick Tran Van wrote:

So I have assembled my transcriptome with Trinity using the jaccard clip option and I have run maker with and without correct_est_fusion.

I then used SNAP to train/filter it with:

maker2zff specie.all.gff

Here are my results:

Number of genes after maker -> Number of genes after maker2zff

- Without correct_est_fusion: 21621 -> 13875
- With correct_est_fusion: 16850 -> 9098

1) If I understand correctly how correct_est_fusion works, because it prevents gene merging, shouldn't it be the inverse? Normally I should find more genes with correct_est_fusion, right?

2) I think I should find something like 13000-14000 genes for my species. Should I go with the "Without correct_est_fusion" set for the 2nd iteration of maker?
Thanks for your help

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, June 26, 2017 11:38 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Sorry, the option is --> correct_est_fusion

It is in the maker_opts.ctl file.

I would use both SNAP and Augustus on a few large contigs then review the results manually. If one of them is not behaving well, then drop it. If both behave well (i.e. correlate well with evidence alignments) then keep them both.

--Carson

On Jun 26, 2017, at 3:48 AM, Patrick Tran Van wrote:

Thanks for your answer.

1) Do you think that adding Augustus training in addition to SNAP at steps 3 and 5 will add more confidence (instead of adding Augustus only for the final round)? I am using autoAug for this and it took a while to compute.

2) I don't see this option: 'avoid_est_fusion=1'. I have tried to add it but I got this error:

WARNING: Invalid option 'avoid_est_fusion' in control file maker_opts.ctl

(I am using v 2.31.8)

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Monday, June 5, 2017 8:29 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Your plan sounds good. A couple of related notes. Insect genomes tend to have high gene density, so gene merging will be the primary difficulty. You can avoid merging of mRNA-seq evidence by using options like jaccard_clip in Trinity. Then use avoid_est_fusion=1 inside of MAKER. Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input.
MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can't be run in the same directory).

--Carson

On Jun 2, 2017, at 3:56 AM, Patrick Tran Van wrote:

Hello,

This is my first time running Maker for an insect genome annotation. I have found various resources and tried to make a consensus. I am looking for your thoughts and advice about my pipeline, in case I can improve something or am doing useless things:

What I have:

- RNA evidence: transcriptome
- Protein evidence: swissprot/uniprot + busco protein set of insect
- Cegma and busco results of my genome

1) Train SNAP with CEGMA

2) Run (run A) maker with repeat masking with transcript, protein, the new SNAP file (from step 1) and augustus file (from busco).

3) Create SNAP model from run A.

4) Run (run B) with the new SNAP (done at step 3) with options turned off (est2genome=0) and (protein2genome=0), provide gff file (maker_gff=run_A.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1).

5) Create SNAP model from run B.

6) Run (run C) with the new SNAP (done at step 5) with options turned off (est2genome=0) and (protein2genome=0), provide gff file (maker_gff=run_B.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1).

7) Create SNAP model from run C AND create Augustus gene model from run C

8) Run (run D) with the new SNAP (done at step 7) + AUGUSTUS file (step 7) with options turned off (est2genome=0) and (protein2genome=0), provide gff file (maker_gff=run_C.gff), turn off repeat masking (rm_pass=1), and use previous mapping results (altest_pass=1 and protein_pass=1).
+ Use keep_preds=1

Does it seem coherent?
Cheers,

Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206
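The control-file lines quoted in this thread follow a "key=value #comment" layout, so it can be handy to check which pass-through options are actually switched on before launching another round. The sketch below is only an illustration and not part of MAKER: parse_ctl is a hypothetical helper, the sample text reuses option lines quoted in the thread, and it assumes that values never contain a "#" character.

```python
def parse_ctl(text):
    """Parse 'key=value #comment' lines into a dict of option strings.

    Hypothetical helper: everything after the first '#' is treated as a
    comment, and lines without '=' (such as section headers) are skipped.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


# Sample text assembled from option lines quoted in the thread.
sample = """\
#-----Re-annotation Using MAKER Derived GFF3
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
"""

opts = parse_ctl(sample)
# List the *_pass options that are enabled (none in the default settings).
enabled = sorted(k for k, v in opts.items() if k.endswith("_pass") and v == "1")
print(opts)
print(enabled)
```

With the defaults shown, no pass-through option is enabled, which is consistent with the advice above to rerun in the same directory rather than feeding the previous GFF3 back in.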