From dcg at cau.edu.cn Mon May 1 08:32:30 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Mon, 1 May 2017 21:32:30 +0800
Subject: [maker-devel] Why my maker get no results?
Message-ID: <2017050121323023791817@cau.edu.cn>

Dear sir:

I have been working on genome annotation these days. My process is as below:

1. I split my contigs into 300 parts and process them simultaneously to speed things up.
2. I used my split genome, proteins, ESTs and RNA-seq to make the first alignment (est2genome=1, AED_threshold=0.2).
3. Merge the maker.*_.master_datastore_index.log files to get all the paths of the results.
4. Run the gff_merge script to merge all the results in the different dirs.

However, no results are returned. (My genome is about 3GB, but the resulting gff is essentially empty.)

index_all.log.all.gff 1KB
index_all.log.all.maker.proteins.fasta 2837KB
index_all.log.all.maker.transcripts.fasta 9866KB

Where could the problem be?
Thanks!

Yours sincerely.

Chao Chao
dcg at cau.edu.cn

From carsonhh at gmail.com Mon May 1 15:04:36 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 1 May 2017 14:04:36 -0600
Subject: [maker-devel] Why my maker get no results?
In-Reply-To: <2017050121323023791817@cau.edu.cn>
References: <2017050121323023791817@cau.edu.cn>
Message-ID: <28772B4F-D674-49E2-BFBD-CE2651CE0454@gmail.com>

You can't merge datastore indexes that way. You will need to run them separately (i.e. unmodified location and content from what MAKER gave you), and then merge the fasta and gff3 files afterwards.

--Carson

From jim03ljy at 126.com Wed May 3 07:15:22 2017
From: jim03ljy at 126.com (=?GBK?B?wqy98NStSmlt?=)
Date: Wed, 3 May 2017 20:15:22 +0800 (CST)
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
Message-ID: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>

Hi, I'm new to MAKER.
I ran into an error in the RepeatMasker step.

The error is:
NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist!

I tried ncbi+blast versions 2.5.0 and 2.6.0 as the path to blast; both give the same error.
When I run "maker -R", which skips the RepeatMasker step, MAKER works.
I checked a similar error reported earlier by another user, who solved the problem by updating RepBase.
So I deleted and re-installed RepeatMasker, updated RepBase, and also installed RMBlast.
The error is the same.

I'm stuck on this problem now.
Would highly appreciate any help - thanks!

Jinyuan Lu
Shanghai Jiao Tong University
No. 800 Dong Chuan Road, Minhang District, Shanghai, P.R. China
From dcg at cau.edu.cn Wed May 3 10:29:18 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Wed, 3 May 2017 23:29:18 +0800
Subject: [maker-devel] How to explain the maker results?
Message-ID: <2017050323291810262239@cau.edu.cn>

Dear sir:

I've been using maker to do my genome annotation. However, there are still some things I can't understand:

1. After assembly, I have many contigs. First, I set est2genome=1 and protein2genome=1, with my proteins, ESTs and RNA-seq. Which of the ways below is correct?
   1.1 Each contig has its own gff. I use each contig's own maker_gff file to get a pyu.hmm (to be used with snap), and then train the single contig.
   1.2 I merge all the maker_gff files to produce one pyu.hmm (for snap), and then use this pyu.hmm to train all the contigs.

2. The aim of my project is to find new proteins, so I need to guarantee the rigor of my annotation.
   My plan was that a predicted protein should align to Uniprot (reviewed proteins, about 30K in total) with 100% identity and coverage.
   However, if I choose method 1.2 above:
   After the first step (est2genome=1 and protein2genome=1), about 1600 proteins align 100% to Uniprot. After 2 rounds of training (est2genome=0 and protein2genome=0), fewer proteins align at 100%.
   Is my test method reasonable? Why don't the final results contain more well-aligned proteins?
   After training and fasta_merge, the results are index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta and index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta. Which is the final result?

I'm looking forward to hearing from you. Thanks!
Yours sincerely!

Chao Chao
dcg at cau.edu.cn
From d.ence at ufl.edu Wed May 3 10:49:08 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Wed, 3 May 2017 15:49:08 +0000
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
Message-ID:

Hi,

The error is about a specific file (is.lib) which isn't being found. Can you verify that the file is there after you updated RepBase? Use the command:

ls -l /home/softwares/RepeatMasker/Libraries/20170127/general/is.lib

Thanks,
Daniel Ence
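The same existence check can be scripted when more than one library file needs verifying. A minimal Python sketch, assuming only the path from the error message in this thread; `missing_files` is a hypothetical helper name and is not part of RepeatMasker or MAKER:

```python
import os


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Path copied from the error message reported in this thread.
    libs = ["/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib"]
    for path in missing_files(libs):
        print("missing:", path)
```

An empty result only says the files are present; it does not say they are in the layout the installed RepeatMasker version expects.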
From carsonhh at gmail.com Wed May 3 10:53:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 09:53:40 -0600
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
Message-ID: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>

RepBase and RepeatMasker changed structure with the new 4.0.7 release two months ago. The new version of RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall), or use the previous version of RepeatMasker with the previous version of RepBase.

--Carson
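The pairing rule can be written down as a small lookup. This is only a sketch: the two pairings are the ones named in this thread (20160829 as the last library release for 4.0.6, and 20170127 reported to work with 4.0.7). `KNOWN_PAIRINGS` and `libraries_match` are hypothetical names, and the table is not an official compatibility list.

```python
# RepeatMasker version -> RepBase library release named in this thread.
# Assumption: taken from the list messages only, not from RepBase documentation.
KNOWN_PAIRINGS = {
    "4.0.6": "20160829",
    "4.0.7": "20170127",
}


def libraries_match(repeatmasker_version, repbase_release):
    """True only for a pairing confirmed in this thread.

    False means "not confirmed here", not necessarily "broken".
    """
    return KNOWN_PAIRINGS.get(repeatmasker_version) == repbase_release


if __name__ == "__main__":
    print(libraries_match("4.0.7", "20170127"))
    print(libraries_match("4.0.6", "20170127"))
```

The second call is the mixed-version combination the reply above warns against.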
From carsonhh at gmail.com Wed May 3 10:55:41 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 09:55:41 -0600
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>
Message-ID: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>

You may want to use the previous version of both, as the new version may still have hidden bugs.

--Carson

From carsonhh at gmail.com Wed May 3 11:04:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:04:20 -0600
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
Message-ID:

You may have to contact RepBase via e-mail to find out how to get the libraries compatible with RepeatMasker 4.0.6, as it looks like they have removed the previous release from the website. The last release for 4.0.6 was --> repeatmaskerlibraries-20160829.tar.gz

--Carson

From carsonhh at gmail.com Wed May 3 11:10:48 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:10:48 -0600
Subject: [maker-devel] How to explain the maker results?
In-Reply-To: <2017050323291810262239@cau.edu.cn>
References: <2017050323291810262239@cau.edu.cn>
Message-ID: <049F8AC8-7E16-4F05-B8B2-01CA7AB88751@gmail.com>

Use the merged gff3 to train snap, otherwise you won't have enough models. Info on training can be found on the wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Also you can find additional detailed info by searching the mailing list archives --> http://groups.google.com/group/maker-devel

I'm not sure what you are asking with the last question. Alignment is not a function of training, and will not be affected by the hmm, but 100% coverage and identity is too strict a threshold even for data derived from the same species.

--Carson

From nathan.ricks at gmail.com Wed May 3 11:19:57 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Wed, 3 May 2017 10:19:57 -0600
Subject: [maker-devel] Post Processing of Annotations
Message-ID:

Hi,
I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion.
In your Online training you use the output.blastp and the output.iprscan files to help assign function.
My question is: what format do these files need to be in?
Iprscan can produce files in a variety of formats: tsv, xml, gff3, html and SVG,
while blastp can produce tabular, pairwise, xml and a number of others.

Thanks

Nathan Ricks

From carsonhh at gmail.com Wed May 3 11:30:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:30:03 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Use blastp with the tab-delimited format option and the UniProt/Swiss-Prot database. What additional filters you choose to set (i.e. e-value limit) may vary, although I would recommend 1e-6 or lower.

--Carson

From d.ence at ufl.edu Wed May 3 11:34:35 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Wed, 3 May 2017 16:34:35 +0000
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Hi,

The iprscan output should be in tsv format, which is tab-separated, and the usage statement for maker_functional_gff says that the blastp output should be in "wu-blast -mformat 2", which I think is tabbed too.
~Daniel

From carsonhh at gmail.com Wed May 3 14:20:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 13:20:31 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID: <0EE0D7F6-5F28-46E7-9AB8-CED93DC811F6@gmail.com>

The maker_functional_gff and maker_functional_fasta scripts pull specific fields out of the UniProt fasta header, so they are tied to the format used by UniProt/Swiss-Prot. At one time I had modified them to also work with NR, but that was several years ago, so I don't know if it would still work.

--Carson

> On May 3, 2017, at 1:10 PM, Nathan Ricks wrote:
>
> Is it possible to make my own database from sequences that I have downloaded from NCBI instead of using the UniProt/Swiss-Prot?

From mjfi2sb3 at gmail.com Thu May 4 01:37:52 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Thu, 04 May 2017 06:37:52 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
Message-ID:

Hi,

I am attempting to annotate a plant genome. I have a couple of questions:

*1) RNA-seq assembly*
a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score for most of the assemblies. I see in all of the threads that you recommend Trinity, but doesn't Trinity produce way too many transcripts (even after chucking out the "bad" ones using TransRate)?
b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels)? I guess my question is: should I filter transcripts that have a small read coverage?

*2) Repeat Masking*
I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty.
a) Are these numbers normal? I was expecting a lot more than 16 for the LTR.
b) I don't get any errors when I run CRL_Step4.pl, yet there is no output. What's going on?!

Many thanks,
/SB

From jim03ljy at 126.com Thu May 4 01:36:12 2017
From: jim03ljy at 126.com (=?GBK?B?wqy98NStSmlt?=)
Date: Thu, 4 May 2017 14:36:12 +0800 (CST)
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To:
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
Message-ID: <12de56ed.5ded.15bd22c56c7.Coremail.jim03ljy@126.com>

Thanks a lot! Problem solved.
I matched RepeatMasker 4.0.7 with RepBase 20170127 and it worked!
Thanks!

----Jinyuan Lu

From dcg at cau.edu.cn Fri May 5 08:43:43 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Fri, 5 May 2017 21:43:43 +0800
Subject: [maker-devel] How to evaluate maker proteins' quality?
Message-ID: <2017050521434331108720@cau.edu.cn>

Dear sir:

Now that my maker run has finished, I should check the quality of my results.
My annotation purpose is to find some new proteins.
There are about 30K reviewed proteins for my species. If I want to see how many predicted proteins support the reviewed proteins, how should I do it? (Would blastp be OK? How should I set the threshold?)
I used Uniprot, ESTs and RNA-seq to do my annotation. From my perspective, if a protein is reviewed and used to train snap/augustus, we should get the same one back after several training rounds. So I planned to align maker_proteins to the Uniprot proteins (which I used for the annotation).
If the predicted proteins match Uniprot with 100% identity and coverage, they can be thought to support the reviewed proteins. Is that correct?

If not, maybe I can evaluate my proteins only by AED value and proteome domain?

I'm looking forward to your help. Thanks a lot!

Yours sincerely!

Chao Chao
dcg at cau.edu.cn

From carsonhh at gmail.com Sun May 7 19:31:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 18:31:31 -0600
Subject: [maker-devel] How to evaluate maker proteins' quality?
In-Reply-To: <2017050521434331108720@cau.edu.cn>
References: <2017050521434331108720@cau.edu.cn>
Message-ID: <51620CC3-43D9-47D5-B8B3-871F291D6518@gmail.com>

Because of small differences in the assemblies, individual variants, annotated proteins used as reference being partial, as well as potential assembly error, a 100% identity expectation is too high. About 90+% would be more reasonable for a same-species comparison.

AED gives a good correlation with protein confidence. A perfect zero score will not happen often, though, since the way alignment algorithms work will leave alignment errors around splice sites and short exons. Also the evidence used is never perfect, so with AED lower values are better than higher values, but it cannot be used as an overly specific measurement (it is only correlative and not exact).

--Carson

From carsonhh at gmail.com Sun May 7 20:17:37 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 19:17:37 -0600
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To:
References:
Message-ID: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>

Michael, can you answer the second question? (Michael wrote the protocol, so I CC'd him.)

With respect to the first question: expression level is not necessarily relevant to the annotation process (so no, MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice-aware alignment (yes, it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (mostly occurs in fungi, where UTRs can overlap). Merging of transcripts will cause extra introns to be assigned as hints, as well as potential overextension of UTR during final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If StringTie has a similar option, then it can be used as well.
Thanks,
Carson
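To make the intron point in Carson's reply concrete: in a spliced alignment of a transcript to the genome, the introns are the gaps between consecutive aligned exon blocks. The sketch below is illustrative Python, not MAKER code. `introns_from_exons` is a hypothetical helper and all coordinates are made up (1-based, inclusive). It shows one way to picture the false-merge problem: when two neighboring genes with overlapping UTRs are assembled as a single transcript, that one transcript carries the introns of both genes.

```python
def introns_from_exons(exons):
    """Return intron intervals (1-based, inclusive) implied by aligned exon blocks."""
    introns = []
    last_end = None
    for start, end in sorted(exons):
        # A gap of at least one base between blocks is reported as an intron.
        if last_end is not None and start > last_end + 1:
            introns.append((last_end + 1, start - 1))
        last_end = end if last_end is None else max(last_end, end)
    return introns


# Hypothetical neighboring genes whose UTRs overlap on the genome.
gene_a = [(100, 200), (300, 450)]
gene_b = [(430, 600), (700, 800)]

# Assembled separately, each transcript contributes only its own intron.
print(introns_from_exons(gene_a))  # [(201, 299)]
print(introns_from_exons(gene_b))  # [(601, 699)]

# Falsely merged, the blocks fuse at the overlap and a single transcript
# now carries the introns of both genes.
merged_blocks = [(100, 200), (300, 600), (700, 800)]
print(introns_from_exons(merged_blocks))  # [(201, 299), (601, 699)]
```

Read coverage never enters the calculation, which matches the statement that expression level is not what the hints are built from.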
URL: From mcampbel at cshl.edu Sun May 7 20:24:27 2017 From: mcampbel at cshl.edu (Campbell, Michael) Date: Mon, 8 May 2017 01:24:27 +0000 Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices In-Reply-To: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com> References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com> Message-ID: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu> Hi SB, I?ve added Ning Jaing to this email. She has put great effort into updating this protocol recently and will be able to address your questions better than I can. Ning, would you mind helping out with this? Thanks, Mike On May 7, 2017, at 9:17 PM, Carson Holt > wrote: Michael can you answer the second question (Michael wrote the protocol, so I CC?d him). With respect to the first question. Expression level is not necessarily relevant to the annotation process (so no MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice aware alignment (yes it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (mostly occurs in fungi where UTR can overlap). Merging of transcripts will cause extra introns to be assigned as hints as well as potential overextension of UTR during final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If Stringtie has a similar option, then it can be used as well. Thanks, Carson On May 4, 2017, at 12:37 AM, Salim Bougouffa > wrote: Hi, I am attempting to annotate a plant genome. I have a couple of questions: 1) RNA-seq assembly a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score. for most of the assemblies. 
I see in all of the threads that you recommend Trinity but doesn't trinity produce way too many transcripts (even after chucking out the "bad" ones using transrate). b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels). I guess my question is should I filter transcripts that have a small read coverage. 2) Repeat Masking I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But, when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty. a) are these numbers normal because I was expecting a lot more than 16 for the LTR? b) I don't get any errors when I run CRL_Step4.pl yet no output. What's going on?! Many thanks, /SB -- ____________________________ Sent from Inbox Mobile _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From jiangn at msu.edu Mon May 8 10:50:45 2017 From: jiangn at msu.edu (Jiang, Ning) Date: Mon, 8 May 2017 15:50:45 +0000 Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices In-Reply-To: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu> References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>, <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu> Message-ID: Hi Salim, I am sorry to learn about the issues. it depends on the quality of your genome assembly for how many intact LTR elements you would get; however, 16 seems too low to me. 
The inner and LTR sequence file should NOT be empty. Some times the issue could be due to that the initial sequence name is long and complicated. If that's the case for your sequences, you might want to simplify your sequence name (only including letters and numbers) and try again. We are working on an automatic pipeline for LTR collection, if everything goes smoothly, it should be available in two to three months. Best wishes, Ning ________________________________ From: Campbell, Michael Sent: Sunday, May 7, 2017 9:24 PM To: Carson Holt Cc: Salim Bougouffa; maker-devel at yandell-lab.org List; Jiang, Ning Subject: Re: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices Hi SB, I?ve added Ning Jaing to this email. She has put great effort into updating this protocol recently and will be able to address your questions better than I can. Ning, would you mind helping out with this? Thanks, Mike On May 7, 2017, at 9:17 PM, Carson Holt > wrote: Michael can you answer the second question (Michael wrote the protocol, so I CC?d him). With respect to the first question. Expression level is not necessarily relevant to the annotation process (so no MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice aware alignment (yes it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (mostly occurs in fungi where UTR can overlap). Merging of transcripts will cause extra introns to be assigned as hints as well as potential overextension of UTR during final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If Stringtie has a similar option, then it can be used as well. Thanks, Carson On May 4, 2017, at 12:37 AM, Salim Bougouffa > wrote: Hi, I am attempting to annotate a plant genome. 
I have a couple of questions: 1) RNA-seq assembly a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score. for most of the assemblies. I see in all of the threads that you recommend Trinity but doesn't trinity produce way too many transcripts (even after chucking out the "bad" ones using transrate). b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels). I guess my question is should I filter transcripts that have a small read coverage. 2) Repeat Masking I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But, when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty. a) are these numbers normal because I was expecting a lot more than 16 for the LTR? b) I don't get any errors when I run CRL_Step4.pl yet no output. What's going on?! Many thanks, /SB -- ____________________________ Sent from Inbox Mobile _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mjfi2sb3 at gmail.com Mon May 8 11:41:51 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Mon, 08 May 2017 16:41:51 +0000 Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices In-Reply-To: References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com> <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu> Message-ID: Thank you all for your responses. Regards, /SB On Mon, 8 May 2017, 18:50 Jiang, Ning, wrote: > Hi Salim, > > > I am sorry to learn about the issues. it depends on the quality of your > genome assembly for how many intact LTR elements you would get; however, 16 > seems too low to me. > > > The inner and LTR sequence file should NOT be empty. Some times the issue > could be due to that the initial sequence name is long and complicated. If > that's the case for your sequences, you might want to simplify your > sequence name (only including letters and numbers) and try again. > > > We are working on an automatic pipeline for LTR collection, if everything > goes smoothly, it should be available in two to three months. > > > Best wishes, > > > Ning > ------------------------------ > *From:* Campbell, Michael > *Sent:* Sunday, May 7, 2017 9:24 PM > *To:* Carson Holt > *Cc:* Salim Bougouffa; maker-devel at yandell-lab.org List; Jiang, Ning > *Subject:* Re: [maker-devel] advanced repeat masking library > constructions & rna-seq assembly choices > > Hi SB, > > I?ve added Ning Jaing to this email. She has put great effort into > updating this protocol recently and will be able to address your questions > better than I can. > > Ning, would you mind helping out with this? > > Thanks, > Mike > > On May 7, 2017, at 9:17 PM, Carson Holt carsonhh at gmail.com>> wrote: > > Michael can you answer the second question (Michael wrote the protocol, so > I CC?d him). > > With respect to the first question. Expression level is not necessarily > relevant to the annotation process (so no MAKER does not look at read > coverage). 
Instead we use the transcript assemblies to identify introns via > splice aware alignment (yes it is the introns and not the exons we care > about). Trinity has a nice option called jaccard_clip which avoids false > merging of neighboring transcripts (mostly occurs in fungi where UTR can > overlap). Merging of transcripts will cause extra introns to be assigned as > hints as well as potential overextension of UTR during final polishing > steps. The jaccard_clip option is the main reason we recommend Trinity. If > Stringtie has a similar option, then it can be used as well. > > Thanks, > Carson > > > > On May 4, 2017, at 12:37 AM, Salim Bougouffa mjfi2sb3 at gmail.com>> wrote: > > Hi, > > I am attempting to annotate a plant genome. I have a couple of questions: > > 1) RNA-seq assembly > a) I assembled my RNA-seq data using Trinity and StringTie. The two > produce drastically different numbers. When I compare the two assemblies > for each sample using TransRate, StringTie produces a higher score. for > most of the assemblies. I see in all of the threads that you recommend > Trinity but doesn't trinity produce way too many transcripts (even after > chucking out the "bad" ones using transrate). > b) During hint creation in MAKER, does it take into account that different > transcripts have different read coverage (expression levels). I guess my > question is should I filter transcripts that have a small read coverage. > > 2) Repeat Masking > I am following the advanced repeat library construction tutorial ( > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). > The initial steps find 15 sequences for the LTR and 159 for MITE. But, when > I get to the perl DIR_CRL/CRL_Step4.pl step, both output files > (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty. > > a) are these numbers normal because I was expecting a lot more than 16 for > the LTR? > b) I don't get any errors when I run CRL_Step4.pl yet no output. 
What's > going on?! > > Many thanks, > /SB > -- > > ____________________________ > Sent from Inbox Mobile > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Wed May 10 10:48:01 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 10 May 2017 11:48:01 -0400 Subject: [maker-devel] want coding sequences Message-ID: Hello: Thanks for development and maintenance of the tool "Maker2". We have used Maker2 to do genome annotation of a new rodent species. Now we are doing downstream analysis, which requires inputs of coding sequences from different species. I found the outputs I got from Maker2 only include protein sequences and transcripts. Is there an easy way that I can get the coding sequences for our annotated genome? Many thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From munholl at uwindsor.ca Wed May 10 15:24:49 2017 From: munholl at uwindsor.ca (Seth Munholland) Date: Wed, 10 May 2017 16:24:49 -0400 Subject: [maker-devel] MAKER only running 1 task Message-ID: Hello, I'm running a MAKER annotation on an ubuntu cluster and my top screen shows the following: Tasks: 831 total, 3 running, 826 sleeping, 0 stopped, 1 zombie and my maker run (on a screen) shows: ... total clusters:4 now processing 0 ...processing 0 of 3 ...processing 1 of 3 ...processing 2 of 3 total clusters:4 now processing 0 flattening protein clusters prepare section files merging blast reports... flattening protein clusters prepare section files merging blast reports... 
flattening protein clusters
prepare section files

Of the 826, the vast majority of them are maker and only one of the running tasks is maker. Is this normal behaviour or has my maker run stopped processing?

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755

From carsonhh at gmail.com Thu May 11 10:58:59 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 09:58:59 -0600
Subject: [maker-devel] want coding sequences
In-Reply-To:
References:
Message-ID: <82DF4E49-8E78-45F6-8A78-01A45F908987@gmail.com>

Use the fasta_tool utility with --trim_maker_utr to get just the CDS part of each transcript.

--Carson

> On May 10, 2017, at 9:48 AM, Quanwei Zhang wrote:
>
> Hello:
>
> Thanks for development and maintenance of the tool "Maker2". We have used Maker2 to do genome annotation of a new rodent species. Now we are doing downstream analysis, which requires inputs of coding sequences from different species.
>
> I found the outputs I got from Maker2 only include protein sequences and transcripts. Is there an easy way that I can get the coding sequences for our annotated genome?
>
> Many thanks
>
> Best
> Quanwei

From carsonhh at gmail.com Thu May 11 11:03:04 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 10:03:04 -0600
Subject: [maker-devel] MAKER only running 1 task
In-Reply-To:
References:
Message-ID: <2119AF6E-3571-4D28-9D81-B24A513839E5@gmail.com>

It may be frozen, or, if it is on the last contig, it may be running a non-parallelizable step (the very last cluster-merging step for each contig is not parallelizable).
So on a large contig, the very last step can take a little while, and if there are no other contigs, there is no work to give to the other processes to keep them busy in the meantime. Everyone has to wait so they can all exit together once the last step is done. But as I said, this will only happen if you are on the last contig and it is large. Otherwise it is probably frozen somehow (look for any errors further up the log).

--Carson

> On May 10, 2017, at 2:24 PM, Seth Munholland wrote:
>
> Hello,
>
> I'm running a MAKER annotation on an ubuntu cluster and my top screen shows the following:
>
> Tasks: 831 total, 3 running, 826 sleeping, 0 stopped, 1 zombie
>
> and my maker run (on a screen) shows:
>
> ...
> total clusters:4 now processing 0
> ...processing 0 of 3
> ...processing 1 of 3
> ...processing 2 of 3
> total clusters:4 now processing 0
> flattening protein clusters
> prepare section files
> merging blast reports...
> flattening protein clusters
> prepare section files
> merging blast reports...
> flattening protein clusters
> prepare section files
>
> Of the 826, the vast majority of them are maker and only one of the running tasks is maker. Is this normal behaviour or has my maker run stopped processing?
>
> Seth Munholland, B.Sc.
> Department of Biological Sciences
> Rm. 304 Biology Building
> University of Windsor
> 401 Sunset Ave. N9B 3P4
> T: (519) 253-3000 Ext: 4755

From mnaymik at tgen.org Thu May 11 13:29:46 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Thu, 11 May 2017 11:29:46 -0700
Subject: [maker-devel] Maker gene vs snap match in final GFF's
Message-ID:

In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap?
--
*This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.*

From carsonhh at gmail.com Thu May 11 13:33:55 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 12:33:55 -0600
Subject: [maker-devel] Maker gene vs snap match in final GFF's
In-Reply-To:
References:
Message-ID: <8A77F384-9BAD-4BF4-BD1E-EDAE4E010612@gmail.com>

MAKER results can be the result of additional hints sent to SNAP, together with post-processing to add UTRs and additional exons that have support from transcript evidence. MAKER results will also have support from either protein or EST/mRNA evidence.

A SNAP match is simply the raw ab initio call made by SNAP (no hints, no post-processing, and it may or may not have evidence supporting the structure). These are there just for reference purposes, so you know what SNAP will produce outside of MAKER given the underlying HMM.

--Carson

> On May 11, 2017, at 12:29 PM, Marcus Naymik wrote:
>
> In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap?
>
> This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information.
If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.

From d.ence at ufl.edu Thu May 11 13:35:00 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Thu, 11 May 2017 18:35:00 +0000
Subject: [maker-devel] Maker gene vs snap match in final GFF's
In-Reply-To:
References:
Message-ID: <2560F44E-3D8D-4E81-B7B9-621921408B61@mail.ufl.edu>

The two might have identical coordinates in some cases, but they are different kinds of features. The 'match' is the product of an ab initio gene prediction algorithm, while the 'gene' is supported by evidence and has passed through the MAKER polishing and filtering steps.

On May 11, 2017, at 2:29 PM, Marcus Naymik wrote:

In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap?
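One way to see the 'gene' and 'match' features side by side is to tally column 2 (source) and column 3 (type) of the GFF3 file. The Python sketch below is an illustration under assumptions: the example records are hypothetical, and the source label `snap_masked` for raw SNAP calls is only what MAKER output commonly shows, so it may differ in a given run.

```python
from collections import Counter


def count_source_type(gff_lines):
    """Count GFF3 features by their (source, type) pair (columns 2 and 3)."""
    counts = Counter()
    for line in gff_lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(cols) != 9:
            continue
        counts[(cols[1], cols[2])] += 1
    return counts


# Hypothetical records: one MAKER annotation and one raw SNAP prediction
# that happen to share the same coordinates.
example = [
    "##gff-version 3",
    "contig1\tmaker\tgene\t1000\t2000\t.\t+\t.\tID=gene1",
    "contig1\tmaker\tmRNA\t1000\t2000\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1",
    "contig1\tsnap_masked\tmatch\t1000\t2000\t.\t+\t.\tID=match1",
    "contig1\tsnap_masked\tmatch_part\t1000\t2000\t.\t+\t.\tID=part1;Parent=match1",
]

counts = count_source_type(example)
for (source, feature_type), n in sorted(counts.items()):
    print(source, feature_type, n)
```

To run this on a real file, the list could be replaced by an open file handle, since the function only iterates over lines.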
_______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan.ricks at gmail.com Fri May 12 16:05:45 2017 From: nathan.ricks at gmail.com (Nathan Ricks) Date: Fri, 12 May 2017 15:05:45 -0600 Subject: [maker-devel] Using mpich2 with Maker Message-ID: I've been using maker for some time now. However, I would like to speed up the process by using the mpich2 option. When use the command ./Build install, the following error is produced. Any help would be appreciated. Nathan Ricks [image: Inline image 1] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 170139 bytes Desc: not available URL: From yuejiaxing at gmail.com Mon May 15 06:28:47 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 15 May 2017 13:28:47 +0200 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker Message-ID: Hello, I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and run the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapped snoRNA genes were annotated by maker as the example below shows. I was wondering if this is expected. If not, what might have caused this problem and is there a way to work around. Thanks in advance! chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 chrIX maker snoRNA 4328 4416 . + . 
ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0
chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1
chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50
chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0
chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1
chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51
chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0
chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1
chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52
chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0
chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1
chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53
chrIX maker snoRNA 4375 4500 . + .
ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0
chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1

Best,
Jia-Xing

--
Jia-Xing Yue
Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France
Personal website: http://www.iamphioxus.org/

From michael.s.campbell1 at gmail.com Mon May 15 11:28:58 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 May 2017 12:28:58 -0400
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To:
References:
Message-ID: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>

Hi Jia-Xing,

That has been my experience in the past as well. For the non-coding RNAs, tRNAscan-SE is very accurate, while snoscan seems to be quite sensitive but not very specific. Did you give it a "snoscan_meth" file? Giving it a snoscan_meth file will help with accuracy. The biggest gains in accuracy come from small RNA-seq data. In the paper where we used snoscan on maize, we didn't keep any snoRNA predictions that didn't have support from small RNA-seq data; in practical terms, we got rid of anything with an AED of 1.

I hope this helps,
Mike

> On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote:
>
> Hello,
>
> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and run the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapped snoRNA genes were annotated by maker as the example below shows.
I was wondering if this is expected. If not, what might have caused this problem and is there a way to work around. Thanks in advance! > > chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 > chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 > chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 > chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 > chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 > chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 > chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 > chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 > chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 > chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 > chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 > chrIX maker exon 4375 4491 . + . 
ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1
> chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53
> chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0
> chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1
>
> Best,
> Jia-Xing
>
> --
> Jia-Xing Yue
>
> Population Genomics and Complex Traits Group
> Tour Pasteur 8eme etage
> Faculté de Médecine
> Institute for Research on Cancer and Aging, Nice (IRCAN)
> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
> 28 Avenue de Valombrose
> 06107 NICE Cedex 2
> France
>
> Personal website: http://www.iamphioxus.org/

From yuejiaxing at gmail.com Mon May 15 12:14:26 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Mon, 15 May 2017 19:14:26 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID:

Hi Michael,

Many thanks for the information! I will specify the "snoscan_meth" file and give it another try then. I mainly want to use maker to annotate protein-coding genes and tRNAs, but it would be nice to have snoRNAs reasonably annotated as well. Thanks again and have a great day!
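Mike's suggestion of discarding snoRNA predictions with an AED of 1 could be sketched in Python as follows. This is only an illustration, assuming tab-separated GFF3 in which each feature has a single Parent. The record IDs are hypothetical, shortened stand-ins for the ones in this thread, and `drop_unsupported_snornas` is not a MAKER utility.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    pairs = (field.split("=", 1)
             for field in column9.strip().split(";") if "=" in field)
    return {key: value for key, value in pairs}


def drop_unsupported_snornas(gff_lines, max_aed=1.0):
    """Remove snoRNAs with _AED >= max_aed, their exons, and emptied genes."""
    records = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        is_feature = len(cols) == 9 and not line.startswith("#")
        attrs = parse_attributes(cols[8]) if is_feature else {}
        records.append((line, cols[2] if is_feature else None, attrs))

    gene_ids = {a["ID"] for _, ftype, a in records
                if ftype == "gene" and "ID" in a}
    # snoRNA features whose AED reaches the cutoff (no evidence support).
    bad_rnas = {a["ID"] for _, ftype, a in records
                if ftype == "snoRNA" and "ID" in a
                and float(a.get("_AED", 0)) >= max_aed}

    # A gene is dropped only if every one of its children is dropped.
    children = {}
    for _, _, a in records:
        if a.get("Parent") in gene_ids:
            children.setdefault(a["Parent"], set()).add(a.get("ID"))
    bad_genes = {gene for gene, kids in children.items() if kids <= bad_rnas}

    dropped = bad_rnas | bad_genes
    return [line for line, _, a in records
            if a.get("ID") not in dropped and a.get("Parent") not in bad_rnas]


# Hypothetical records: geneA has no evidence support, geneB does.
example = [
    "chrIX\tmaker\tgene\t4328\t4416\t.\t+\t.\tID=geneA",
    "chrIX\tmaker\tsnoRNA\t4328\t4416\t.\t+\t.\tID=geneA-snoRNA-1;Parent=geneA;_AED=1.00",
    "chrIX\tmaker\texon\t4328\t4416\t.\t+\t.\tID=geneA-snoRNA-1:exon:1;Parent=geneA-snoRNA-1",
    "chrIX\tmaker\tgene\t9000\t9100\t.\t+\t.\tID=geneB",
    "chrIX\tmaker\tsnoRNA\t9000\t9100\t.\t+\t.\tID=geneB-snoRNA-1;Parent=geneB;_AED=0.20",
    "chrIX\tmaker\texon\t9000\t9100\t.\t+\t.\tID=geneB-snoRNA-1:exon:2;Parent=geneB-snoRNA-1",
]

kept = drop_unsupported_snornas(example)
for line in kept:
    print(line)
```

Protein-coding genes and tRNAs pass through untouched because only features of type snoRNA are tested against the cutoff.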
Best,
Jia-Xing

On Mon, May 15, 2017 at 6:28 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Jia-Xing,
>
> That has been my experience in the past as well. For the non-coding RNAs,
> tRNAscan-SE is very accurate while snoscan seems to be quite sensitive but
> not very specific. Did you give it a "snoscan_meth" file? Giving it
> a snoscan_meth file will help with accuracy. The biggest gains in accuracy
> are from small RNA-seq data. In the paper where we used snoscan on maize we
> didn't keep any snoRNA predictions that didn't have support from small
> RNA-seq data; in practical terms we got rid of anything with an AED of 1.
>
> I hope this helps,
> Mike
>
> On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote:
>
> Hello,
>
> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and ran
> the annotation for a yeast (S. cerevisiae) genome. I think the annotation
> went well with regard to tRNAs and protein-coding genes, but I am not sure
> about snoRNAs. I found that multiple overlapping snoRNA genes were annotated
> by maker, as the example below shows. I was wondering if this is expected. If
> not, what might have caused this problem, and is there a way to work around it?
> Thanks in advance!
>
> chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49
> chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0
> chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1
> chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50
> chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0
> chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1
> chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51
> chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0
> chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1
> chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52
> chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0
> chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1
> chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53
> chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0
> chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1
>
> Best,
> Jia-Xing

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 16 09:51:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 May 2017 08:51:00 -0600
Subject: [maker-devel] Using mpich2 with Maker
In-Reply-To:
References:
Message-ID: <6C739B3B-D5D6-4263-A73C-4EA1762B1EE1@gmail.com>

You probably need to reinstall Parse::RecDescent, Inline, Inline::C, or all of the above via CPAN (Perl's module installer). The ones already installed on your system may have issues. If you do not have the ability to install modules system-wide, you can install them just for your user using local::lib and the bootstrapping instructions here:
http://search.cpan.org/~haarg/local-lib-2.000019/lib/local/lib.pm#The_bootstrapping_technique

Then reinstall MAKER.

--Carson

> On May 12, 2017, at 3:05 PM, Nathan Ricks wrote:
>
> I've been using maker for some time now. However, I would like to speed up the process by using the mpich2 option. When I use the command ./Build install, the following error is produced. Any help would be appreciated.
>
> Nathan Ricks
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From salim.bougouffa at kaust.edu.sa Sun May 21 02:45:50 2017
From: salim.bougouffa at kaust.edu.sa (Salim Bougouffa)
Date: Sun, 21 May 2017 10:45:50 +0300
Subject: [maker-devel] augustus exon calling ~
Message-ID:

Hi Maker folks,

I have several issues with a plant genome annotation that I am currently working on, but perhaps the most recurrent issues are:

1/ CDSs that are missed where there is significant RNA-seq evidence (figure artemis01)
2/ vice versa, where one or two exons are added without RNA-seq evidence/intron hints (figure artemis02)

Info about the runs:
1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating
2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?)
3/ evm=1 (seems to perform better than evm=0)
4/ repeat masking (de novo + RepBase)

Best,
/SB
_______________________________________________________
Salim Bougouffa (PhD), Postdoctoral Fellow
4700 KAUST, CBRC, Blg3. Office4326-WS05, Thuwal, Jeddah, KSA, 23955-6900
(966) 012 808 2963 || salim.bougouff at kaust.edu.sa
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: artemis01.png
Type: image/png
Size: 166352 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: artemis02.png
Type: image/png
Size: 97140 bytes
Desc: not available
URL:

From mjfi2sb3 at gmail.com Sun May 21 02:48:48 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Sun, 21 May 2017 07:48:48 +0000
Subject: [maker-devel] augustus exon calling ~
Message-ID:

Hi Maker folks,

I have several issues with a plant genome annotation that I am currently working on, but perhaps the most recurrent issues are:

1/ CDSs that are missed where there is significant RNA-seq evidence (figure artemis01)
2/ vice versa, where one or two exons are added without RNA-seq evidence/intron hints (figure artemis02)

Info about the runs:
1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating
2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?)
3/ evm=1 (seems to perform better than evm=0)
4/ repeat masking (de novo + RepBase)

Best,
/SB
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: artemis01.png
Type: image/png
Size: 166352 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: artemis02.png
Type: image/png
Size: 97140 bytes
Desc: not available
URL:

From mjfi2sb3 at gmail.com Sun May 21 02:55:31 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Sun, 21 May 2017 07:55:31 +0000
Subject: [maker-devel] augustus exon calling ~
In-Reply-To:
References:
Message-ID:

Hi,

I should have mentioned a third scenario, where an exon is not called fully by maker despite augustus getting it right (figure artemis03).

[image: artemis03.png]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: artemis03.png
Type: image/png
Size: 145860 bytes
Desc: not available
URL:

From admin at genome.arizona.edu Tue May 23 14:52:17 2017
From: admin at genome.arizona.edu (System Admin)
Date: Tue, 23 May 2017 12:52:17 -0700
Subject: [maker-devel] Hyperthreading
Message-ID: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>

We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node.
With hyperthreading on, we have up to 256 total emulated cores available.

Which is the optimal scenario?
1. Use '-n 256'
2. Use '-n 128' with hyperthreading still on
3. Use '-n 128' with hyperthreading turned off

Thanks

From carsonhh at gmail.com Tue May 23 15:19:29 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 14:19:29 -0600
Subject: [maker-devel] Hyperthreading
In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID:

MAKER is more of a pipeline. It will launch external tools on as many CPUs as you give it with the mpiexec command. I've found that many of the tools used get a boost with hyperthreading even though optimizations are not explicitly built into their code. The short answer is that you would have to try it both ways. I doubt there will be much more than a 10-15% difference in runtime.

You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256, even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously.

--Carson

From admin at genome.arizona.edu Tue May 23 15:31:48 2017
From: admin at genome.arizona.edu (admin at genome.arizona.edu)
Date: Tue, 23 May 2017 13:31:48 -0700
Subject: [maker-devel] Hyperthreading
In-Reply-To:
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID:

Carson Holt wrote on 05/23/2017 01:19 PM:
> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256, even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously.

Yes, with '-n 192' we found that the load on the cluster initially goes up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%).

From carsonhh at gmail.com Tue May 23 15:38:42 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 14:38:42 -0600
Subject: [maker-devel] Hyperthreading
In-Reply-To:
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com>

Also make sure cpus in the control file is set to 1 when using MPI. Otherwise it will tell each program it calls to try and use more CPUs per call.

--Carson

From mmokrejs at gmail.com Tue May 23 15:45:21 2017
From: mmokrejs at gmail.com (=?UTF-8?Q?Martin_MOKREJ=c5=a0?=)
Date: Tue, 23 May 2017 22:45:21 +0200
Subject: [maker-devel] Hyperthreading
In-Reply-To:
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <30ae31f7-264e-44d0-30bc-a010de3e54a7@gmail.com>

admin at genome.arizona.edu wrote:
> Yes, with '-n 192' we found that the load on the cluster initially goes up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%).

Hi,
the high load could be caused by disk IO or other reasons. The only way to tell is to run top, htop or similar and check that the processes are in the *running* state ("R" is displayed in the status column). You may see "S" (sleep) when a task is waiting for data input or output, and "D" (disk) may also be shown when a task is waiting for disk IO (unlike network IO).
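The state check described above can also be scripted rather than read off top by eye. What follows is only a minimal sketch in Python, assuming a Linux node with the procps `ps` command; the function names `count_states` and `current_states` are made up for this illustration and are not part of MAKER or MPICH.

```python
import subprocess
from collections import Counter


def count_states(ps_output):
    """Tally the first letter of each process state string (R, S, D, ...)."""
    # Each non-empty line is one STAT value such as "R+", "Ss" or "D".
    return Counter(
        line.strip()[0] for line in ps_output.splitlines() if line.strip()
    )


def current_states():
    """Run ps on this node and tally the states of all processes."""
    # "stat=" asks procps ps for the state column with the header suppressed.
    out = subprocess.run(
        ["ps", "-eo", "stat="], capture_output=True, text=True, check=True
    ).stdout
    return count_states(out)

# Example use on a compute node: print(dict(current_states()))
```

If most of the worker processes show up under "D" or "S" rather than "R", the load figure is probably reflecting waiting on IO rather than useful CPU work.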
Martin

From mmokrejs at gmail.com Tue May 23 15:51:18 2017
From: mmokrejs at gmail.com (=?UTF-8?Q?Martin_MOKREJ=c5=a0?=)
Date: Tue, 23 May 2017 22:51:18 +0200
Subject: [maker-devel] Hyperthreading
In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <54fbcfd6-ba28-e570-cfab-a6d83620f747@gmail.com>

System Admin wrote:
> We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node.
> With hyperthreading on, we have up to 256 total emulated cores available.
>
> Which is the optimal scenario?
> 1. Use '-n 256'
> 2. Use '-n 128' with hyperthreading still on
> 3. Use '-n 128' with hyperthreading turned off

Go for 3, but make sure to disable *hyperthreading* in the kernel of the machines as well. I also disable the multicore scheduler (which should help if there are more long-running processes than physical cores available and if some of them should share a cache). We do not have such jobs; hmmer and blast mostly access data from memory, so the CPU cache is not very relevant for them.

Hyperthreading only helps if jobs are lousy, waiting for some input/output etc., and in that case *it helps* if another process can be executed on the CPU core (hopefully not having the same bottleneck). It generally helps in bad situations. You are after a good setup, so disable hyperthreading in the kernel, run only as many jobs as there are physical CPU cores, and monitor performance. If jobs are starving, resolve the issue.
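To pick the job count suggested above, one needs the number of physical cores rather than the number of emulated ones. Here is a rough Python sketch, assuming x86 Linux nodes whose /proc/cpuinfo lists "physical id" and "core id" for every logical CPU; the function name `core_counts` is invented for this example, and on machines that lack those fields the physical count it reports will be wrong.

```python
def core_counts(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    # /proc/cpuinfo describes one logical CPU per blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            # Hyperthread siblings share the same (socket, core) pair.
            cores.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(cores)

# Example use on a node:
#   with open("/proc/cpuinfo") as fh:
#       logical, physical = core_counts(fh.read())
```

Summing the physical count over all nodes in the machinelist gives a candidate value for 'mpiexec -n' under option 3.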
Martin

From admin at genome.arizona.edu Tue May 23 15:57:56 2017
From: admin at genome.arizona.edu (admin at genome.arizona.edu)
Date: Tue, 23 May 2017 13:57:56 -0700
Subject: [maker-devel] Hyperthreading
In-Reply-To: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com>
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com>
Message-ID:

Carson Holt wrote on 05/23/2017 01:38 PM:
> Also make sure cpus in the control file is set to 1 when using MPI.
> Otherwise it will tell each program it calls to try and use more
> CPUs per call.

Yes, we are using cpus=1 in the control file.

Thanks

From carsonhh at gmail.com Tue May 23 16:03:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 15:03:17 -0600
Subject: [maker-devel] Hyperthreading
In-Reply-To:
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <054B76C7-FAC0-4B9B-A6D1-29A8202E35B3@gmail.com>

One last thing to check if using CentOS or RedHat. I've seen it happen on a handful of clusters that transparent hugepages create odd load issues and very high sys CPU usage under top (not just with maker but with BWA, GATK, and other programs that can have larger memory footprints). If using CentOS or RedHat, you may want to disable defrag for hugepages. This is how you disable it on CentOS 6 (the process is similar on CentOS 7 and RedHat, but you may have to google it):

echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

--Carson

From carsonhh at gmail.com Tue May 23 16:34:15 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 15:34:15 -0600
Subject: [maker-devel] augustus exon calling ~
In-Reply-To:
References:
Message-ID:

EVM works extremely well when evidence closely matches the predictions and there are no assembly anomalies affecting the ORF. Otherwise, EVM performs very poorly. Also, I would not set unmask=1. It adds noise to the calls.

Note that in all the cases given, the gene models are from Augustus (MAKER doesn't make predictions). MAKER just provides hints that Augustus can use for the second call set. Hints boost the score a model gets whenever a feature matches the hint. What you see as an Augustus match/match_part feature is just a reference for what Augustus calls without hints. So if I tell Augustus there is probably an exon/intron at location X, then any model that includes that exon/intron will have its score bumped up, causing Augustus to keep models that match the hints and report those over models that don't match. However, if there is an issue with the evidence (e.g. a falsely merged mRNA-seq assembly), or an issue with the assembly (a base change generates an early stop codon or causes a frameshift), then Augustus may choose to truncate or skip an exon in order to capture the bonus from downstream hints. In that case it is unlikely that there is a workable model that captures the exact intron/exon structure, because it breaks the ORF at some point. So Augustus instead produces the best model it can to capture as many hint bonuses as it can.

That being said, look for any odd hint sources, like very poor protein or transcript evidence alignments. Eliminating bad hints will improve performance (for example, if using mRNA-seq assemblies, Trinity has a jaccard_clip option which helps avoid false merging of transcript evidence). Or, if an organism you used for protein evidence constantly produces bad protein alignments, you may want to drop it completely from the evidence.

Finally, training Augustus on the genome being annotated will help improve performance (note that just because a species is closely related in evolutionary space does not mean that its HMMs will perform well; it's a common fallacy about ab initio prediction discussed in the SNAP paper). Also try adding another gene predictor like SNAP to see if it hurts or helps.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nathan.ricks at gmail.com Tue May 23 16:23:49 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Tue, 23 May 2017 15:23:49 -0600
Subject: [maker-devel] maker_functional_gff
Message-ID:

I've been working with maker and trying to use maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up and just continues for a long time.

Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367.
^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368.

Nathan Ricks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.ence at ufl.edu Tue May 23 16:39:32 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Tue, 23 May 2017 21:39:32 +0000
Subject: [maker-devel] maker_functional_gff
In-Reply-To:
References:
Message-ID: <9388406A-302C-4B19-9F35-D56C06CC9582@mail.ufl.edu>

Hi Nathan, can you send the command line that you're using that gives the error?

Thanks,
Daniel Ence

From carsonhh at gmail.com Tue May 23 16:44:05 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 15:44:05 -0600
Subject: [maker-devel] maker_functional_gff
In-Reply-To:
References:
Message-ID: <25B003D7-6A19-4486-B21F-71070F00A580@gmail.com>

The blast report you gave it is in the wrong format, is partial/truncated, or you provided the files in the wrong order. Basically, it received an empty line from the file at some point. The blast report must be in tabular format, which is "wu-blast -mformat 2" or "ncbi-blast -outfmt 6". Also, the script only supports blast results against UniProt/Swiss-Prot.

--Carson

From yuejiaxing at gmail.com Fri May 26 04:28:48 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Fri, 26 May 2017 11:28:48 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To:
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID:

Hi Michael,

This is a follow-up on the snoscan issue. The snoscan_meth option seems to have been removed from the current maker_opts.ctl template file (v2.31.9). This option used to be there according to this post (https://www.biostars.org/p/217240/). I manually specified this option in my maker_opts.ctl file, but I don't think maker has recognized it:

STATUS: Parsing control files...
WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl
...

Do you know if there is a way to work around this problem? Thanks!

Best,
Jia-Xing

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Fri May 26 08:54:44 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 26 May 2017 09:54:44 -0400
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To:
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>

Hi Jia-Xing,

v2.31.9 may not have had that option. I know that it is in the v3.00.0 version, so your best option may be to update.

Thanks,
Mike

> On May 26, 2017, at 5:28 AM, Jia-Xing Yue wrote:
>
> Hi Michael,
>
> This is a follow-up on the snoscan issue. The snoscan_meth option seems to have been removed from the current maker_opts.ctl template file (v2.31.9). This option used to be there according to this post (https://www.biostars.org/p/217240/).
> I manually specified this option in my maker_opts.ctl file but I don't think maker has correctly recognized this option:
>
> STATUS: Parsing control files...
> WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl
> ...
>
> Do you know if there is a way to work around this problem? Thanks!
>
> Best,
> Jia-Xing
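Mike's earlier advice in this thread was to discard snoRNA predictions that have no small RNA-seq support, which in practice meant anything with an AED of 1. A minimal sketch of that filtering step follows, assuming a tab-delimited MAKER GFF3 in which each snoRNA feature carries an `_AED` attribute as in the records quoted above. The function names, the `max_aed` parameter and the `example-gene-1` records are made up for illustration; this is not a script shipped with MAKER.

```python
def parse_attributes(column9):
    """Turn 'ID=a;Parent=b' into {'ID': 'a', 'Parent': 'b'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def drop_unsupported_snornas(gff_lines, max_aed=1.0):
    """Return the GFF3 lines with unsupported snoRNA models removed.

    A snoRNA whose _AED is >= max_aed is dropped together with its exons;
    a gene is dropped once all of its direct children have been dropped.
    """
    records = []  # (line, feature type or None, attribute dict or None)
    for raw in gff_lines:
        line = raw.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            records.append((line, None, None))  # comments, blanks, FASTA
        else:
            records.append((line, cols[2], parse_attributes(cols[8])))

    # Pass 1: snoRNA features at or above the AED cutoff.
    rejected = set()
    for _, ftype, attrs in records:
        if ftype == "snoRNA" and "ID" in attrs:
            if float(attrs.get("_AED", 0)) >= max_aed:
                rejected.add(attrs["ID"])

    # Pass 2: parents (genes) whose direct children were all rejected.
    children = {}
    for _, _, attrs in records:
        if attrs and "ID" in attrs and "Parent" in attrs:
            for parent in attrs["Parent"].split(","):
                children.setdefault(parent, []).append(attrs["ID"])
    for parent, kids in children.items():
        if all(kid in rejected for kid in kids):
            rejected.add(parent)

    # Pass 3: keep lines that are neither rejected nor children of rejects.
    kept = []
    for line, _, attrs in records:
        if attrs is not None:
            parents = attrs.get("Parent", "").split(",")
            if attrs.get("ID") in rejected or any(p in rejected for p in parents):
                continue
        kept.append(line)
    return kept


# First model is taken from the thread (AED 1.00); the second is hypothetical.
SAMPLE = [
    "##gff-version 3",
    "chrIX\tmaker\tgene\t4328\t4416\t.\t+\t.\tID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49",
    "chrIX\tmaker\tsnoRNA\t4328\t4416\t.\t+\t.\tID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;_AED=1.00",
    "chrIX\tmaker\texon\t4328\t4416\t.\t+\t.\tID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1",
    "chrIX\tmaker\tgene\t9000\t9100\t.\t+\t.\tID=example-gene-1",
    "chrIX\tmaker\tsnoRNA\t9000\t9100\t.\t+\t.\tID=example-gene-1-snoRNA-1;Parent=example-gene-1;_AED=0.20",
    "chrIX\tmaker\texon\t9000\t9100\t.\t+\t.\tID=example-gene-1-snoRNA-1:exon:1;Parent=example-gene-1-snoRNA-1",
]

if __name__ == "__main__":
    for line in drop_unsupported_snornas(SAMPLE):
        print(line)
```

Lowering `max_aed` makes the filter stricter. Overlapping snoscan models that do have evidence support would still need separate handling, since this sketch only looks at AED.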
From yuejiaxing at gmail.com Fri May 26 09:20:03 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Fri, 26 May 2017 16:20:03 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>
Message-ID:

I see. Thanks Michael!

Best,
Jia-Xing

On Fri, May 26, 2017 at 3:54 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Jia-Xing,
>
> v2.31.9 may not have had that option. I know that it is in the v3.00.0 version, so your best option may be to update.
>
> Thanks,
> Mike

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/

From dcg at cau.edu.cn Wed May 3 09:29:18 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Wed, 3 May 2017 23:29:18 +0800
Subject: [maker-devel] How to explain the maker results?
Message-ID: <2017050323291810262239@cau.edu.cn>

Dear sir:

I've been using maker to do my genome annotation. However, I still have something I can't understand:

1. After assembly, I have many contigs. First, I set est2genome=1 and protein2genome=1, with my proteins, ESTs and RNA-seq.
Which way below is correct?
1.1 Each contig has its own gff. I use each contig's own maker_gff file to get a pyu.hmm (to be used by snap), and then train on that single contig.
1.2 I merge all the maker_gff files to produce one pyu.hmm (for snap), and then use this pyu.hmm to train all the contigs.

2. The aim of my project is to find new proteins, so I need to guarantee the rigor of my annotation.
I made a plan that each predicted protein should align to UniProt (reviewed proteins, about 30K in total) with 100% identity and coverage.
However, if I choose method 1.2 as above:
After the first step (est2genome=1 and protein2genome=1), about 1600 proteins can be 100% aligned to UniProt. After 2 rounds of training (est2genome=0 and protein2genome=0), fewer proteins can be 100% aligned.
Is my test method reasonable? Why doesn't the final result contain more well-aligned proteins?
After training and fasta_merge, the results are index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta and index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta. Which one is the final result?

I'm looking forward to hearing from you. Thanks!
Yours sincerely!

Chao Chao
dcg at cau.edu.cn

From d.ence at ufl.edu Wed May 3 09:49:08 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Wed, 3 May 2017 15:49:08 +0000
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
Message-ID:

Hi,

The error is about a specific file (is.lib) which isn't being found. Can you verify that the file is there after you updated RepBase? Use the command:

"ls -l /home/softwares/RepeatMasker/Libraries/20170127/general/is.lib"

Thanks,
Daniel Ence

On May 3, 2017, at 8:15 AM, Jim wrote:

> Hi, I'm a newbie of maker.
> I met some errors in Repeatmasker step.
>
> The error is here:
> NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist!
>
> I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error.
> And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work.
> I checked the former similar errors reported by another user and he solved the problem by updating the RepBase.
> So, I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast.
> The error is the same.
>
> I'm stuck in the problem now.
> Would highly appreciate any help - thanks!
>
> Jinyuan Lu
> Shanghai Jiao Tong University
> No. 800 Dong Chuan Road, Minhang District, Shanghai, P.R. China

From carsonhh at gmail.com Wed May 3 09:53:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 09:53:40 -0600
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
Message-ID: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>

RepBase and RepeatMasker changed structure with the new 4.0.7 release two months ago. The new version of RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall), or you have to use the previous version of RepeatMasker with the previous version of RepBase.

--Carson

> On May 3, 2017, at 6:15 AM, Jim wrote:
>
> Hi, I'm a newbie of maker.
> I met some errors in Repeatmasker step.
>
> The error is here:
> NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist!
>
> I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error.
> And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work.
> I checked the former similar errors reported by another user and he solved the problem by updating the RepBase.
> So, I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast.
> The error is the same.
>
> I'm stuck in the problem now.
> Would highly appreciate any help - thanks!
>
> Jinyuan Lu
> Shanghai Jiao Tong University
> No. 800 Dong Chuan Road, Minhang District, Shanghai, P.R. China

From carsonhh at gmail.com Wed May 3 09:55:41 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 09:55:41 -0600
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>
Message-ID: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>

You may want to use the previous version of both as the new version may still have hidden bugs.

--Carson

> On May 3, 2017, at 9:53 AM, Carson Holt wrote:
>
> RepBase and RepeatMasker changed structure with the new 4.0.7 release two months ago. The new version of RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall), or you have to use the previous version of RepeatMasker with the previous version of RepBase.
>
> --Carson

From carsonhh at gmail.com Wed May 3 10:04:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:04:20 -0600
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
Message-ID:

You may have to contact RepBase via e-mail to find out how to get the libraries compatible with RepeatMasker 4.0.6, as it looks like they have removed the previous release from the website.
The last release for 4.0.6 was --> repeatmaskerlibraries-20160829.tar.gz

--Carson

> On May 3, 2017, at 9:55 AM, Carson Holt wrote:
>
> You may want to use the previous version of both as the new version may still have hidden bugs.
>
> --Carson
From carsonhh at gmail.com Wed May 3 10:10:48 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:10:48 -0600
Subject: [maker-devel] How to explain the maker results?
In-Reply-To: <2017050323291810262239@cau.edu.cn>
References: <2017050323291810262239@cau.edu.cn>
Message-ID: <049F8AC8-7E16-4F05-B8B2-01CA7AB88751@gmail.com>

Use the merged gff3 to train snap, otherwise you won't have enough models. Info on training can be found on the wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Also you can find additional detailed info by searching the mailing list archives --> http://groups.google.com/group/maker-devel

I'm not sure what you are asking with the last question. Alignment is not a function of training, and will not be affected by the hmm, but 100% coverage and identity is too strict a threshold even for data derived from the same species.

--Carson

> On May 3, 2017, at 9:29 AM, dcg at cau.edu.cn wrote:
>
> Dear sir:
> I've been using maker to do my genome annotation. However, I still have something I can't understand:
>
> 1. After assembly, I have many contigs. First, I set est2genome=1 and protein2genome=1, with my proteins, ESTs and RNA-seq. Which way below is correct?
> 1.1 Each contig has its own gff. I use each contig's own maker_gff file to get a pyu.hmm (to be used by snap), and then train on that single contig.
> 1.2 I merge all the maker_gff files to produce one pyu.hmm (for snap), and then use this pyu.hmm to train all the contigs.
>
> 2. The aim of my project is to find new proteins, so I need to guarantee the rigor of my annotation.
> I made a plan that each predicted protein should align to UniProt (reviewed proteins, about 30K in total) with 100% identity and coverage.
> However, if I choose method 1.2 as above:
> After the first step (est2genome=1 and protein2genome=1), about 1600 proteins can be 100% aligned to UniProt. After 2 rounds of training (est2genome=0 and protein2genome=0), fewer proteins can be 100% aligned.
> Is my test method reasonable? Why doesn't the final result contain more well-aligned proteins?
> After training and fasta_merge, the results are index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta and index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta. Which one is the final result?
>
> I'm looking forward to hearing from you. Thanks!
> Yours sincerely!
>
> Chao Chao
> dcg at cau.edu.cn

From nathan.ricks at gmail.com Wed May 3 10:19:57 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Wed, 3 May 2017 10:19:57 -0600
Subject: [maker-devel] Post Processing of Annotations
Message-ID:

Hi,
I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
My question is: what format do these files need to be in?
Iprscan can produce files in a variety of formats (tsv, xml, gff3, html and SVG), while blastp can produce tabular, pairwise, xml and a number of others.

Thanks

Nathan Ricks

From carsonhh at gmail.com Wed May 3 10:30:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:30:03 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Use blastp with the tab-delimited format option and the UniProt/Swiss-Prot database.
What additional filters you choose to set (e.g. e-value limit) may vary, although I would recommend 1e-6 or lower.

--Carson

> On May 3, 2017, at 10:19 AM, Nathan Ricks wrote:
>
> Hi,
> I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
> My question is: what format do these files need to be in?
> Iprscan can produce files in a variety of formats (tsv, xml, gff3, html and SVG), while blastp can produce tabular, pairwise, xml and a number of others.
>
> Thanks
>
> Nathan Ricks

From d.ence at ufl.edu Wed May 3 10:34:35 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Wed, 3 May 2017 16:34:35 +0000
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Hi,

The iprscan output should be in tsv format, which is tab-separated, and the usage statement for maker_functional_gff says that the blastp output should be in "wu-blast -mformat 2" format, which I think is tab-delimited too.

~Daniel

> On May 3, 2017, at 12:19 PM, Nathan Ricks wrote:
>
> Hi,
> I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
> My question is: what format do these files need to be in?
> Iprscan can produce files in a variety of formats (tsv, xml, gff3, html and SVG), while blastp can produce tabular, pairwise, xml and a number of others.
>
> Thanks
>
> Nathan Ricks

From carsonhh at gmail.com Wed May 3 13:20:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 13:20:31 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID: <0EE0D7F6-5F28-46E7-9AB8-CED93DC811F6@gmail.com>

The maker_functional_gff and maker_functional_fasta scripts pull specific fields out of the UniProt fasta header, so they are tied to the format used by UniProt/Swiss-Prot. At one time I had modified them to also work with NR, but that was several years ago, so I don't know if it would still work.

--Carson

> On May 3, 2017, at 1:10 PM, Nathan Ricks wrote:
>
> Is it possible to make my own database from sequences that I have downloaded from NCBI instead of using UniProt/Swiss-Prot?
>
> On Wed, May 3, 2017 at 10:30 AM, Carson Holt wrote:
> Use blastp with the tab-delimited format option and the UniProt/Swiss-Prot database. What additional filters you choose to set (e.g. an e-value limit) may vary, although I would recommend 1e-6 or lower.
>
> --Carson
>
> > On May 3, 2017, at 10:19 AM, Nathan Ricks wrote:
> >
> > Hi,
> > I've been running your MAKER pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
> > My question is what format these files need to be in.
> > Iprscan can produce files in a variety of formats: tsv, xml, gff3, html and SVG
> > while blastp can produce tabular, pairwise, xml and a number of others.
> > > > Thanks > > > > Nathan Ricks > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mjfi2sb3 at gmail.com Thu May 4 00:37:52 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Thu, 04 May 2017 06:37:52 +0000 Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices Message-ID: Hi, I am attempting to annotate a plant genome. I have a couple of questions: *1) RNA-seq assembly* a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score. for most of the assemblies. I see in all of the threads that you recommend Trinity but doesn't trinity produce way too many transcripts (even after chucking out the "bad" ones using transrate). b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels). I guess my question is should I filter transcripts that have a small read coverage. *2) Repeat Masking * I am following the advanced repeat library construction tutorial ( http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But, when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty. a) are these numbers normal because I was expecting a lot more than 16 for the LTR? b) I don't get any errors when I run CRL_Step4.pl yet no output. What's going on?! Many thanks, /SB -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jim03ljy at 126.com Thu May 4 00:36:12 2017
From: jim03ljy at 126.com (=?GBK?B?wqy98NStSmlt?=)
Date: Thu, 4 May 2017 14:36:12 +0800 (CST)
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To:
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
Message-ID: <12de56ed.5ded.15bd22c56c7.Coremail.jim03ljy@126.com>

Thanks a lot! Problem solved.
I matched RepeatMasker 4.0.7 with RepBase20170127 and it worked!
Thanks!

----Jinyuan Lu

At 2017-05-04 00:04:20, "Carson Holt" wrote:

You may have to contact RepBase via e-mail to find out how to get the libraries compatible with RepeatMasker 4.0.6, as it looks like they have removed the previous release from the website. The last release for 4.0.6 was
--> repeatmaskerlibraries-20160829.tar.gz

--Carson

On May 3, 2017, at 9:55 AM, Carson Holt wrote:

You may want to use the previous version of both, as the new version may still have hidden bugs.

--Carson

On May 3, 2017, at 9:53 AM, Carson Holt wrote:

RepBase and RepeatMasker have changed structure with the new 4.0.7 released two months ago. The new version of RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall). Or you have to use the previous version of RepeatMasker with the previous version of RepBase.

--Carson

On May 3, 2017, at 6:15 AM, Jinyuan Lu (Jim) wrote:

Hi,
I'm a newbie of maker. I met some errors in the RepeatMasker step. The error is here:
NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist!
I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast; both have the same error. And when I use the command "maker -R", which skips the RepeatMasker step, maker works.
I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. So, I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. The error is the same. I'm stuck in the problem now. Would highly appreciate any help - thanks! Jinyuan Lu Shanghai Jiao Tong University No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcg at cau.edu.cn Fri May 5 07:43:43 2017 From: dcg at cau.edu.cn (dcg at cau.edu.cn) Date: Fri, 5 May 2017 21:43:43 +0800 Subject: [maker-devel] How to evaluate maker proteins' quality? Message-ID: <2017050521434331108720@cau.edu.cn> Dear sir: After I finished my maker running, I should check the quality of my results. My annotation purpose is to find some new proteins. There is about 30K reviewed proteins of my species. If I want to see how many predicted proteins can support the reviewed proteins, how to do it?(Can blastp be OK? How to set the threshold? ) I used Uniprot, ESTs and RNA-seq to do my annotation. From my perspective, if the protein is reviewed and used to train snap/augustus, we should get the same one after several training rounds. So I planned to align maker_proteins to Uniprot proteins(which I utilized to annotate). If the predicted proteins match Uniprot by 100% identity and coverage, they can be thought to support the reviewed proteins. Is it correct? If not, maybe I can evaluate my proteins only by AED value and proteome domain? I'm looking forward to your help. Thanks a lot! Yours sincerely! Chao Chao dcg at cau.edu.cn -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Sun May 7 18:31:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 18:31:31 -0600
Subject: [maker-devel] How to evaluate maker proteins' quality?
In-Reply-To: <2017050521434331108720@cau.edu.cn>
References: <2017050521434331108720@cau.edu.cn>
Message-ID: <51620CC3-43D9-47D5-B8B3-871F291D6518@gmail.com>

Because of small differences in the assemblies, individual variants, annotated proteins used as reference being partial, as well as potential assembly error, a 100% identity expectation is too high. About 90+% would be more reasonable for a same-species comparison.

AED gives a good correlation with protein confidence. A perfect zero score will not happen often, though, since the way alignment algorithms work will leave alignment errors around splice sites and short exons. Also, the evidence used is never perfect, so with AED lower values are better than higher values, but AED cannot be used as an overly specific measurement (it is only correlative and not exact).

--Carson

> On May 5, 2017, at 7:43 AM, dcg at cau.edu.cn wrote:
>
> Dear sir:
> After I finished my maker running, I should check the quality of my results.
> My annotation purpose is to find some new proteins.
> There is about 30K reviewed proteins of my species. If I want to see how many predicted proteins can support the reviewed proteins, how to do it? (Can blastp be OK? How to set the threshold?)
> I used Uniprot, ESTs and RNA-seq to do my annotation. From my perspective, if the protein is reviewed and used to train snap/augustus, we should get the same one after several training rounds. So I planned to align maker_proteins to Uniprot proteins (which I utilized to annotate). If the predicted proteins match Uniprot by 100% identity and coverage, they can be thought to support the reviewed proteins. Is it correct?
>
> If not, maybe I can evaluate my proteins only by AED value and proteome domain?
>
> I'm looking forward to your help. Thanks a lot!
>
> Yours sincerely!
>
> Chao Chao
> dcg at cau.edu.cn

From carsonhh at gmail.com Sun May 7 19:17:37 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 19:17:37 -0600
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To:
References:
Message-ID: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>

Michael, can you answer the second question? (Michael wrote the protocol, so I CC'd him.)

With respect to the first question: expression level is not necessarily relevant to the annotation process (so no, MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice-aware alignment (yes, it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (this mostly occurs in fungi, where UTRs can overlap). Merging of transcripts will cause extra introns to be assigned as hints, as well as potential overextension of UTR during the final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If StringTie has a similar option, then it can be used as well.

Thanks,
Carson

> On May 4, 2017, at 12:37 AM, Salim Bougouffa wrote:
>
> Hi,
>
> I am attempting to annotate a plant genome. I have a couple of questions:
>
> 1) RNA-seq assembly
> a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score for most of the assemblies.
> I see in all of the threads that you recommend Trinity, but doesn't Trinity produce way too many transcripts (even after chucking out the "bad" ones using TransRate)?
> b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels)? I guess my question is: should I filter transcripts that have a small read coverage?
>
> 2) Repeat Masking
> I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But, when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty.
>
> a) are these numbers normal because I was expecting a lot more than 16 for the LTR?
> b) I don't get any errors when I run CRL_Step4.pl yet no output. What's going on?!
>
> Many thanks,
> /SB
> --
> ____________________________
> Sent from Inbox Mobile

From mcampbel at cshl.edu Sun May 7 19:24:27 2017
From: mcampbel at cshl.edu (Campbell, Michael)
Date: Mon, 8 May 2017 01:24:27 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>
Message-ID: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>

Hi SB,

I've added Ning Jiang to this email. She has put great effort into updating this protocol recently and will be able to address your questions better than I can.

Ning, would you mind helping out with this?
Thanks,
Mike

On May 7, 2017, at 9:17 PM, Carson Holt wrote:

Michael, can you answer the second question? (Michael wrote the protocol, so I CC'd him.)

With respect to the first question: expression level is not necessarily relevant to the annotation process (so no, MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice-aware alignment (yes, it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (this mostly occurs in fungi, where UTRs can overlap). Merging of transcripts will cause extra introns to be assigned as hints, as well as potential overextension of UTR during the final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If StringTie has a similar option, then it can be used as well.

Thanks,
Carson

On May 4, 2017, at 12:37 AM, Salim Bougouffa wrote:

Hi,

I am attempting to annotate a plant genome. I have a couple of questions:

1) RNA-seq assembly
a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score for most of the assemblies. I see in all of the threads that you recommend Trinity, but doesn't Trinity produce way too many transcripts (even after chucking out the "bad" ones using TransRate)?
b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels)? I guess my question is: should I filter transcripts that have a small read coverage?

2) Repeat Masking
I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE.
But, when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty.

a) are these numbers normal because I was expecting a lot more than 16 for the LTR?
b) I don't get any errors when I run CRL_Step4.pl yet no output. What's going on?!

Many thanks,
/SB
--
____________________________
Sent from Inbox Mobile

From jiangn at msu.edu Mon May 8 09:50:45 2017
From: jiangn at msu.edu (Jiang, Ning)
Date: Mon, 8 May 2017 15:50:45 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>, <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
Message-ID:

Hi Salim,

I am sorry to learn about the issues. How many intact LTR elements you get depends on the quality of your genome assembly; however, 16 seems too low to me.

The inner and LTR sequence files should NOT be empty. Sometimes the issue is that the initial sequence names are long and complicated. If that's the case for your sequences, you might want to simplify your sequence names (including only letters and numbers) and try again.

We are working on an automatic pipeline for LTR collection; if everything goes smoothly, it should be available in two to three months.
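
As an illustration of the renaming step suggested above (not part of the CRL protocol itself), a minimal Python sketch could look like the following. The function name, the "seq" prefix, and the idea of keeping a mapping back to the original headers are assumptions made for this example; adapt them to your own files.

```python
import sys


def simplify_fasta_names(lines, prefix="seq"):
    """Rename FASTA headers to prefix + running number (letters and digits only).

    Returns the renamed lines and a dict mapping each new name to the
    original header text, so the old names can be restored later.
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}{count}"
            mapping[new_name] = line[1:]  # original header without ">"
            renamed.append(">" + new_name)
        else:
            renamed.append(line)  # sequence lines are left untouched
    return renamed, mapping


# Hypothetical command-line use: python simplify_names.py genome.fasta renamed.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as handle:
        new_lines, name_map = simplify_fasta_names(handle)
    with open(sys.argv[2], "w") as out:
        out.write("\n".join(new_lines) + "\n")
    # Print the old/new name table so results can be traced back.
    for new_name, old_name in name_map.items():
        print(f"{new_name}\t{old_name}")
```

Keeping the mapping means the simplified names only need to exist for the repeat-library steps; the original scaffold names can be put back afterwards.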
Best wishes,

Ning

________________________________
From: Campbell, Michael
Sent: Sunday, May 7, 2017 9:24 PM
To: Carson Holt
Cc: Salim Bougouffa; maker-devel at yandell-lab.org List; Jiang, Ning
Subject: Re: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices

Hi SB,

I've added Ning Jiang to this email. She has put great effort into updating this protocol recently and will be able to address your questions better than I can.

Ning, would you mind helping out with this?

Thanks,
Mike

On May 7, 2017, at 9:17 PM, Carson Holt wrote:

Michael, can you answer the second question? (Michael wrote the protocol, so I CC'd him.)

With respect to the first question: expression level is not necessarily relevant to the annotation process (so no, MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice-aware alignment (yes, it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (this mostly occurs in fungi, where UTRs can overlap). Merging of transcripts will cause extra introns to be assigned as hints, as well as potential overextension of UTR during the final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If StringTie has a similar option, then it can be used as well.

Thanks,
Carson

On May 4, 2017, at 12:37 AM, Salim Bougouffa wrote:

Hi,

I am attempting to annotate a plant genome. I have a couple of questions:

1) RNA-seq assembly
a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score for most of the assemblies. I see in all of the threads that you recommend Trinity, but doesn't Trinity produce way too many transcripts (even after chucking out the "bad" ones using TransRate)?
b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels). I guess my question is should I filter transcripts that have a small read coverage. 2) Repeat Masking I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But, when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty. a) are these numbers normal because I was expecting a lot more than 16 for the LTR? b) I don't get any errors when I run CRL_Step4.pl yet no output. What's going on?! Many thanks, /SB -- ____________________________ Sent from Inbox Mobile _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mjfi2sb3 at gmail.com Mon May 8 10:41:51 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Mon, 08 May 2017 16:41:51 +0000 Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices In-Reply-To: References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com> <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu> Message-ID: Thank you all for your responses. Regards, /SB On Mon, 8 May 2017, 18:50 Jiang, Ning, wrote: > Hi Salim, > > > I am sorry to learn about the issues. it depends on the quality of your > genome assembly for how many intact LTR elements you would get; however, 16 > seems too low to me. > > > The inner and LTR sequence file should NOT be empty. 
Some times the issue > could be due to that the initial sequence name is long and complicated. If > that's the case for your sequences, you might want to simplify your > sequence name (only including letters and numbers) and try again. > > > We are working on an automatic pipeline for LTR collection, if everything > goes smoothly, it should be available in two to three months. > > > Best wishes, > > > Ning > ------------------------------ > *From:* Campbell, Michael > *Sent:* Sunday, May 7, 2017 9:24 PM > *To:* Carson Holt > *Cc:* Salim Bougouffa; maker-devel at yandell-lab.org List; Jiang, Ning > *Subject:* Re: [maker-devel] advanced repeat masking library > constructions & rna-seq assembly choices > > Hi SB, > > I?ve added Ning Jaing to this email. She has put great effort into > updating this protocol recently and will be able to address your questions > better than I can. > > Ning, would you mind helping out with this? > > Thanks, > Mike > > On May 7, 2017, at 9:17 PM, Carson Holt carsonhh at gmail.com>> wrote: > > Michael can you answer the second question (Michael wrote the protocol, so > I CC?d him). > > With respect to the first question. Expression level is not necessarily > relevant to the annotation process (so no MAKER does not look at read > coverage). Instead we use the transcript assemblies to identify introns via > splice aware alignment (yes it is the introns and not the exons we care > about). Trinity has a nice option called jaccard_clip which avoids false > merging of neighboring transcripts (mostly occurs in fungi where UTR can > overlap). Merging of transcripts will cause extra introns to be assigned as > hints as well as potential overextension of UTR during final polishing > steps. The jaccard_clip option is the main reason we recommend Trinity. If > Stringtie has a similar option, then it can be used as well. 
> > Thanks, > Carson > > > > On May 4, 2017, at 12:37 AM, Salim Bougouffa mjfi2sb3 at gmail.com>> wrote: > > Hi, > > I am attempting to annotate a plant genome. I have a couple of questions: > > 1) RNA-seq assembly > a) I assembled my RNA-seq data using Trinity and StringTie. The two > produce drastically different numbers. When I compare the two assemblies > for each sample using TransRate, StringTie produces a higher score. for > most of the assemblies. I see in all of the threads that you recommend > Trinity but doesn't trinity produce way too many transcripts (even after > chucking out the "bad" ones using transrate). > b) During hint creation in MAKER, does it take into account that different > transcripts have different read coverage (expression levels). I guess my > question is should I filter transcripts that have a small read coverage. > > 2) Repeat Masking > I am following the advanced repeat library construction tutorial ( > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). > The initial steps find 15 sequences for the LTR and 159 for MITE. But, when > I get to the perl DIR_CRL/CRL_Step4.pl step, both output files > (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty. > > a) are these numbers normal because I was expecting a lot more than 16 for > the LTR? > b) I don't get any errors when I run CRL_Step4.pl yet no output. What's > going on?! > > Many thanks, > /SB > -- > > ____________________________ > Sent from Inbox Mobile > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qwzhang0601 at gmail.com Wed May 10 09:48:01 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 10 May 2017 11:48:01 -0400 Subject: [maker-devel] want coding sequences Message-ID: Hello: Thanks for development and maintenance of the tool "Maker2". We have used Maker2 to do genome annotation of a new rodent species. Now we are doing downstream analysis, which requires inputs of coding sequences from different species. I found the outputs I got from Maker2 only include protein sequences and transcripts. Is there an easy way that I can get the coding sequences for our annotated genome? Many thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From munholl at uwindsor.ca Wed May 10 14:24:49 2017 From: munholl at uwindsor.ca (Seth Munholland) Date: Wed, 10 May 2017 16:24:49 -0400 Subject: [maker-devel] MAKER only running 1 task Message-ID: Hello, I'm running a MAKER annotation on an ubuntu cluster and my top screen shows the following: Tasks: 831 total, 3 running, 826 sleeping, 0 stopped, 1 zombie and my maker run (on a screen) shows: ... total clusters:4 now processing 0 ...processing 0 of 3 ...processing 1 of 3 ...processing 2 of 3 total clusters:4 now processing 0 flattening protein clusters prepare section files merging blast reports... flattening protein clusters prepare section files merging blast reports... flattening protein clusters prepare section files Of the 826, the vast majority of them are maker and only one of the running tasks is maker. Is this normal behaviour or has my maker run stopped processing? Seth Munholland, B.Sc. Department of Biological Sciences Rm. 304 Biology Building University of Windsor 401 Sunset Ave. N9B 3P4 T: (519) 253-3000 Ext: 4755 -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Thu May 11 09:58:59 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 09:58:59 -0600
Subject: [maker-devel] want coding sequences
In-Reply-To:
References:
Message-ID: <82DF4E49-8E78-45F6-8A78-01A45F908987@gmail.com>

Use the fasta_tool utility with --trim_maker_utr to get just the CDS part of each transcript.

--Carson

> On May 10, 2017, at 9:48 AM, Quanwei Zhang wrote:
>
> Hello:
>
> Thanks for development and maintenance of the tool "Maker2". We have used Maker2 to do genome annotation of a new rodent species. Now we are doing downstream analysis, which requires inputs of coding sequences from different species.
>
> I found the outputs I got from Maker2 only include protein sequences and transcripts. Is there an easy way that I can get the coding sequences for our annotated genome?
>
> Many thanks
>
> Best
> Quanwei

From carsonhh at gmail.com Thu May 11 10:03:04 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 10:03:04 -0600
Subject: [maker-devel] MAKER only running 1 task
In-Reply-To:
References:
Message-ID: <2119AF6E-3571-4D28-9D81-B24A513839E5@gmail.com>

It may be frozen, or if it is on the last contig it may be running a non-parallelizable step (the very last cluster-merging step for each contig is not parallelizable). So on a large contig the very last step can take a little while, and if there are no other contigs, then there is no work to give to other processes to keep them busy in the meantime. So everyone has to wait so they can all exit together once the last step is done. But as I said, this will only happen if you are on the last contig and it is large. Otherwise it is probably frozen somehow (look for any errors further up the log).
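
One way to check whether a run is simply waiting on its last contig is to look at the datastore index log. The sketch below is only an illustration: it assumes that the master_datastore_index.log written by MAKER is tab-separated, with the contig name in the first column and a status such as STARTED or FINISHED in the third. The function names are hypothetical, and the column layout should be confirmed against your own log before relying on it.

```python
import sys


def summarize_datastore_index(lines):
    """Return the most recent status seen for each contig.

    Assumes tab-separated lines of the form: contig, output path, status.
    Lines with fewer than three fields are skipped.
    """
    last_status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        contig, status = fields[0], fields[2]
        last_status[contig] = status  # later entries overwrite earlier ones
    return last_status


def unfinished(last_status):
    """List contigs whose latest status is anything other than FINISHED."""
    return sorted(c for c, s in last_status.items() if s != "FINISHED")


# Hypothetical command-line use: python check_index.py my_master_datastore_index.log
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        statuses = summarize_datastore_index(handle)
    pending = unfinished(statuses)
    print(f"{len(statuses)} contigs seen, {len(pending)} not finished")
    for contig in pending:
        print(f"{contig}\t{statuses[contig]}")
```

If only one large contig is still unfinished, the single busy process is consistent with the last-contig case described above; if many contigs are unfinished and nothing is progressing, the run is more likely stuck.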
--Carson

> On May 10, 2017, at 2:24 PM, Seth Munholland wrote:
>
> Hello,
>
> I'm running a MAKER annotation on an ubuntu cluster and my top screen shows the following:
>
> Tasks: 831 total, 3 running, 826 sleeping, 0 stopped, 1 zombie
>
> and my maker run (on a screen) shows:
>
> ...
> total clusters:4 now processing 0
> ...processing 0 of 3
> ...processing 1 of 3
> ...processing 2 of 3
> total clusters:4 now processing 0
> flattening protein clusters
> prepare section files
> merging blast reports...
> flattening protein clusters
> prepare section files
> merging blast reports...
> flattening protein clusters
> prepare section files
>
> Of the 826, the vast majority of them are maker and only one of the running tasks is maker. Is this normal behaviour or has my maker run stopped processing?
>
> Seth Munholland, B.Sc.
> Department of Biological Sciences
> Rm. 304 Biology Building
> University of Windsor
> 401 Sunset Ave. N9B 3P4
> T: (519) 253-3000 Ext: 4755

From mnaymik at tgen.org Thu May 11 12:29:46 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Thu, 11 May 2017 11:29:46 -0700
Subject: [maker-devel] Maker gene vs snap match in final GFF's
Message-ID:

In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap?

--
*This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited.
If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.*

From carsonhh at gmail.com Thu May 11 12:33:55 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 12:33:55 -0600
Subject: [maker-devel] Maker gene vs snap match in final GFF's
In-Reply-To:
References:
Message-ID: <8A77F384-9BAD-4BF4-BD1E-EDAE4E010612@gmail.com>

MAKER results can be the result of additional hints sent to SNAP, together with post-processing to add UTR and additional exons that have support from transcript evidence. MAKER results will also have support from either protein or EST/mRNA evidence.

A SNAP match is simply the raw ab initio call made by SNAP (no hints, no post-processing, and it may or may not have evidence supporting the structure). These are there just for reference purposes, so you know what SNAP will produce outside of MAKER given the underlying HMM.

--Carson

> On May 11, 2017, at 12:29 PM, Marcus Naymik wrote:
>
> In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap?
From d.ence at ufl.edu Thu May 11 12:35:00 2017 From: d.ence at ufl.edu (Ence,daniel) Date: Thu, 11 May 2017 18:35:00 +0000 Subject: [maker-devel] Maker gene vs snap match in final GFF's In-Reply-To: References: Message-ID: <2560F44E-3D8D-4E81-B7B9-621921408B61@mail.ufl.edu> The two might have identical coordinates in some cases, but they are different kinds of features. The 'match' is a product of an ab initio gene prediction algorithm, while the 'gene' is supported by evidence and has passed through the maker polishing and filtering steps. On May 11, 2017, at 2:29 PM, Marcus Naymik wrote: In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From nathan.ricks at gmail.com Fri May 12 15:05:45 2017 From: nathan.ricks at gmail.com (Nathan Ricks) Date: Fri, 12 May 2017 15:05:45 -0600 Subject: [maker-devel] Using mpich2 with Maker Message-ID: I've been using maker for some time now. However, I would like to speed up the process by using the mpich2 option. When I use the command ./Build install, the following error is produced. Any help would be appreciated. Nathan Ricks [image: Inline image 1] -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 170139 bytes Desc: not available URL: From yuejiaxing at gmail.com Mon May 15 05:28:47 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 15 May 2017 13:28:47 +0200 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker Message-ID: Hello, I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and ran the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapping snoRNA genes were annotated by maker as the example below shows. I was wondering if this is expected. If not, what might have caused this problem, and is there a way to work around it? Thanks in advance! chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 chrIX maker snoRNA 4375 4563 .
+ . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53 chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 Best, Jia-Xing -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Faculté de Médecine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Université
Côte d'Azur (UCA) 28 Avenue de Valombrose 06107 NICE Cedex 2 France Personal website: http://www.iamphioxus.org/ From michael.s.campbell1 at gmail.com Mon May 15 10:28:58 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 15 May 2017 12:28:58 -0400 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker In-Reply-To: References: Message-ID: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> Hi Jia-Xing, That has been my experience in the past as well. For the non-coding RNAs, tRNAscan-SE is very accurate, while snoscan seems to be quite sensitive but not very specific. Did you give it a "snoscan_meth" file? Giving it a snoscan_meth file will help with accuracy. The biggest gains in accuracy are from small RNA-seq data. In the paper where we used snoscan on maize, we didn't keep any snoRNA predictions that didn't have support from small RNA-seq data; in practical terms, we got rid of anything with an AED of 1. I hope this helps, Mike > On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote: > > Hello, > > I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and ran the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapping snoRNA genes were annotated by maker as the example below shows. I was wondering if this is expected. If not, what might have caused this problem, and is there a way to work around it? Thanks in advance! > > chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 > chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 > chrIX maker exon 4328 4416 . + .
ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 > chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 > chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 > chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 > chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 > chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 > chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 > chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 > chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 > chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 > chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53 > chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 > chrIX maker exon 4375 4500 . + . 
ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 > > Best, > Jia-Xing > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Faculté de Médecine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ From yuejiaxing at gmail.com Mon May 15 11:14:26 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 15 May 2017 19:14:26 +0200 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker In-Reply-To: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> Message-ID: Hi Michael, Many thanks for the information! I will specify the "snoscan_meth" file and give it another try then. I mainly want to use maker to annotate protein-coding genes and tRNAs, but it would be nice to have snoRNAs reasonably annotated as well. Thanks again and have a great day! Best, Jia-Xing On Mon, May 15, 2017 at 6:28 PM, Michael Campbell < michael.s.campbell1 at gmail.com> wrote: > Hi Jia-Xing, > > That has been my experience in the past as well. For the non-coding RNAs > tRNAscan-SE is very accurate while snoscan seems to be quite sensitive but > not very specific. Did you give it a "snoscan_meth" file? Giving it > a snoscan_meth file will help with accuracy. The biggest gains in accuracy > are from small RNA-seq data.
In the paper where we used snoscan on maize we > didn't keep any snoRNA predictions that didn't have support from small > RNA-seq data; in practical terms we got rid of anything with an AED of 1. > > I hope this helps, > Mike > > On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote: > > Hello, > > I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and ran > the annotation for a yeast (S. cerevisiae) genome. I think the annotation > went well with regard to tRNAs and protein-coding genes but I am not sure > about snoRNAs. I found multiple overlapping snoRNA genes were annotated by > maker as the example below shows. I was wondering if this is expected. If > not, what might have caused this problem, and is there a way to work around it? > Thanks in advance! > > chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 > chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 > chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 > chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 > chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 > chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 > chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 > chrIX maker snoRNA 4375 4461 . + .
ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 > chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 > chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 > chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 > chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 > chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53 > chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 > chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 > > Best, > Jia-Xing > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Faculté de Médecine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > > > -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Faculté
de Médecine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA) 28 Avenue de Valombrose 06107 NICE Cedex 2 France Personal website: http://www.iamphioxus.org/ From carsonhh at gmail.com Tue May 16 08:51:00 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 16 May 2017 08:51:00 -0600 Subject: [maker-devel] Using mpich2 with Maker In-Reply-To: References: Message-ID: <6C739B3B-D5D6-4263-A73C-4EA1762B1EE1@gmail.com> You probably need to reinstall Parse::RecDescent, Inline, Inline::C, or all of the above via CPAN (Perl's module installer). The ones already installed on your system may have issues. If you do not have the ability to install modules system-wide, you can install them just for your user using local::lib and the bootstrapping instructions here: http://search.cpan.org/~haarg/local-lib-2.000019/lib/local/lib.pm#The_bootstrapping_technique Then reinstall MAKER. --Carson > On May 12, 2017, at 3:05 PM, Nathan Ricks wrote: > > I've been using maker for some time now. However, I would like to speed up the process by using the mpich2 option. When I use the command ./Build install, the following error is produced. Any help would be appreciated. > > Nathan Ricks -------------- next part -------------- An HTML attachment was scrubbed...
URL: From salim.bougouffa at kaust.edu.sa Sun May 21 01:45:50 2017 From: salim.bougouffa at kaust.edu.sa (Salim Bougouffa) Date: Sun, 21 May 2017 10:45:50 +0300 Subject: [maker-devel] augustus exon calling ~ Message-ID: Hi Maker folks, I have several issues with a plant genome annotation that I am currently doing, but perhaps the most recurrent issues are: 1/ CDSs that are missed where significant RNA-seq evidence is there (figure artemis01) 2/ vice versa, where one or two exons are added without RNA-seq evidence/intron hints (figure artemis02) Info about the runs: 1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating 2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?) 3/ evm=1 (seems to perform better than evm=0) 4/ repeatmasking (denovo + repbase) Best, /SB _______________________________________________________ Salim Bougouffa (PhD), Postdoctoral Fellow 4700 KAUST, CBRC, Blg3. Office4326-WS05, Thuwal, Jeddah, KSA, 23955-6900 (966) 012 808 2963 || salim.bougouff at kaust.edu.sa -- ------------------------------ This message and its contents, including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email. -------------- next part -------------- A non-text attachment was scrubbed... Name: artemis01.png Type: image/png Size: 166352 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: artemis02.png Type: image/png Size: 97140 bytes Desc: not available URL: From mjfi2sb3 at gmail.com Sun May 21 01:55:31 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Sun, 21 May 2017 07:55:31 +0000 Subject: [maker-devel] augustus exon calling ~ In-Reply-To: References: Message-ID: Hi, I should have mentioned a third scenario, where an exon is not called fully by maker despite augustus getting it right (figure artemis03) [image: artemis03.png] On Sun, 21 May 2017 at 10:48 Salim Bougouffa wrote: > Hi Maker folks, > > I have several issues with a plant genome annotation that I am currently > doing but perhaps the most recurrent issues are: > > 1/ CDSs that are missed where significant RNA-seq evidence is there > (figure artemis01) > 2/ vice versa where one or two exons are added without RNA-seq > evidence/intron hints (figure artemis02) > > info about the runs: > 1/ using augustus with a pre-existing model for a related plant that has > high homology to the one I am annotating > 2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do) > 3/ evm=1 (seems to perform better than evm=0) > 4/ repeatmasking (denovo + repbase) > > Best, > /SB -------------- next part -------------- A non-text attachment was scrubbed... Name: artemis03.png Type: image/png Size: 145860 bytes Desc: not available URL: From admin at genome.arizona.edu Tue May 23 13:52:17 2017 From: admin at genome.arizona.edu (System Admin) Date: Tue, 23 May 2017 12:52:17 -0700 Subject: [maker-devel] Hyperthreading Message-ID: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node. With hyperthreading on, we have up to 256 total emulated cores available. Which is the optimal scenario? 1. Use '-n 256' 2.
Use '-n 128' with hyperthreading still on 3. Use '-n 128' with hyperthreading turned off Thanks From carsonhh at gmail.com Tue May 23 14:19:29 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 14:19:29 -0600 Subject: [maker-devel] Hyperthreading In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: MAKER is more of a pipeline. It will launch external tools on as many CPUs as you give it with the mpiexec command. I've found that many of the tools used get a boost with hyperthreading even though optimizations are not explicitly built into their code. The short answer is you would have to try it both ways. I doubt there will be much more than a 10-15% difference in runtime. You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. --Carson > On May 23, 2017, at 1:52 PM, System Admin wrote: > > We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node. > With hyperthreading on, we have up to 256 total emulated cores available. > > Which is the optimal scenario? > 1. Use '-n 256' > 2. Use '-n 128' with hyperthreading still on > 3.
Use '-n 128' with hyperthreading turned off > > Thanks From admin at genome.arizona.edu Tue May 23 14:31:48 2017 From: admin at genome.arizona.edu (admin at genome.arizona.edu) Date: Tue, 23 May 2017 13:31:48 -0700 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: Carson Holt wrote on 05/23/2017 01:19 PM: > You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. Yes, with '-n 192' we found the load on the cluster initially goes up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). From carsonhh at gmail.com Tue May 23 14:38:42 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 14:38:42 -0600 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com> Also make sure cpus in the control file is set to 1 when using MPI. Otherwise it will tell each program it calls to try and use more CPUs per call. --Carson > On May 23, 2017, at 2:31 PM, admin at genome.arizona.edu wrote: > > Carson Holt wrote on 05/23/2017 01:19 PM: >> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling).
Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. > > Yes, with '-n 192' we found the load on the cluster initially goes up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). From mmokrejs at gmail.com Tue May 23 14:45:21 2017 From: mmokrejs at gmail.com (Martin MOKREJŠ) Date: Tue, 23 May 2017 22:45:21 +0200 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <30ae31f7-264e-44d0-30bc-a010de3e54a7@gmail.com> admin at genome.arizona.edu wrote: > Carson Holt wrote on 05/23/2017 01:19 PM: >> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. > > Yes, with '-n 192' we found the load on the cluster initially goes up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). Hi, the high load could be caused by disk IO or other reasons. The only way to be sure is to run top, htop or similar and check that the processes are in the *running* state ("R" is displayed in the status column). You may see "S" (sleep) when a task is waiting for data input or output, and "D" (disk) could be shown when it is waiting for disk IO (unlike network IO).
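A minimal sketch of the state check described above, assuming Linux `ps -eo stat,comm` style output (a header row, then one state/command pair per line). The function name `count_states` and the sample text are made up for illustration; they are not part of MAKER or of the original post.

```python
from collections import Counter


def count_states(ps_output, name="maker"):
    """Tally the first letter of the STAT column for matching processes.

    Expects text shaped like `ps -eo stat,comm` output: a header line,
    then one "STAT COMMAND" pair per line. Only processes whose command
    contains `name` are counted.
    """
    counts = Counter()
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 1)
        if len(fields) == 2 and name in fields[1]:
            # Keep only the main state letter, e.g. "Ss+" -> "S", "D<" -> "D".
            counts[fields[0][0]] += 1
    return counts


# Hypothetical sample, not taken from a real cluster.
sample = """STAT COMMAND
R    maker
S    maker
D    maker
Ss   bash
S    maker
"""

print(count_states(sample))
```

On a live node the same function could be fed real output, for example from `subprocess.run(["ps", "-eo", "stat,comm"], capture_output=True, text=True).stdout`. Many "D" entries would point at disk IO rather than CPU as the bottleneck.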
Martin From mmokrejs at gmail.com Tue May 23 14:51:18 2017 From: mmokrejs at gmail.com (Martin MOKREJŠ) Date: Tue, 23 May 2017 22:51:18 +0200 Subject: [maker-devel] Hyperthreading In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <54fbcfd6-ba28-e570-cfab-a6d83620f747@gmail.com> System Admin wrote: > We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node. > With hyperthreading on, we have up to 256 total emulated cores available. > > Which is the optimal scenario? > 1. Use '-n 256' > 2. Use '-n 128' with hyperthreading still on > 3. Use '-n 128' with hyperthreading turned off Go for 3, but make sure to disable *hyperthreading* in the kernel of the machines as well. I also disable the multicore scheduler (which, again, should help if there are more long-running processes than physical cores available and if some of them should share a cache). We do not have such jobs; hmmer and blast mostly access data from memory, so the CPU cache is not very relevant for these. Hyperthreading only helps if jobs are lousy, waiting for some input/output etc., and in that case *it helps* if another process can be executed on the CPU core (hopefully not having the same bottleneck). It generally helps only in bad situations. You are after a good setup, so disable hyperthreading in the kernel, load only as many jobs as there are physical CPU cores, and monitor performance. If jobs are starving, resolve the issue.
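To pick an `-n` that matches the number of physical cores, as suggested above, one rough approach is to count unique (physical id, core id) pairs in /proc/cpuinfo. This is only a sketch: it assumes the x86 Linux layout of that file, where "physical id" precedes "core id" in each block, and `core_counts` and the sample text are hypothetical.

```python
from typing import Tuple


def core_counts(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    phys_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1  # one "processor" entry per logical CPU
        elif key == "physical id":
            phys_id = value  # socket number for the current block
        elif key == "core id":
            # Hyperthread siblings share the same (socket, core) pair.
            physical.add((phys_id, value))
    return logical, len(physical)


# Hypothetical sample: one socket, two cores, two threads per core.
sample = """processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

logical, physical = core_counts(sample)
print("logical:", logical, "physical:", physical)
```

With real data, `core_counts(open("/proc/cpuinfo").read())` on each node would give the per-node numbers; summing the physical counts across nodes gives a candidate value for `mpiexec -n`.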
Martin From admin at genome.arizona.edu Tue May 23 14:57:56 2017 From: admin at genome.arizona.edu (admin at genome.arizona.edu) Date: Tue, 23 May 2017 13:57:56 -0700 Subject: [maker-devel] Hyperthreading In-Reply-To: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com> References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com> Message-ID: Carson Holt wrote on 05/23/2017 01:38 PM: > Also make sure cpus in the control file are set to 1 when using MPI. > Otherwise it will tell each program it calls to try and use more > CPUs per call. Yes, we are using cpus=1 in the control file. Thanks From carsonhh at gmail.com Tue May 23 15:03:17 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 15:03:17 -0600 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <054B76C7-FAC0-4B9B-A6D1-29A8202E35B3@gmail.com> One last thing to check if using CentOS or RedHat. I've seen it happen on a handful of clusters where transparent hugepages can create odd load issues and very high sys CPU usage under top (not just with maker but with BWA, GATK, and other programs that can have larger memory footprints). If using CentOS or RedHat, you may want to disable defrag for hugepages. You do this on CentOS 6 to disable it (the process is similar on CentOS 7 and RedHat but you may have to google it):
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
--Carson > On May 23, 2017, at 2:31 PM, admin at genome.arizona.edu wrote: > > Carson Holt wrote on 05/23/2017 01:19 PM: >> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead.
Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. > > Yes, with '-n 192' we found the load on the cluster initially goes up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). From carsonhh at gmail.com Tue May 23 15:34:15 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 15:34:15 -0600 Subject: [maker-devel] augustus exon calling ~ In-Reply-To: References: Message-ID: EVM works extremely well when evidence closely matches the predictions and there are no assembly anomalies affecting the ORF. Otherwise, EVM performs very poorly. Also, I would not set unmask=1. It adds noise to the calls. Note that in all cases given, the gene models are from Augustus (MAKER doesn't make predictions). MAKER just provides hints that Augustus can use for the second call set. Hints boost the score a model gets whenever a feature matches the hint. What you see as Augustus match/match_part features are just references for what Augustus calls without hints. So if I tell Augustus there is probably an exon/intron at location X, then any model that includes that exon/intron will have its score bumped up, causing Augustus to keep models that match the hints and report those over models that don't match. However, if there is an issue with the evidence (e.g. a falsely merged mRNA-seq assembly), or an issue with the assembly (a base change generates an early stop codon or causes a frameshift), then Augustus may choose to truncate or skip an exon in order to capture the bonus from downstream hints. So it is unlikely that there is a workable model that captures the exact intron/exon structure because it breaks the ORF at some point.
So Augustus instead produces the best model it can to capture as many hint bonuses as it can.

That being said, look for any odd hint sources, like very poor protein or transcript evidence alignments. Eliminating bad hints will improve performance (for example, if you are using mRNA-seq assemblies, Trinity has a jaccard_clip option which helps avoid false merging of transcript evidence). Or if an organism you used for protein evidence constantly produces bad protein alignments, then you may want to drop it completely from the evidence.

Finally, training Augustus on the genome being annotated will help improve performance (note that just because a species is closely related in evolutionary space does not mean that its HMMs will perform well; it's a common fallacy about ab initio prediction discussed in the SNAP paper). Also try adding another gene predictor like SNAP to see if it hurts or helps.

--Carson

> On May 21, 2017, at 1:48 AM, Salim Bougouffa wrote:
>
> Hi Maker folks,
>
> I have several issues with a plant genome annotation that I am currently doing, but perhaps the most recurrent issues are:
>
> 1/ CDSs that are missed where significant rna-seq evidence is there (figure artemis01)
> 2/ vice versa, where one or two exons are added without rna-seq evidence/intron hints (figure artemis02)
>
> info about the runs:
> 1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating
> 2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do)
> 3/ evm = 1 (seems to perform better than evm=0)
> 4/ repeatmasking (denovo + repbase)
>
> Best,
> /SB
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
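
The hint mechanism Carson describes in the message above can be illustrated with a toy scoring sketch. This is not Augustus's actual algorithm or API: the Model class, the fixed HINT_BONUS value, and the rule that a model with a broken ORF is never reported are all assumptions made up for illustration. It only shows why a model that skips an exon can win once hint bonuses are counted.

```python
from dataclasses import dataclass

HINT_BONUS = 2.0  # assumed value, purely illustrative


@dataclass(frozen=True)
class Model:
    name: str
    exons: tuple        # (start, end) pairs
    base_score: float   # hypothetical ab initio score without hints
    orf_intact: bool    # False if the structure hits a stop codon or frameshift


def score(model, hints):
    """Base score plus a fixed bonus for every exon that matches a hint."""
    if not model.orf_intact:
        return float("-inf")  # toy rule: a broken ORF is never reported
    matches = sum(exon in hints for exon in model.exons)
    return model.base_score + HINT_BONUS * matches


def best_model(models, hints):
    """Pick the highest-scoring candidate model."""
    return max(models, key=lambda m: score(m, hints))


# Evidence suggests three exons, but the assembly has a frameshift in exon 2.
hints = {(100, 200), (300, 400), (500, 600)}
models = [
    Model("matches all evidence", ((100, 200), (300, 400), (500, 600)), 5.0, False),
    Model("skips exon 2", ((100, 200), (500, 600)), 4.0, True),
    Model("stops after exon 1", ((100, 200),), 4.5, True),
]
print(best_model(models, hints).name)   # the exon-skipping model collects two bonuses
print(best_model(models, set()).name)   # without hints the truncated model scores best
```

In this sketch the exact evidence-matching structure is unavailable because of the broken ORF, so the candidate that collects the most hint bonuses is reported instead.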
URL:

From nathan.ricks at gmail.com Tue May 23 15:23:49 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Tue, 23 May 2017 15:23:49 -0600
Subject: [maker-devel] maker_functional_gff
Message-ID:

I've been working with maker and trying to use maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up and just continues for a long time.

Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367.
^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368.

Nathan Ricks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.ence at ufl.edu Tue May 23 15:39:32 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Tue, 23 May 2017 21:39:32 +0000
Subject: [maker-devel] maker_functional_gff
In-Reply-To:
References:
Message-ID: <9388406A-302C-4B19-9F35-D56C06CC9582@mail.ufl.edu>

Hi Nathan, can you send the command line that you're using that is giving the error?

Thanks,
Daniel Ence

> On May 23, 2017, at 5:23 PM, Nathan Ricks wrote:
>
> I've been working with maker and trying to use the maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up, and just continues for a long time.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367.
> ^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368.
>
>
> Nathan Ricks
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue May 23 15:44:05 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 15:44:05 -0600
Subject: [maker-devel] maker_functional_gff
In-Reply-To:
References:
Message-ID: <25B003D7-6A19-4486-B21F-71070F00A580@gmail.com>

The blast report you gave it is in the wrong format, it is partial/truncated, or you provided the files in the wrong order. Basically, it received an empty line from the file at some point.

The blast report must be in tabular format, which is "wu-blast -mformat 2" or "ncbi-blast -outfmt 6".

Also, the script only supports blast results against UniProt/Swiss-Prot.

--Carson

> On May 23, 2017, at 3:23 PM, Nathan Ricks wrote:
>
> I've been working with maker and trying to use the maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up, and just continues for a long time.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367.
> ^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368.
>
>
> Nathan Ricks
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From yuejiaxing at gmail.com Fri May 26 03:28:48 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Fri, 26 May 2017 11:28:48 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To:
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID:

Hi Michael,

This is a follow-up on the snoscan issue. It seems the snoscan_meth option has been removed from the current maker_opts.ctl template file (v2.31.9). This option used to be there according to this post (https://www.biostars.org/p/217240/). I manually specified this option in my maker_opts.ctl file, but I don't think maker recognized it:

    STATUS: Parsing control files...
    WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl
    ...

Do you know if there is a way to work around this problem? Thanks!

Best,
Jia-Xing

On Mon, May 15, 2017 at 7:14 PM, Jia-Xing Yue wrote:

> Hi Michael,
>
> Many thanks for the information! I will specify the "snoscan_meth" file
> and give it another try then. I mainly want to use maker to annotate
> protein-coding genes and tRNAs. But it would be nice to have snoRNA
> reasonably annotated as well.
> Thanks again and have a great day!
>
> Best,
> Jia-Xing
>
>
> On Mon, May 15, 2017 at 6:28 PM, Michael Campbell <
> michael.s.campbell1 at gmail.com> wrote:
>
>> Hi Jia-Xing,
>>
>> That has been my experience in the past as well. For the non-coding RNAs
>> tRNA-scan is very accurate while snoscan seems to be quite sensitive but
>> very specific. Did you give it a "snoscan_meth" file? Giving it
>> a snoscan_meth file will help with accuracy. The biggest gains in accuracy
>> are from small RNA-seq data.
In the paper where we used snoscan on maize we >> didn?t keep any snoRNA predictions that didn?t have support from small >> RNA-seq data, in practical terms we got rid of anything with a AED of 1. >> >> I hope this helps, >> Mike >> >> On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote: >> >> Hello, >> >> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and >> run the annotation for a yeast (S. cerevisiae) genome. I think the >> annotation went well with regard to tRNAs and protein-coding genes but I am >> not sure about snoRNAs. I found multiple overlapped snoRNA genes were >> annotated by maker as the example below shows. I was wondering if this is >> expected. If not, what might have caused this problem and is there a way to >> work around. Thanks in advance! >> >> chrIX maker gene 4328 4416 . + . >> ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-nonc >> oding-gene-0.49 >> chrIX maker snoRNA 4328 4416 . + . >> ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding- >> gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 >> chrIX maker exon 4328 4416 . + . >> ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Par >> ent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 >> chrIX maker gene 4375 4563 . + . >> ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-nonc >> oding-gene-0.50 >> chrIX maker snoRNA 4375 4563 . + . >> ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding- >> gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 >> chrIX maker exon 4375 4563 . + . >> ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Par >> ent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 >> chrIX maker gene 4375 4461 . + . >> ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-nonc >> oding-gene-0.51 >> chrIX maker snoRNA 4375 4461 . + . 
>> ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding- >> gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 >> chrIX maker exon 4375 4461 . + . >> ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Par >> ent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 >> chrIX maker gene 4375 4491 . + . >> ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-nonc >> oding-gene-0.52 >> chrIX maker snoRNA 4375 4491 . + . >> ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding- >> gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 >> chrIX maker exon 4375 4491 . + . >> ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Par >> ent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 >> chrIX maker gene 4375 4500 . + . >> ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-nonc >> oding-gene-0.53 >> chrIX maker snoRNA 4375 4500 . + . >> ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding- >> gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 >> chrIX maker exon 4375 4500 . + . >> ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Par >> ent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 >> >> Best, >> Jia-Xing >> >> >> -- >> Jia-Xing Yue >> >> Population Genomics and Complex Traits Group >> Tour Pasteur 8eme etage >> Facult? de M?decine >> Institute for Research on Cancer and Aging, Nice (IRCAN) >> CNRS UMR 7284 - INSERM U 1081 - Universit? 
Côte d'Azur (UCA)
>> 28 Avenue de Valombrose
>> 06107 NICE Cedex 2
>> France
>>
>> Personal website: http://www.iamphioxus.org/
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>
>
>
> --
> Jia-Xing Yue
>
> Population Genomics and Complex Traits Group
> Tour Pasteur 8eme etage
> Faculté de Médecine
> Institute for Research on Cancer and Aging, Nice (IRCAN)
> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
> 28 Avenue de Valombrose
> 06107 NICE Cedex 2
> France
>
> Personal website: http://www.iamphioxus.org/
>
>

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Fri May 26 07:54:44 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 26 May 2017 09:54:44 -0400
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To:
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>

Hi Jia-Xing,

v2.31.9 may not have had that option. I know that it is in the v3.00.0 version, so your best option may be to update.

Thanks,
Mike

> On May 26, 2017, at 5:28 AM, Jia-Xing Yue wrote:
>
> Hi Michael,
>
> This is a follow-up for the snoscan issue. I found the snoscan_meth option seems have been removed in the current maker_opts.ctl template file (v2.31.9). This option used to be there according to this post (https://www.biostars.org/p/217240/ ).
I manually specified this option in my maker_opts.ctl file but I don't think maker has correctly recognized this option: > > > STATUS: Parsing control files... > WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl > ... > > Do you know is there a way to work around this problem? Thanks! > > Best, > Jia-Xing > > > > On Mon, May 15, 2017 at 7:14 PM, Jia-Xing Yue > wrote: > Hi Michael, > > Many thanks for the information! I will specify the "snoscan_meth" file and give it another try then. I majorly want to use maker to annotate protein-coding genes and tRNAs. But it would be nice to have snoRNA reasonably annotated as well. > Thanks gain and have a great day! > > Best, > Jia-Xing > > > On Mon, May 15, 2017 at 6:28 PM, Michael Campbell > wrote: > Hi Jia-Xing, > > That has been my experience in the past as well. For the non-coding RNAs tRNA-scan is very accurate while snoscan seems to be quite sensitive but very specific. Did you give it a ?snoscan_meth? file? Giving it a snoscan_meth file will help with accuracy. The biggest gains in accuracy are from small RNA-seq data. In the paper where we used snoscan on maize we didn?t keep any snoRNA predictions that didn?t have support from small RNA-seq data, in practical terms we got rid of anything with a AED of 1. > > I hope this helps, > Mike >> On May 15, 2017, at 7:28 AM, Jia-Xing Yue > wrote: >> >> Hello, >> >> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and run the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapped snoRNA genes were annotated by maker as the example below shows. I was wondering if this is expected. If not, what might have caused this problem and is there a way to work around. Thanks in advance! >> >> chrIX maker gene 4328 4416 . + . 
ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 >> chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 >> chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 >> chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 >> chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 >> chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 >> chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 >> chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 >> chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 >> chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 >> chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 >> chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 >> chrIX maker gene 4375 4500 . + . 
ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53 >> chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 >> chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 >> >> Best, >> Jia-Xing >> >> >> -- >> Jia-Xing Yue >> >> Population Genomics and Complex Traits Group >> Tour Pasteur 8eme etage >> Facult? de M?decine >> Institute for Research on Cancer and Aging, Nice (IRCAN) >> CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) >> 28 Avenue de Valombrose >> 06107 NICE Cedex 2 >> France >> >> Personal website: http://www.iamphioxus.org/ >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Facult? de M?decine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > > > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Facult? de M?decine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > -------------- next part -------------- An HTML attachment was scrubbed... 
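
A minimal sketch of the filtering Mike describes in this thread (dropping snoRNA predictions with an AED of 1), assuming the MAKER GFF3 layout shown in the quoted lines: nine tab-separated columns, with ID, Parent and _AED in the attribute column, and a single Parent per feature. The function names are hypothetical helpers written for this sketch, not part of MAKER.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (key=value pairs joined by ';')."""
    return dict(kv.split("=", 1) for kv in column9.split(";") if "=" in kv)


def drop_unsupported_snornas(gff_lines, max_aed=1.0):
    """Return GFF3 lines without snoRNAs whose _AED >= max_aed.

    A gene is removed too when all of its snoRNA children are removed;
    exons go with their parent snoRNA. Other features are left untouched.
    """
    lines = [ln.rstrip("\n") for ln in gff_lines]
    bad_rnas = set()   # snoRNA IDs with no evidence support
    rna_parent = {}    # snoRNA ID -> parent gene ID
    for ln in lines:
        cols = ln.split("\t")
        if ln.startswith("#") or len(cols) != 9 or cols[2] != "snoRNA":
            continue
        attrs = parse_attributes(cols[8])
        rna_parent[attrs["ID"]] = attrs.get("Parent")
        if float(attrs.get("_AED", 0)) >= max_aed:
            bad_rnas.add(attrs["ID"])

    # A gene is dropped only if none of its snoRNA children survive.
    genes_with_good = {g for r, g in rna_parent.items() if r not in bad_rnas}
    bad_genes = {g for r, g in rna_parent.items() if r in bad_rnas} - genes_with_good

    kept = []
    for ln in lines:
        cols = ln.split("\t")
        if ln.startswith("#") or len(cols) != 9:
            kept.append(ln)  # keep comments and anything that is not a feature line
            continue
        attrs = parse_attributes(cols[8])
        ids = {attrs.get("ID"), attrs.get("Parent")}
        if ids & bad_genes or ids & bad_rnas:
            continue
        kept.append(ln)
    return kept


# Two toy genes modelled on the quoted output: one with _AED=1.00, one with support.
sample = [
    "chrIX\tmaker\tgene\t4328\t4416\t.\t+\t.\tID=g1;Name=g1",
    "chrIX\tmaker\tsnoRNA\t4328\t4416\t.\t+\t.\tID=g1-snoRNA-1;Parent=g1;_AED=1.00",
    "chrIX\tmaker\texon\t4328\t4416\t.\t+\t.\tID=g1-snoRNA-1:exon:1;Parent=g1-snoRNA-1",
    "chrIX\tmaker\tgene\t5000\t5100\t.\t+\t.\tID=g2;Name=g2",
    "chrIX\tmaker\tsnoRNA\t5000\t5100\t.\t+\t.\tID=g2-snoRNA-1;Parent=g2;_AED=0.25",
    "chrIX\tmaker\texon\t5000\t5100\t.\t+\t.\tID=g2-snoRNA-1:exon:2;Parent=g2-snoRNA-1",
]
filtered = drop_unsupported_snornas(sample)
```

All five overlapping snoscan genes quoted above carry _AED=1.00, so under these assumptions a filter like this would remove every one of them. Whether that is desirable depends on having small RNA-seq evidence in the run, as Mike notes.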
URL: From yuejiaxing at gmail.com Fri May 26 08:20:03 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Fri, 26 May 2017 16:20:03 +0200 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker In-Reply-To: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com> References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com> Message-ID: I see. Thanks Michael! Best, Jia-Xing On Fri, May 26, 2017 at 3:54 PM, Michael Campbell < michael.s.campbell1 at gmail.com> wrote: > Hi Jia-Xing, > > v2.31.9 may not have had that option. I know that it is in the v3.00.0 > version, so you best option may be to update. > > Thanks, > Mike > > On May 26, 2017, at 5:28 AM, Jia-Xing Yue wrote: > > Hi Michael, > > This is a follow-up for the snoscan issue. I found the snoscan_meth option > seems have been removed in the current maker_opts.ctl template file > (v2.31.9). This option used to be there according to this post ( > https://www.biostars.org/p/217240/). I manually specified this option in > my maker_opts.ctl file but I don't think maker has correctly recognized > this option: > > > STATUS: Parsing control files... > WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl > ... > > Do you know is there a way to work around this problem? Thanks! > > Best, > Jia-Xing > > > > On Mon, May 15, 2017 at 7:14 PM, Jia-Xing Yue > wrote: > >> Hi Michael, >> >> Many thanks for the information! I will specify the "snoscan_meth" file >> and give it another try then. I majorly want to use maker to annotate >> protein-coding genes and tRNAs. But it would be nice to have snoRNA >> reasonably annotated as well. >> Thanks gain and have a great day! >> >> Best, >> Jia-Xing >> >> >> On Mon, May 15, 2017 at 6:28 PM, Michael Campbell < >> michael.s.campbell1 at gmail.com> wrote: >> >>> Hi Jia-Xing, >>> >>> That has been my experience in the past as well. 
For the non-coding RNAs >>> tRNA-scan is very accurate while snoscan seems to be quite sensitive but >>> very specific. Did you give it a ?snoscan_meth? file? Giving it >>> a snoscan_meth file will help with accuracy. The biggest gains in accuracy >>> are from small RNA-seq data. In the paper where we used snoscan on maize we >>> didn?t keep any snoRNA predictions that didn?t have support from small >>> RNA-seq data, in practical terms we got rid of anything with a AED of 1. >>> >>> I hope this helps, >>> Mike >>> >>> On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote: >>> >>> Hello, >>> >>> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and >>> run the annotation for a yeast (S. cerevisiae) genome. I think the >>> annotation went well with regard to tRNAs and protein-coding genes but I am >>> not sure about snoRNAs. I found multiple overlapped snoRNA genes were >>> annotated by maker as the example below shows. I was wondering if this is >>> expected. If not, what might have caused this problem and is there a way to >>> work around. Thanks in advance! >>> >>> chrIX maker gene 4328 4416 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-nonc >>> oding-gene-0.49 >>> chrIX maker snoRNA 4328 4416 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan >>> -chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene >>> -0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 >>> chrIX maker exon 4328 4416 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Par >>> ent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 >>> chrIX maker gene 4375 4563 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-nonc >>> oding-gene-0.50 >>> chrIX maker snoRNA 4375 4563 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan >>> -chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene >>> -0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 >>> chrIX maker exon 4375 4563 . + . 
>>> ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Par >>> ent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 >>> chrIX maker gene 4375 4461 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-nonc >>> oding-gene-0.51 >>> chrIX maker snoRNA 4375 4461 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan >>> -chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene >>> -0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 >>> chrIX maker exon 4375 4461 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Par >>> ent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 >>> chrIX maker gene 4375 4491 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-nonc >>> oding-gene-0.52 >>> chrIX maker snoRNA 4375 4491 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan >>> -chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene >>> -0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 >>> chrIX maker exon 4375 4491 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Par >>> ent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 >>> chrIX maker gene 4375 4500 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-nonc >>> oding-gene-0.53 >>> chrIX maker snoRNA 4375 4500 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan >>> -chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene >>> -0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 >>> chrIX maker exon 4375 4500 . + . >>> ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Par >>> ent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 >>> >>> Best, >>> Jia-Xing >>> >>> >>> -- >>> Jia-Xing Yue >>> >>> Population Genomics and Complex Traits Group >>> Tour Pasteur 8eme etage >>> Facult? de M?decine >>> Institute for Research on Cancer and Aging, Nice (IRCAN) >>> CNRS UMR 7284 - INSERM U 1081 - Universit? 
C?te d?Azur (UCA) >>> 28 Avenue de Valombrose >>> 06107 NICE Cedex 2 >>> France >>> >>> Personal website: http://www.iamphioxus.org/ >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> >> -- >> Jia-Xing Yue >> >> Population Genomics and Complex Traits Group >> Tour Pasteur 8eme etage >> Facult? de M?decine >> Institute for Research on Cancer and Aging, Nice (IRCAN) >> CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) >> 28 Avenue de Valombrose >> 06107 NICE Cedex 2 >> France >> >> Personal website: http://www.iamphioxus.org/ >> >> > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Facult? de M?decine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > > > -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Facult? de M?decine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) 28 Avenue de Valombrose 06107 NICE Cedex 2 France Personal website: http://www.iamphioxus.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL:
From dcg at cau.edu.cn Wed May 3 09:29:18 2017 From: dcg at cau.edu.cn (dcg at cau.edu.cn) Date: Wed, 3 May 2017 23:29:18 +0800 Subject: [maker-devel] How to explain the maker results? Message-ID: <2017050323291810262239@cau.edu.cn> Dear sir: I?ve been using maker to do my genome annotation. However, I still have something I can't understand: 1. After assembly, I have many contigs. Firstly, I set est2genome=1 and protein2genome=1 , with my proteins, ESTs and RNA-seq..
Which way below is correct?
   1.1 Each contig has its own gff. I use only that contig's maker gff file to build a pyu.hmm (to be used by SNAP), and then train on that single contig.
   1.2 I merge all the maker gff files to produce one pyu.hmm (for SNAP), and then use this pyu.hmm to train on all the contigs.

2. The aim of my project is to find new proteins, so I need to guarantee the rigor of my annotation. My plan is that a predicted protein should align to UniProt (reviewed proteins, about 30K in total) with 100% identity and coverage. However, if I choose method 1.2 above:
   After the first step (est2genome=1 and protein2genome=1), about 1600 proteins align 100% to UniProt.
   After 2 rounds of training (est2genome=0 and protein2genome=0), fewer proteins align 100%.
   Is my test method reasonable? Why don't the final results contain more well-aligned proteins?
   After training and fasta_merge, the results are index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta, and index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta. Which is the final result?

I'm looking forward to hearing from you.
Thanks!
Yours sincerely!

Chao Chao
dcg at cau.edu.cn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.ence at ufl.edu Wed May 3 09:49:08 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Wed, 3 May 2017 15:49:08 +0000
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com>
Message-ID:

Hi, the error is about a specific file (is.lib) which isn't being found. Can you verify that the file is there after you updated RepBase? Use the command:

    ls -l /home/softwares/RepeatMasker/Libraries/20170127/general/is.lib

Thanks,
Daniel Ence

On May 3, 2017, at 8:15 AM, ???Jim wrote:

Hi, I'm a newbie of maker.
I met some errors in Repeatmasker step. The error is here: NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist! I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error. And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work. I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. So, I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. The error is the same. I'm stuck in the problem now. Would highly appreciate any help - thanks! Jinyuan Lu Shanghai Jiao Tong University No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 3 09:53:40 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 3 May 2017 09:53:40 -0600 Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> Message-ID: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> RepBase and RepeatMasker have changed structure with the new 4.0.7 released two months ago. The new version and RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall). Or you have to use the previous version of RepeatMasker with the previous version of RepBase. ?Carson > On May 3, 2017, at 6:15 AM, ???Jim wrote: > > Hi, I'm a newbie of maker. > I met some errors in Repeatmasker step. 
> > The error is here: > NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist! > > I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error. > And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work. > I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. > So, > I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. > The error is the same. > > I'm stuck in the problem now. > Would highly appreciate any help - thanks! > > Jinyuan Lu > Shanghai Jiao Tong University > No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 3 09:55:41 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 3 May 2017 09:55:41 -0600 Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error In-Reply-To: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> Message-ID: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com> You may want to use the previous version of both as the new version may still have hidden bugs. ?Carson > On May 3, 2017, at 9:53 AM, Carson Holt wrote: > > RepBase and RepeatMasker have changed structure with the new 4.0.7 released two months ago. The new version and RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall). Or you have to use the previous version of RepeatMasker with the previous version of RepBase. 
> > ?Carson > > > >> On May 3, 2017, at 6:15 AM, ???Jim > wrote: >> >> Hi, I'm a newbie of maker. >> I met some errors in Repeatmasker step. >> >> The error is here: >> NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist! >> >> I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error. >> And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work. >> I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. >> So, >> I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. >> The error is the same. >> >> I'm stuck in the problem now. >> Would highly appreciate any help - thanks! >> >> Jinyuan Lu >> Shanghai Jiao Tong University >> No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 3 10:04:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 3 May 2017 10:04:20 -0600 Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error In-Reply-To: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com> References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com> Message-ID: You may have to contact RepBase via e-mail to find out how to get the libraries compatible with RepeatMasker 4.0.6 as it looks like they have removed the previous release from the website. 
The last release for 4.0.6 was ?> repeatmaskerlibraries-20160829.tar.gz ?Carson > On May 3, 2017, at 9:55 AM, Carson Holt wrote: > > You may want to use the previous version of both as the new version may still have hidden bugs. > > ?Carson > >> On May 3, 2017, at 9:53 AM, Carson Holt > wrote: >> >> RepBase and RepeatMasker have changed structure with the new 4.0.7 released two months ago. The new version and RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall). Or you have to use the previous version of RepeatMasker with the previous version of RepBase. >> >> ?Carson >> >> >> >>> On May 3, 2017, at 6:15 AM, ???Jim > wrote: >>> >>> Hi, I'm a newbie of maker. >>> I met some errors in Repeatmasker step. >>> >>> The error is here: >>> NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist! >>> >>> I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error. >>> And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work. >>> I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. >>> So, >>> I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. >>> The error is the same. >>> >>> I'm stuck in the problem now. >>> Would highly appreciate any help - thanks! >>> >>> Jinyuan Lu >>> Shanghai Jiao Tong University >>> No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... 
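
A note on the RepeatMasker thread above: the error message names one specific library file, so a first check is simply whether that file exists, as Daniel suggests with "ls -l". The same check can be sketched in Python for a list of files. The function name missing_files is hypothetical, and the example path is just the one copied from the error message; nothing here is part of MAKER or RepeatMasker.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Path copied from the error message in this thread; adjust for your install.
    expected = [
        "/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib",
    ]
    for path in missing_files(expected):
        print("missing:", path)
```

An empty result only shows that the file is present. It does not show that the RepBase release matches the installed RepeatMasker version, which is the issue Carson describes.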
From carsonhh at gmail.com Wed May 3 10:10:48 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:10:48 -0600
Subject: [maker-devel] How to explain the maker results?
In-Reply-To: <2017050323291810262239@cau.edu.cn>
References: <2017050323291810262239@cau.edu.cn>
Message-ID: <049F8AC8-7E16-4F05-B8B2-01CA7AB88751@gmail.com>

Use the merged gff3 to train SNAP, otherwise you won't have enough models. Info on training can be found on the wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Also, you can find additional detailed info by searching the mailing list archives --> http://groups.google.com/group/maker-devel

I'm not sure what you are asking with the last question. Alignment is not a function of training and will not be affected by the hmm, but 100% coverage and identity is too strict a threshold even for data derived from the same species.

--Carson

From nathan.ricks at gmail.com Wed May 3 10:19:57 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Wed, 3 May 2017 10:19:57 -0600
Subject: [maker-devel] Post Processing of Annotations
Message-ID:

Hi,

I've been running your MAKER pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and output.iprscan files to help assign function. My question is what format these files need to be in. Iprscan can produce files in a variety of formats (tsv, xml, gff3, html and SVG), while blastp can produce tabular, pairwise, xml and a number of others.

Thanks

Nathan Ricks

From carsonhh at gmail.com Wed May 3 10:30:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:30:03 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Use blastp with the tab-delimited format option and the UniProt/Swiss-Prot database. What additional filters you choose to set (i.e. e-value limit) may vary, although I would recommend 1e-6 or lower.

--Carson

From d.ence at ufl.edu Wed May 3 10:34:35 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Wed, 3 May 2017 16:34:35 +0000
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Hi,

The iprscan output should be in tsv format, which is tab-separated, and the usage statement for maker_functional_gff says that the blastp output should be in "wu-blast -mformat 2", which I think is tabbed too.

~Daniel

From carsonhh at gmail.com Wed May 3 13:20:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 13:20:31 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID: <0EE0D7F6-5F28-46E7-9AB8-CED93DC811F6@gmail.com>

The maker_functional_gff and maker_functional_fasta scripts pull specific fields out of the UniProt fasta header, so they are tied to the format used by UniProt/Swiss-Prot. At one time I had modified them to also work with NR, but that was several years ago, so I don't know if it would still work.

--Carson

> On May 3, 2017, at 1:10 PM, Nathan Ricks wrote:
>
> Is it possible to make my own database from sequences that I have downloaded from NCBI instead of using the UniProt/Swiss-Prot?

From mjfi2sb3 at gmail.com Thu May 4 00:37:52 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Thu, 04 May 2017 06:37:52 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
Message-ID:

Hi,

I am attempting to annotate a plant genome. I have a couple of questions:

*1) RNA-seq assembly*
a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score for most of the assemblies. I see in all of the threads that you recommend Trinity, but doesn't Trinity produce way too many transcripts (even after chucking out the "bad" ones using TransRate)?
b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels)? I guess my question is: should I filter out transcripts that have low read coverage?

*2) Repeat Masking*
I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty.

a) Are these numbers normal? I was expecting a lot more than 16 for the LTR.
b) I don't get any errors when I run CRL_Step4.pl, yet there is no output. What's going on?!

Many thanks,
/SB
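
A note on the Post Processing thread above: Carson suggests tab-delimited blastp output with an e-value limit of 1e-6 or lower. A minimal sketch of such a filter follows. It assumes the standard 12-column tabular layout written by NCBI BLAST+ (-outfmt 6), in which column 11 is the e-value. The function name filter_hits and the sample rows are hypothetical, and whether maker_functional_gff accepts this exact layout is not confirmed in this thread.

```python
def filter_hits(lines, max_evalue=1e-6):
    """Keep tabular BLAST rows whose e-value (column 11) is <= max_evalue."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column row
        if float(fields[10]) <= max_evalue:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical rows: query, subject, %identity, length, mismatches,
    # gap opens, qstart, qend, sstart, send, e-value, bit score.
    sample = [
        "gene1-mRNA-1\tsp|P00001|EXAMPLE1\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t400",
        "gene2-mRNA-1\tsp|P00002|EXAMPLE2\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25.0",
    ]
    for row in filter_hits(sample):
        print(row)
```

The limit is a parameter because, as Carson notes, the additional filters may vary from project to project.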
From jim03ljy at 126.com Thu May 4 00:36:12 2017
From: jim03ljy at 126.com (=?GBK?B?wqy98NStSmlt?=)
Date: Thu, 4 May 2017 14:36:12 +0800 (CST)
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To:
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
Message-ID: <12de56ed.5ded.15bd22c56c7.Coremail.jim03ljy@126.com>

Thanks a lot! Problem solved. I matched RepeatMasker 4.0.7 with RepBase20170127 and it worked! Thanks!

----Jinyuan Lu

From dcg at cau.edu.cn Fri May 5 07:43:43 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Fri, 5 May 2017 21:43:43 +0800
Subject: [maker-devel] How to evaluate maker proteins' quality?
Message-ID: <2017050521434331108720@cau.edu.cn>

Dear sir:

Now that my MAKER run has finished, I need to check the quality of my results. The purpose of my annotation is to find some new proteins. There are about 30K reviewed proteins for my species. If I want to see how many predicted proteins support the reviewed proteins, how should I do it? (Is blastp OK? How should I set the threshold?)

I used UniProt, ESTs and RNA-seq for my annotation. From my perspective, if a protein is reviewed and was used to train SNAP/Augustus, we should get the same one back after several training rounds. So I planned to align the MAKER proteins to the UniProt proteins (which I used for annotation). If a predicted protein matches UniProt with 100% identity and coverage, it can be considered to support the reviewed protein. Is that correct?

If not, maybe I can evaluate my proteins only by AED value and protein domains?

I'm looking forward to your help. Thanks a lot!

Yours sincerely!

Chao Chao
dcg at cau.edu.cn
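
A note on the evaluation question above: counting predicted proteins that "match" a reviewed protein needs an explicit identity and coverage cutoff. A minimal sketch follows. It assumes 12-column tabular blastp output (NCBI BLAST+ -outfmt 6) and a separately supplied table of query protein lengths, since the default tabular layout does not include them. Coverage is taken here as aligned query span divided by query length, which is only one of several possible definitions. The function name supported_queries is hypothetical. Both thresholds are required arguments, since an earlier reply on this list already notes that 100% is too strict.

```python
def supported_queries(rows, query_lengths, min_identity, min_coverage):
    """Return query IDs with at least one hit meeting both thresholds.

    rows: iterable of 12-column tabular BLAST lines.
    query_lengths: dict mapping query ID -> protein length in residues.
    min_identity, min_coverage: percentages between 0 and 100.
    """
    supported = set()
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12 or fields[0] not in query_lengths:
            continue  # malformed row or unknown query
        identity = float(fields[2])               # column 3: percent identity
        qstart, qend = int(fields[6]), int(fields[7])  # columns 7 and 8
        span = abs(qend - qstart) + 1
        coverage = 100.0 * span / query_lengths[fields[0]]
        if identity >= min_identity and coverage >= min_coverage:
            supported.add(fields[0])
    return supported
```

Running it at several cutoffs, rather than a single strict one, gives a clearer picture of how the predictions compare with the reviewed set.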
From carsonhh at gmail.com Sun May 7 18:31:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 18:31:31 -0600
Subject: [maker-devel] How to evaluate maker proteins' quality?
In-Reply-To: <2017050521434331108720@cau.edu.cn>
References: <2017050521434331108720@cau.edu.cn>
Message-ID: <51620CC3-43D9-47D5-B8B3-871F291D6518@gmail.com>

Because of small differences between assemblies, individual variants, annotated proteins used as reference being partial, and potential assembly errors, a 100% identity expectation is too high. About 90+% would be more reasonable for a same-species comparison.

AED gives a good correlation with protein confidence. A perfect zero score will not happen often, though, since the way alignment algorithms work leaves alignment errors around splice sites and short exons. Also, the evidence used is never perfect, so with AED lower values are better than higher values, but it cannot be used as an overly specific measurement (it is only correlative and not exact).

--Carson

From carsonhh at gmail.com Sun May 7 19:17:37 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 19:17:37 -0600
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To:
References:
Message-ID: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>

Michael, can you answer the second question? (Michael wrote the protocol, so I CC'd him.)

With respect to the first question: expression level is not necessarily relevant to the annotation process (so no, MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice-aware alignment (yes, it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (this mostly occurs in fungi, where UTRs can overlap). Merging of transcripts will cause extra introns to be assigned as hints, as well as potential overextension of UTRs during the final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If StringTie has a similar option, then it can be used as well.

Thanks,
Carson

From mcampbel at cshl.edu Sun May 7 19:24:27 2017
From: mcampbel at cshl.edu (Campbell, Michael)
Date: Mon, 8 May 2017 01:24:27 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>
Message-ID: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>

Hi SB,

I've added Ning Jiang to this email. She has put great effort into updating this protocol recently and will be able to address your questions better than I can.

Ning, would you mind helping out with this?

Thanks,
Mike

From jiangn at msu.edu Mon May 8 09:50:45 2017
From: jiangn at msu.edu (Jiang, Ning)
Date: Mon, 8 May 2017 15:50:45 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>, <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
Message-ID:

Hi Salim,

I am sorry to learn about the issues. How many intact LTR elements you get depends on the quality of your genome assembly; however, 16 seems too low to me.

The inner and LTR sequence files should NOT be empty. Sometimes the issue is that the initial sequence names are long and complicated. If that's the case for your sequences, you might want to simplify your sequence names (including only letters and numbers) and try again.

We are working on an automatic pipeline for LTR collection; if everything goes smoothly, it should be available in two to three months.

Best wishes,
Ning

From mjfi2sb3 at gmail.com Mon May 8 10:41:51 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Mon, 08 May 2017 16:41:51 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To:
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com> <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
Message-ID:

Thank you all for your responses.

Regards,
/SB

On Mon, 8 May 2017, 18:50 Jiang, Ning, wrote:

> Hi Salim,
>
>
> I am sorry to learn about the issues. it depends on the quality of your
> genome assembly for how many intact LTR elements you would get; however, 16
> seems too low to me.
>
>
> The inner and LTR sequence file should NOT be empty.
Some times the issue > could be due to that the initial sequence name is long and complicated. If > that's the case for your sequences, you might want to simplify your > sequence name (only including letters and numbers) and try again. > > > We are working on an automatic pipeline for LTR collection, if everything > goes smoothly, it should be available in two to three months. > > > Best wishes, > > > Ning > ------------------------------ > *From:* Campbell, Michael > *Sent:* Sunday, May 7, 2017 9:24 PM > *To:* Carson Holt > *Cc:* Salim Bougouffa; maker-devel at yandell-lab.org List; Jiang, Ning > *Subject:* Re: [maker-devel] advanced repeat masking library > constructions & rna-seq assembly choices > > Hi SB, > > I?ve added Ning Jaing to this email. She has put great effort into > updating this protocol recently and will be able to address your questions > better than I can. > > Ning, would you mind helping out with this? > > Thanks, > Mike > > On May 7, 2017, at 9:17 PM, Carson Holt carsonhh at gmail.com>> wrote: > > Michael can you answer the second question (Michael wrote the protocol, so > I CC?d him). > > With respect to the first question. Expression level is not necessarily > relevant to the annotation process (so no MAKER does not look at read > coverage). Instead we use the transcript assemblies to identify introns via > splice aware alignment (yes it is the introns and not the exons we care > about). Trinity has a nice option called jaccard_clip which avoids false > merging of neighboring transcripts (mostly occurs in fungi where UTR can > overlap). Merging of transcripts will cause extra introns to be assigned as > hints as well as potential overextension of UTR during final polishing > steps. The jaccard_clip option is the main reason we recommend Trinity. If > Stringtie has a similar option, then it can be used as well. 
> > Thanks, > Carson > > > > On May 4, 2017, at 12:37 AM, Salim Bougouffa mjfi2sb3 at gmail.com>> wrote: > > Hi, > > I am attempting to annotate a plant genome. I have a couple of questions: > > 1) RNA-seq assembly > a) I assembled my RNA-seq data using Trinity and StringTie. The two > produce drastically different numbers. When I compare the two assemblies > for each sample using TransRate, StringTie produces a higher score. for > most of the assemblies. I see in all of the threads that you recommend > Trinity but doesn't trinity produce way too many transcripts (even after > chucking out the "bad" ones using transrate). > b) During hint creation in MAKER, does it take into account that different > transcripts have different read coverage (expression levels). I guess my > question is should I filter transcripts that have a small read coverage. > > 2) Repeat Masking > I am following the advanced repeat library construction tutorial ( > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). > The initial steps find 15 sequences for the LTR and 159 for MITE. But, when > I get to the perl DIR_CRL/CRL_Step4.pl step, both output files > (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty. > > a) are these numbers normal because I was expecting a lot more than 16 for > the LTR? > b) I don't get any errors when I run CRL_Step4.pl yet no output. What's > going on?! > > Many thanks, > /SB > -- > > ____________________________ > Sent from Inbox Mobile > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... 
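A note on the thread above: Ning's suggestion to simplify the sequence names (letters and numbers only) before rerunning CRL_Step4.pl could be done along the following lines. This is only a minimal Python sketch, not part of the repeat library protocol; the function name, the `seq` prefix and the idea of keeping a lookup table back to the original names are assumptions made for illustration.

```python
def simplify_fasta_headers(lines, prefix="seq"):
    """Rename FASTA records to prefix + running number.

    Returns (new_lines, mapping), where mapping goes from the new
    simple name back to the original header text.
    """
    new_lines = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}{count}"   # letters and digits only
            mapping[new_name] = line[1:]    # remember the original header
            new_lines.append(">" + new_name)
        else:
            new_lines.append(line)          # sequence lines pass through unchanged
    return new_lines, mapping


if __name__ == "__main__":
    # Hypothetical headers, only to show the effect.
    demo = [">scaffold_1|len=120 some description", "ACGT", ">contig-2.1", "GGCC"]
    renamed, names = simplify_fasta_headers(demo)
    print("\n".join(renamed))
    for new, old in names.items():
        print(new, old, sep="\t")
```

Keeping the mapping makes it possible to translate the simple names back to the original ones once the library has been built.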
From qwzhang0601 at gmail.com Wed May 10 09:48:01 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 10 May 2017 11:48:01 -0400
Subject: [maker-devel] want coding sequences
Message-ID:

Hello:

Thanks for development and maintenance of the tool "Maker2". We have used Maker2 to do genome annotation of a new rodent species. Now we are doing downstream analysis, which requires inputs of coding sequences from different species.

I found the outputs I got from Maker2 only include protein sequences and transcripts. Is there an easy way that I can get the coding sequences for our annotated genome?

Many thanks

Best
Quanwei

-------------- next part --------------
An HTML attachment was scrubbed...

From munholl at uwindsor.ca Wed May 10 14:24:49 2017
From: munholl at uwindsor.ca (Seth Munholland)
Date: Wed, 10 May 2017 16:24:49 -0400
Subject: [maker-devel] MAKER only running 1 task
Message-ID:

Hello,

I'm running a MAKER annotation on an ubuntu cluster and my top screen shows the following:

Tasks: 831 total, 3 running, 826 sleeping, 0 stopped, 1 zombie

and my maker run (on a screen) shows:

...
total clusters:4 now processing 0
...processing 0 of 3
...processing 1 of 3
...processing 2 of 3
total clusters:4 now processing 0
flattening protein clusters
prepare section files
merging blast reports...
flattening protein clusters
prepare section files
merging blast reports...
flattening protein clusters
prepare section files

Of the 826, the vast majority of them are maker and only one of the running tasks is maker. Is this normal behaviour or has my maker run stopped processing?

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755

-------------- next part --------------
An HTML attachment was scrubbed...
From carsonhh at gmail.com Thu May 11 09:58:59 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 09:58:59 -0600
Subject: [maker-devel] want coding sequences
In-Reply-To:
References:
Message-ID: <82DF4E49-8E78-45F6-8A78-01A45F908987@gmail.com>

Use the fasta_tool utility with --trim_maker_utr to get just the CDS part of each transcript.

--Carson

From carsonhh at gmail.com Thu May 11 10:03:04 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 10:03:04 -0600
Subject: [maker-devel] MAKER only running 1 task
In-Reply-To:
References:
Message-ID: <2119AF6E-3571-4D28-9D81-B24A513839E5@gmail.com>

It may be frozen, or, if it is on the last contig, it may be running a non-parallelizable step (the very last cluster merging step for each contig is not parallelizable). So on a large contig the very last step can take a little while, and if there are no other contigs, then there is no work to give to other processes to keep them busy in the meantime. So everyone has to wait so they can all exit together once the last step is done. But as I said, this will only happen if you are on the last contig and it is large. Otherwise it is probably frozen somehow (look for any errors further up the log).

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...

From mnaymik at tgen.org Thu May 11 12:29:46 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Thu, 11 May 2017 11:29:46 -0700
Subject: [maker-devel] Maker gene vs snap match in final GFF's
Message-ID:

In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap?

--
*This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.*

-------------- next part --------------
An HTML attachment was scrubbed...

From carsonhh at gmail.com Thu May 11 12:33:55 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 12:33:55 -0600
Subject: [maker-devel] Maker gene vs snap match in final GFF's
In-Reply-To:
References:
Message-ID: <8A77F384-9BAD-4BF4-BD1E-EDAE4E010612@gmail.com>

MAKER results can be the result of additional hints sent to SNAP together with post processing to add UTR and additional exons that have support from transcript evidence. MAKER results will also have support from either protein or EST/mRNA evidence.

A SNAP match is simply the raw ab initio call made by SNAP (no hints, no post processing, and it may or may not have evidence supporting the structure). These matches are there just for reference purposes, so you know what SNAP will produce outside of MAKER given the underlying HMM.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...

From d.ence at ufl.edu Thu May 11 12:35:00 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Thu, 11 May 2017 18:35:00 +0000
Subject: [maker-devel] Maker gene vs snap match in final GFF's
In-Reply-To:
References:
Message-ID: <2560F44E-3D8D-4E81-B7B9-621921408B61@mail.ufl.edu>

The two might have identical coordinates in some cases, but they are different kinds of features. The "match" is a product of an ab initio gene prediction algorithm, while the "gene" is supported by evidence and has passed through the maker polishing and filtering steps.

-------------- next part --------------
An HTML attachment was scrubbed...
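To see the distinction described in this thread in your own output, one option is to tally the features by the GFF3 source (column 2) and type (column 3) columns, so that evidence-backed `gene` records and raw ab initio `match` records show up as separate rows. What follows is a minimal Python sketch; the function name and the sample records are made up for illustration, and the source labels in a real MAKER file may differ from the ones shown here.

```python
from collections import Counter


def tally_source_type(gff_lines):
    """Count GFF3 features by (source, type), i.e. columns 2 and 3."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                       # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue                    # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue                    # not a complete feature line
        counts[(cols[1], cols[2])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records; real files would be read with open(path).
    sample = [
        "##gff-version 3",
        "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
        "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1",
        "chr1\tsnap\tmatch\t150\t880\t.\t+\t.\tID=m1",
        "chr1\tsnap\tmatch_part\t150\t400\t.\t+\t.\tID=m1:p1;Parent=m1",
    ]
    for (source, ftype), n in sorted(tally_source_type(sample).items()):
        print(source, ftype, n, sep="\t")
```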
From nathan.ricks at gmail.com Fri May 12 15:05:45 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Fri, 12 May 2017 15:05:45 -0600
Subject: [maker-devel] Using mpich2 with Maker
Message-ID:

I've been using maker for some time now. However, I would like to speed up the process by using the mpich2 option. When I use the command ./Build install, the following error is produced. Any help would be appreciated.

Nathan Ricks

[image: Inline image 1]

-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 170139 bytes
Desc: not available

From yuejiaxing at gmail.com Mon May 15 05:28:47 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Mon, 15 May 2017 13:28:47 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
Message-ID:

Hello,

I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and ran the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found that multiple overlapping snoRNA genes were annotated by maker, as the example below shows. I was wondering if this is expected. If not, what might have caused this problem, and is there a way to work around it? Thanks in advance!

chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49
chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0
chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1
chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50
chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0
chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1
chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51
chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0
chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1
chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52
chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0
chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1
chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53
chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0
chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1

Best,
Jia-Xing

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/

-------------- next part --------------
An HTML attachment was scrubbed...

From michael.s.campbell1 at gmail.com Mon May 15 10:28:58 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 May 2017 12:28:58 -0400
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To:
References:
Message-ID: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>

Hi Jia-Xing,

That has been my experience in the past as well. For the non-coding RNAs, tRNA-scan is very accurate while snoscan seems to be quite sensitive but not very specific. Did you give it a "snoscan_meth" file? Giving it a snoscan_meth file will help with accuracy. The biggest gains in accuracy are from small RNA-seq data. In the paper where we used snoscan on maize we didn't keep any snoRNA predictions that didn't have support from small RNA-seq data; in practical terms, we got rid of anything with an AED of 1.

I hope this helps,
Mike

-------------- next part --------------
An HTML attachment was scrubbed...

From yuejiaxing at gmail.com Mon May 15 11:14:26 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Mon, 15 May 2017 19:14:26 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID:

Hi Michael,

Many thanks for the information! I will specify the "snoscan_meth" file and give it another try then. I mainly want to use maker to annotate protein-coding genes and tRNAs, but it would be nice to have snoRNAs reasonably annotated as well. Thanks again and have a great day!

Best,
Jia-Xing

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/

-------------- next part --------------
An HTML attachment was scrubbed...

From carsonhh at gmail.com Tue May 16 08:51:00 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 May 2017 08:51:00 -0600
Subject: [maker-devel] Using mpich2 with Maker
In-Reply-To:
References:
Message-ID: <6C739B3B-D5D6-4263-A73C-4EA1762B1EE1@gmail.com>

You probably need to reinstall Parse::RecDescent, Inline, Inline::C, or all of the above via CPAN (perl's module installer). The ones already installed on your system may have issues. If you do not have the ability to install modules, you can install them just for your user using local::lib and the bootstrapping instructions here --> http://search.cpan.org/~haarg/local-lib-2.000019/lib/local/lib.pm#The_bootstrapping_technique

Then reinstall MAKER.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
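Going back to the snoRNA thread above, Mike's rule of thumb (discard snoRNA predictions with an AED of 1) could be applied to a GFF3 file roughly as follows. This is a Python sketch under assumptions, not a MAKER utility: it relies only on the `_AED` attribute and the `ID`/`Parent` links visible in the records posted in the thread, it assumes tab-delimited columns and one snoRNA per gene, and the function names are illustrative.

```python
def parse_attrs(field):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(p.split("=", 1) for p in field.strip().split(";") if "=" in p)


def drop_unsupported_snornas(gff_lines, max_aed=1.0):
    """Drop snoRNA records with _AED >= max_aed, their parent gene and their exons."""
    rows = [line.rstrip("\n") for line in gff_lines]

    # First pass: collect the IDs of unsupported snoRNAs and of their parent genes.
    bad = set()
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] == "snoRNA":
            attrs = parse_attrs(cols[8])
            if float(attrs.get("_AED", 0)) >= max_aed:
                bad.add(attrs["ID"])
                if attrs.get("Parent"):
                    bad.add(attrs["Parent"])

    # Second pass: keep everything whose ID and Parent are both outside that set.
    kept = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9:
            attrs = parse_attrs(cols[8])
            ids = {attrs.get("ID"), attrs.get("Parent")} - {None}
            if ids & bad:
                continue
        kept.append(row)  # comments and directives pass through unchanged
    return kept
```

Applied to the records posted in the thread, this would remove all five overlapping predictions, since each of them carries `_AED=1.00`.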
URL: From salim.bougouffa at kaust.edu.sa Sun May 21 01:45:50 2017 From: salim.bougouffa at kaust.edu.sa (Salim Bougouffa) Date: Sun, 21 May 2017 10:45:50 +0300 Subject: [maker-devel] augustus exon calling ~ Message-ID: Hi Maker folks, I have several issues with a plant genome annotation that I am currently doing but perhaps the most recurrent issues are: 1/ CDSs that are missed where significant rna-seq evidence is there (figure artemis01) 2/ vice versa where one or two exons are added without rna-seq evidence/intron hints (figure artemis02) info about the runs: 1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating 2/ umask=1 (seems to do better than umask=0; is this a good thing to do) 3/ evm = 1 (seems to perform better than emv=0) 4/ repeatmasking (denovo + repbase) Best, /SB _______________________________________________________Salim Bougouffa(PhD), Postdoctoral Fellow 4700 KAUST, CBRC, Blg3. Office4326-WS05, Thuwal, Jeddah, KSA, 23955-6900 (966) 012 808 2963 || salim.bougouff at kaust.edu.sa -- ------------------------------ This message and its contents, including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: artemis01.png Type: image/png Size: 166352 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: artemis02.png Type: image/png Size: 97140 bytes Desc: not available URL: From mjfi2sb3 at gmail.com Sun May 21 01:55:31 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Sun, 21 May 2017 07:55:31 +0000 Subject: [maker-devel] augustus exon calling ~ In-Reply-To: References: Message-ID: Hi, I should have mentioned a third scenario where an exon is not called fully by maker despite augustus getting it right (figure artemis03) [image: artemis03.png] On Sun, 21 May 2017 at 10:48 Salim Bougouffa wrote: > Hi Maker folks, > > I have several issues with a plant genome annotation that I am currently > doing but perhaps the most recurrent issues are: > > 1/ CDSs that are missed where significant rna-seq evidence is there > (figure artemis01) > 2/ vice versa where one or two exons are added without rna-seq > evidence/intron hints (figure artemis02) > > info about the runs: > 1/ using augustus with a pre-existing model for a related plant that has > high homology to the one I am annotating > 2/ umask=1 (seems to do better than umask=0; is this a good thing to do) > 3/ evm = 1 (seems to perform better than emv=0) > 4/ repeatmasking (denovo + repbase) > > Best, > /SB > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: artemis03.png Type: image/png Size: 145860 bytes Desc: not available URL: From admin at genome.arizona.edu Tue May 23 13:52:17 2017 From: admin at genome.arizona.edu (System Admin) Date: Tue, 23 May 2017 12:52:17 -0700 Subject: [maker-devel] Hyperthreading Message-ID: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node. With hyperthreading on, we have up to 256 total emulated cores available. Which is the optimal scenario? 1. Use '-n 256' 2. 
Use '-n 128' with hyperthreading still on
3. Use '-n 128' with hyperthreading turned off

Thanks

From carsonhh at gmail.com Tue May 23 14:19:29 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 14:19:29 -0600
Subject: [maker-devel] Hyperthreading
In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: 

MAKER is more of a pipeline. It will launch external tools on as many CPUs as you give it with the mpiexec command. I've found that many of the tools used get a boost with hyperthreading even though optimizations are not explicitly built into their code. The short answer is you would have to try it both ways. I doubt there will be much more than a 10-15% difference in runtime.

You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously.

--Carson

> On May 23, 2017, at 1:52 PM, System Admin wrote:
>
> We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node.
> With hyperthreading on, we have up to 256 total emulated cores available.
>
> Which is the optimal scenario?
> 1. Use '-n 256'
> 2. Use '-n 128' with hyperthreading still on
> 3. 
Use '-n 128' with hyperthreading turned off
>
> Thanks

From admin at genome.arizona.edu Tue May 23 14:31:48 2017
From: admin at genome.arizona.edu (admin at genome.arizona.edu)
Date: Tue, 23 May 2017 13:31:48 -0700
Subject: [maker-devel] Hyperthreading
In-Reply-To: 
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: 

Carson Holt wrote on 05/23/2017 01:19 PM:
> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously.

Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%).

From carsonhh at gmail.com Tue May 23 14:38:42 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 14:38:42 -0600
Subject: [maker-devel] Hyperthreading
In-Reply-To: 
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com>

Also make sure cpus in the control file is set to 1 when using MPI. Otherwise it will tell each program it calls to try and use more CPUs per call.

--Carson

> On May 23, 2017, at 2:31 PM, admin at genome.arizona.edu wrote:
>
> Carson Holt wrote on 05/23/2017 01:19 PM:
>> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). 
Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously.
>
> Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%).

From mmokrejs at gmail.com Tue May 23 14:45:21 2017
From: mmokrejs at gmail.com (Martin MOKREJŠ)
Date: Tue, 23 May 2017 22:45:21 +0200
Subject: [maker-devel] Hyperthreading
In-Reply-To: 
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <30ae31f7-264e-44d0-30bc-a010de3e54a7@gmail.com>

admin at genome.arizona.edu wrote:
> Carson Holt wrote on 05/23/2017 01:19 PM:
>> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously.
>
> Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%).

Hi,
the high load could be caused by disk IO or other reasons. The only way to confirm is to run top, htop or similar and check that the processes are in the *running* state ("R" is displayed in the status column). You may instead see "S" (sleep) when a task is waiting for data input or output, and "D" (disk) when it is waiting for disk IO (unlike network IO).
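As a rough sketch of that check, the state column can also be tallied with a few lines of Python rather than read off top by eye. The helper names count_states and current_states are made up for illustration, and the ps invocation assumes a Linux procps-style ps that accepts "-eo stat=".

```python
import subprocess
from collections import Counter


def count_states(stat_fields):
    """Tally processes by the first letter of their ps STAT field.

    "R+" and "Rl" both count as "R"; blank entries are skipped.
    """
    return Counter(s.strip()[0] for s in stat_fields if s.strip())


def current_states():
    """Ask ps for the STAT column of every process (assumes procps ps)."""
    out = subprocess.run(
        ["ps", "-eo", "stat="],  # "stat=" suppresses the header line
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_states(out.splitlines())
```

Calling current_states() on a compute node while maker is running gives one count per state letter; many "D" entries next to few "R" entries would point at the disk rather than the CPUs.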
Martin

From mmokrejs at gmail.com Tue May 23 14:51:18 2017
From: mmokrejs at gmail.com (Martin MOKREJŠ)
Date: Tue, 23 May 2017 22:51:18 +0200
Subject: [maker-devel] Hyperthreading
In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <54fbcfd6-ba28-e570-cfab-a6d83620f747@gmail.com>

System Admin wrote:
> We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node.
> With hyperthreading on, we have up to 256 total emulated cores available.
>
> Which is the optimal scenario?
> 1. Use '-n 256'
> 2. Use '-n 128' with hyperthreading still on
> 3. Use '-n 128' with hyperthreading turned off

Go for 3, but make sure to disable *hyperthreading* in the kernel of the machines as well. I also disable the multicore scheduler (which should again help only if there are more long-running processes than physical cores available and if some of them should share a cache). We do not have such jobs; hmmer and blast mostly access data from memory, so the CPU cache is not very relevant for them.

Hyperthreading only helps if jobs are lousy, waiting for some input/output etc., and in that case *it helps* if another process can be executed on the CPU core (hopefully not having the same bottleneck). It generally helps only in bad situations. You are after a good setup, so disable hyperthreading in the kernel, start only as many jobs as there are physical CPU cores, and monitor performance. If jobs are starving, resolve the issue.
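To pick that number, a small sketch like the following can separate logical CPUs from physical cores. It assumes the x86 Linux /proc/cpuinfo layout with "physical id" and "core id" fields; on VMs or other architectures those fields may be missing, in which case the second number comes back as 0 and cannot be trusted. The function name count_cores is only illustrative.

```python
def count_cores(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text.

    Assumes one blank-line-separated block per logical CPU, each carrying
    "processor", "physical id" and "core id" fields (x86 Linux layout).
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a CPU block
        logical += 1
        if "physical id" in fields and "core id" in fields:
            # hyperthread siblings share the same (socket, core) pair
            cores.add((fields["physical id"], fields["core id"]))
    return logical, len(cores)
```

On a node, count_cores(open("/proc/cpuinfo").read()) returns both numbers; summing the second one over all nodes gives a candidate value for 'mpiexec -n' under option 3.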
Martin

From admin at genome.arizona.edu Tue May 23 14:57:56 2017
From: admin at genome.arizona.edu (admin at genome.arizona.edu)
Date: Tue, 23 May 2017 13:57:56 -0700
Subject: [maker-devel] Hyperthreading
In-Reply-To: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com>
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com>
Message-ID: 

Carson Holt wrote on 05/23/2017 01:38 PM:
> Also make sure cpus in the control file are set to 1 when using MPI.
> Otherwise it will tell each program it calls to try and use more
> CPUs per call.

Yes, we are using cpus=1 in the control file.

Thanks

From carsonhh at gmail.com Tue May 23 15:03:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 15:03:17 -0600
Subject: [maker-devel] Hyperthreading
In-Reply-To: 
References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu>
Message-ID: <054B76C7-FAC0-4B9B-A6D1-29A8202E35B3@gmail.com>

One last thing to check if using CentOS or RedHat. I've seen it happen on a handful of clusters where transparent hugepages can create odd load issues and very high sys CPU usage under top (not just with maker but with BWA, GATK, and other programs that can have larger memory footprints). If using CentOS or RedHat, you may want to disable defrag for hugepages. You do this on CentOS 6 to disable it (the process is similar on CentOS 7 and RedHat but you may have to google it) -->

  echo never > /sys/kernel/mm/transparent_hugepage/defrag
  echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

--Carson

> On May 23, 2017, at 2:31 PM, admin at genome.arizona.edu wrote:
>
> Carson Holt wrote on 05/23/2017 01:19 PM:
>> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. 
Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously.
>
> Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%).

From carsonhh at gmail.com Tue May 23 15:34:15 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 15:34:15 -0600
Subject: [maker-devel] augustus exon calling ~
In-Reply-To: 
References: 
Message-ID: 

EVM works extremely well when evidence closely matches the predictions and there are no assembly anomalies affecting the ORF. Otherwise, EVM performs very poorly. Also, I would not set unmask=1. It adds noise to the calls.

Note that in all cases given, gene models are from Augustus (MAKER doesn't make predictions). MAKER just provides hints that Augustus can use for the second call set. Hints boost the score a model gets whenever a feature matches the hint. What you see as Augustus match/match_part features are just references to what Augustus calls without hints. So if I tell Augustus there is probably an exon/intron at location X, then any model that includes that exon/intron will bump up its score, thus causing Augustus to keep models that match the hints and report those over models that don't match. However, if there is an issue with the evidence (e.g. a merged mRNA-seq assembly), or an issue with the assembly (a base change generates an early stop codon or causes a frameshift), then Augustus may choose to truncate or skip an exon in order to capture the bonus from downstream hints. So it is unlikely that there is a workable model that captures the exact intron/exon structure, because it breaks the ORF at some point.
So Augustus instead produces the best model it can to capture as many hint bonuses as it can.

That being said, look for any odd hint sources like very poor protein or transcript evidence alignments. Eliminating bad hints will improve performance (for example, if using mRNA-seq assemblies, Trinity has a jaccard_clip option which helps avoid false merging of transcript evidence). Or if an organism you used for protein evidence constantly produces bad protein alignments, then you may want to drop it completely from the evidence. Finally, training Augustus on the genome being annotated will help improve performance (note that just because a species is closely related in evolutionary space does not mean that its HMMs will perform well; it's a common fallacy about ab initio prediction discussed in the SNAP paper). Also try adding another gene predictor like SNAP to see if it hurts or helps.

--Carson

> On May 21, 2017, at 1:48 AM, Salim Bougouffa wrote:
>
> Hi Maker folks,
>
> I have several issues with a plant genome annotation that I am currently doing but perhaps the most recurrent issues are:
>
> 1/ CDSs that are missed where significant rna-seq evidence is there (figure artemis01)
> 2/ vice versa where one or two exons are added without rna-seq evidence/intron hints (figure artemis02)
>
> info about the runs:
> 1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating
> 2/ umask=1 (seems to do better than umask=0; is this a good thing to do)
> 3/ evm = 1 (seems to perform better than emv=0)
> 4/ repeatmasking (denovo + repbase)
>
> Best,
> /SB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nathan.ricks at gmail.com Tue May 23 15:23:49 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Tue, 23 May 2017 15:23:49 -0600
Subject: [maker-devel] maker_functional_gff
Message-ID: 

I've been working with maker and trying to use maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up and just continues for a long time.

Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367.
Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367.
^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368.

Nathan Ricks

From d.ence at ufl.edu Tue May 23 15:39:32 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Tue, 23 May 2017 21:39:32 +0000
Subject: [maker-devel] maker_functional_gff
In-Reply-To: 
References: 
Message-ID: <9388406A-302C-4B19-9F35-D56C06CC9582@mail.ufl.edu>

Hi Nathan, can you send the command line that you're using that gives the error?

Thanks,
Daniel Ence

> On May 23, 2017, at 5:23 PM, Nathan Ricks wrote:
>
> I've been working with maker and trying to use the maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up, and just continues for a long time.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367.
> ^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368.
>
>
> Nathan Ricks

From carsonhh at gmail.com Tue May 23 15:44:05 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 23 May 2017 15:44:05 -0600
Subject: [maker-devel] maker_functional_gff
In-Reply-To: 
References: 
Message-ID: <25B003D7-6A19-4486-B21F-71070F00A580@gmail.com>

The blast report you gave it is in the wrong format, it is partial/truncated, or you provided the files in the wrong order. Basically it received an empty line from the file at some point. The blast report must be in tabular format, which is "wu-blast -mformat 2" or "ncbi-blast -outfmt 6". Also, the script only supports blast results against UniProt/Swiss-Prot.

--Carson

> On May 23, 2017, at 3:23 PM, Nathan Ricks wrote:
>
> I've been working with maker and trying to use the maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up, and just continues for a long time.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367.
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367.
> ^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368.
>
>
> Nathan Ricks

From yuejiaxing at gmail.com Fri May 26 03:28:48 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Fri, 26 May 2017 11:28:48 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To: 
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID: 

Hi Michael,

This is a follow-up on the snoscan issue. The snoscan_meth option seems to have been removed from the current maker_opts.ctl template file (v2.31.9). This option used to be there according to this post (https://www.biostars.org/p/217240/). I manually specified this option in my maker_opts.ctl file, but I don't think maker recognized it:

STATUS: Parsing control files...
WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl
...

Do you know if there is a way to work around this problem? Thanks!

Best,
Jia-Xing

On Mon, May 15, 2017 at 7:14 PM, Jia-Xing Yue wrote:

> Hi Michael,
>
> Many thanks for the information! I will specify the "snoscan_meth" file and give it another try then. I mainly want to use maker to annotate protein-coding genes and tRNAs. But it would be nice to have snoRNA reasonably annotated as well.
> Thanks again and have a great day!
>
> Best,
> Jia-Xing
>
> On Mon, May 15, 2017 at 6:28 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:
>
>> Hi Jia-Xing,
>>
>> That has been my experience in the past as well. For the non-coding RNAs tRNA-scan is very accurate while snoscan seems to be quite sensitive but very specific. Did you give it a "snoscan_meth" file? Giving it a snoscan_meth file will help with accuracy. The biggest gains in accuracy are from small RNA-seq data.
In the paper where we used snoscan on maize we >> didn?t keep any snoRNA predictions that didn?t have support from small >> RNA-seq data, in practical terms we got rid of anything with a AED of 1. >> >> I hope this helps, >> Mike >> >> On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote: >> >> Hello, >> >> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and >> run the annotation for a yeast (S. cerevisiae) genome. I think the >> annotation went well with regard to tRNAs and protein-coding genes but I am >> not sure about snoRNAs. I found multiple overlapped snoRNA genes were >> annotated by maker as the example below shows. I was wondering if this is >> expected. If not, what might have caused this problem and is there a way to >> work around. Thanks in advance! >> >> chrIX maker gene 4328 4416 . + . >> ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-nonc >> oding-gene-0.49 >> chrIX maker snoRNA 4328 4416 . + . >> ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding- >> gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 >> chrIX maker exon 4328 4416 . + . >> ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Par >> ent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 >> chrIX maker gene 4375 4563 . + . >> ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-nonc >> oding-gene-0.50 >> chrIX maker snoRNA 4375 4563 . + . >> ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding- >> gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 >> chrIX maker exon 4375 4563 . + . >> ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Par >> ent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 >> chrIX maker gene 4375 4461 . + . >> ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-nonc >> oding-gene-0.51 >> chrIX maker snoRNA 4375 4461 . + . 
>> ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding- >> gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 >> chrIX maker exon 4375 4461 . + . >> ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Par >> ent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 >> chrIX maker gene 4375 4491 . + . >> ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-nonc >> oding-gene-0.52 >> chrIX maker snoRNA 4375 4491 . + . >> ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding- >> gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 >> chrIX maker exon 4375 4491 . + . >> ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Par >> ent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 >> chrIX maker gene 4375 4500 . + . >> ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-nonc >> oding-gene-0.53 >> chrIX maker snoRNA 4375 4500 . + . >> ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan >> -chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding- >> gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 >> chrIX maker exon 4375 4500 . + . >> ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Par >> ent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 >> >> Best, >> Jia-Xing >> >> >> -- >> Jia-Xing Yue >> >> Population Genomics and Complex Traits Group >> Tour Pasteur 8eme etage >> Facult? de M?decine >> Institute for Research on Cancer and Aging, Nice (IRCAN) >> CNRS UMR 7284 - INSERM U 1081 - Universit? 
Côte d'Azur (UCA)
>> 28 Avenue de Valombrose
>> 06107 NICE Cedex 2
>> France
>>
>> Personal website: http://www.iamphioxus.org/
>
> --
> Jia-Xing Yue
>
> Population Genomics and Complex Traits Group
> Tour Pasteur 8eme etage
> Faculté de Médecine
> Institute for Research on Cancer and Aging, Nice (IRCAN)
> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
> 28 Avenue de Valombrose
> 06107 NICE Cedex 2
> France
>
> Personal website: http://www.iamphioxus.org/

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/

From michael.s.campbell1 at gmail.com Fri May 26 07:54:44 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 26 May 2017 09:54:44 -0400
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To: 
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>

Hi Jia-Xing,

v2.31.9 may not have had that option. I know that it is in the v3.00.0 version, so your best option may be to update.

Thanks,
Mike

> On May 26, 2017, at 5:28 AM, Jia-Xing Yue wrote:
>
> Hi Michael,
>
> This is a follow-up on the snoscan issue. The snoscan_meth option seems to have been removed from the current maker_opts.ctl template file (v2.31.9). This option used to be there according to this post (https://www.biostars.org/p/217240/).
I manually specified this option in my maker_opts.ctl file but I don't think maker has correctly recognized this option: > > > STATUS: Parsing control files... > WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl > ... > > Do you know is there a way to work around this problem? Thanks! > > Best, > Jia-Xing > > > > On Mon, May 15, 2017 at 7:14 PM, Jia-Xing Yue > wrote: > Hi Michael, > > Many thanks for the information! I will specify the "snoscan_meth" file and give it another try then. I majorly want to use maker to annotate protein-coding genes and tRNAs. But it would be nice to have snoRNA reasonably annotated as well. > Thanks gain and have a great day! > > Best, > Jia-Xing > > > On Mon, May 15, 2017 at 6:28 PM, Michael Campbell > wrote: > Hi Jia-Xing, > > That has been my experience in the past as well. For the non-coding RNAs tRNA-scan is very accurate while snoscan seems to be quite sensitive but very specific. Did you give it a ?snoscan_meth? file? Giving it a snoscan_meth file will help with accuracy. The biggest gains in accuracy are from small RNA-seq data. In the paper where we used snoscan on maize we didn?t keep any snoRNA predictions that didn?t have support from small RNA-seq data, in practical terms we got rid of anything with a AED of 1. > > I hope this helps, > Mike >> On May 15, 2017, at 7:28 AM, Jia-Xing Yue > wrote: >> >> Hello, >> >> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and run the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapped snoRNA genes were annotated by maker as the example below shows. I was wondering if this is expected. If not, what might have caused this problem and is there a way to work around. Thanks in advance! >> >> chrIX maker gene 4328 4416 . + . 
ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 >> chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 >> chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 >> chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 >> chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 >> chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 >> chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 >> chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 >> chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 >> chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 >> chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 >> chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 >> chrIX maker gene 4375 4500 . + . 
ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53 >> chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 >> chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 >> >> Best, >> Jia-Xing >> >> >> -- >> Jia-Xing Yue >> >> Population Genomics and Complex Traits Group >> Tour Pasteur 8eme etage >> Facult? de M?decine >> Institute for Research on Cancer and Aging, Nice (IRCAN) >> CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) >> 28 Avenue de Valombrose >> 06107 NICE Cedex 2 >> France >> >> Personal website: http://www.iamphioxus.org/ >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Facult? de M?decine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > > > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Facult? de M?decine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From yuejiaxing at gmail.com Fri May 26 08:20:03 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Fri, 26 May 2017 16:20:03 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>
Message-ID:

I see. Thanks Michael!

Best,
Jia-Xing

On Fri, May 26, 2017 at 3:54 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Jia-Xing,
>
> v2.31.9 may not have had that option. I know that it is in the v3.00.0
> version, so your best option may be to update.
>
> Thanks,
> Mike
>
> On May 26, 2017, at 5:28 AM, Jia-Xing Yue wrote:
>
> Hi Michael,
>
> This is a follow-up on the snoscan issue. I found that the snoscan_meth option
> seems to have been removed from the current maker_opts.ctl template file
> (v2.31.9). This option used to be there according to this post (
> https://www.biostars.org/p/217240/). I manually specified this option in
> my maker_opts.ctl file but I don't think maker has correctly recognized
> this option:
>
>
> STATUS: Parsing control files...
> WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl
> ...
>
> Do you know if there is a way to work around this problem? Thanks!
>
> Best,
> Jia-Xing
>
>
>
> On Mon, May 15, 2017 at 7:14 PM, Jia-Xing Yue
> wrote:
>
>> Hi Michael,
>>
>> Many thanks for the information! I will specify the "snoscan_meth" file
>> and give it another try then. I mainly want to use maker to annotate
>> protein-coding genes and tRNAs. But it would be nice to have snoRNA
>> reasonably annotated as well.
>> Thanks again and have a great day!
>>
>> Best,
>> Jia-Xing
>>
>>
>> On Mon, May 15, 2017 at 6:28 PM, Michael Campbell <
>> michael.s.campbell1 at gmail.com> wrote:
>>
>>> Hi Jia-Xing,
>>>
>>> That has been my experience in the past as well. For the non-coding RNAs
>>> tRNA-scan is very accurate while snoscan seems to be quite sensitive but
>>> not very specific. Did you give it a "snoscan_meth" file? Giving it
>>> a snoscan_meth file will help with accuracy. The biggest gains in accuracy
>>> are from small RNA-seq data. In the paper where we used snoscan on maize we
>>> didn't keep any snoRNA predictions that didn't have support from small
>>> RNA-seq data; in practical terms we got rid of anything with an AED of 1.
>>>
>>> I hope this helps,
>>> Mike
>>>
>>> On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote:
>>>
>>> Hello,
>>>
>>> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and
>>> ran the annotation for a yeast (S. cerevisiae) genome. I think the
>>> annotation went well with regard to tRNAs and protein-coding genes but I am
>>> not sure about snoRNAs. I found that multiple overlapping snoRNA genes were
>>> annotated by maker, as the example below shows. I was wondering if this is
>>> expected. If not, what might have caused this problem and is there a way to
>>> work around it? Thanks in advance!
>>>
>>> chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49
>>> chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0
>>> chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1
>>> chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50
>>> chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0
>>> chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1
>>> chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51
>>> chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0
>>> chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1
>>> chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52
>>> chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0
>>> chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1
>>> chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53
>>> chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0
>>> chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1
>>>
>>> Best,
>>> Jia-Xing
>>>
>>>
>>> --
>>> Jia-Xing Yue
>>>
>>> Population Genomics and Complex Traits Group
>>> Tour Pasteur 8eme etage
>>> Faculté de Médecine
>>> Institute for Research on Cancer and Aging, Nice (IRCAN)
>>> CNRS UMR 7284 - INSERM U 1081 - Université
Côte d'Azur (UCA)
>>> 28 Avenue de Valombrose
>>> 06107 NICE Cedex 2
>>> France
>>>
>>> Personal website: http://www.iamphioxus.org/
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>>
>>>
>>
>>
>> --
>> Jia-Xing Yue
>>
>> Population Genomics and Complex Traits Group
>> Tour Pasteur 8eme etage
>> Faculté de Médecine
>> Institute for Research on Cancer and Aging, Nice (IRCAN)
>> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
>> 28 Avenue de Valombrose
>> 06107 NICE Cedex 2
>> France
>>
>> Personal website: http://www.iamphioxus.org/
>>
>>
>
>
> --
> Jia-Xing Yue
>
> Population Genomics and Complex Traits Group
> Tour Pasteur 8eme etage
> Faculté de Médecine
> Institute for Research on Cancer and Aging, Nice (IRCAN)
> CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
> 28 Avenue de Valombrose
> 06107 NICE Cedex 2
> France
>
> Personal website: http://www.iamphioxus.org/
>
>
>

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dcg at cau.edu.cn Wed May 3 09:29:18 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Wed, 3 May 2017 23:29:18 +0800
Subject: [maker-devel] How to explain the maker results?
Message-ID: <2017050323291810262239@cau.edu.cn>

Dear sir:
I've been using maker to do my genome annotation. However, I still have something I can't understand:

1. After assembly, I have many contigs. Firstly, I set est2genome=1 and protein2genome=1, with my proteins, ESTs and RNA-seq.
Which way below is correct? 1.1 Each contig has its own gff. I just use its own maker_gff file to get a pyu.hmm(be used in snap practice), and then, train the single contig. 1.2 I merge all the maker_gff to produce a pyu.hmm(for snap) , and then, use this pyu.hmm to train all the contigs. 2. The aim of my project is to find new protein, so I need to guarantee the rigor of my annotation. I made a plan that the predicted protein should be successfully aligned to the Uniprot(reviewed protein, total number is about 30K) with 100% identity and coverage. However, if I choose method 1.2 as above: After the first step (est2genome=1 and protein2genome=1), about 1600 proteins can be 100% aligned to the Uniprot. After 2 rounds training(est2genome=0 and protein2genome=0), less proteins can be 100% aligned. Is my test method reasonable? Why the final results can't get more well aligned proteins? After training and fasta_merge, the results can be index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta, index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta, which is the final results? I'm looking forward to hearing from you. Thanks! Yours sincerely! Chao Chao dcg at cau.edu.cn -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.ence at ufl.edu Wed May 3 09:49:08 2017 From: d.ence at ufl.edu (Ence,daniel) Date: Wed, 3 May 2017 15:49:08 +0000 Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> Message-ID: Hi, the error is regarding a specific file (is.lib) which isn?t being found. Can you verify that the file is there after you updated Repbase? Use the command: ?ls -l /home/softwares/RepeatMasker/Libraries/20170127/general/is.lib? Thanks, Daniel Ence On May 3, 2017, at 8:15 AM, ???Jim > wrote: Hi, I'm a newbie of maker. 
I met some errors in Repeatmasker step. The error is here: NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist! I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error. And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work. I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. So, I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. The error is the same. I'm stuck in the problem now. Would highly appreciate any help - thanks! Jinyuan Lu Shanghai Jiao Tong University No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 3 09:53:40 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 3 May 2017 09:53:40 -0600 Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error In-Reply-To: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> Message-ID: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> RepBase and RepeatMasker have changed structure with the new 4.0.7 released two months ago. The new version and RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall). Or you have to use the previous version of RepeatMasker with the previous version of RepBase. ?Carson > On May 3, 2017, at 6:15 AM, ???Jim wrote: > > Hi, I'm a newbie of maker. > I met some errors in Repeatmasker step. 
>
> The error is here:
> NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist!
>
> I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error.
> And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work.
> I checked the former similar errors reported by another user and he solved the problem by updating the RepBase.
> So,
> I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast.
> The error is the same.
>
> I'm stuck in the problem now.
> Would highly appreciate any help - thanks!
>
> Jinyuan Lu
> Shanghai Jiao Tong University
> No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed May 3 09:55:41 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 09:55:41 -0600
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To: <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com>
Message-ID: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>

You may want to use the previous version of both as the new version may still have hidden bugs.

--Carson

> On May 3, 2017, at 9:53 AM, Carson Holt wrote:
>
> RepBase and RepeatMasker have changed structure with the new 4.0.7 released two months ago. The new version of RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall). Or you have to use the previous version of RepeatMasker with the previous version of RepBase.
> > ?Carson > > > >> On May 3, 2017, at 6:15 AM, ???Jim > wrote: >> >> Hi, I'm a newbie of maker. >> I met some errors in Repeatmasker step. >> >> The error is here: >> NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist! >> >> I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error. >> And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work. >> I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. >> So, >> I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. >> The error is the same. >> >> I'm stuck in the problem now. >> Would highly appreciate any help - thanks! >> >> Jinyuan Lu >> Shanghai Jiao Tong University >> No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 3 10:04:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 3 May 2017 10:04:20 -0600 Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error In-Reply-To: <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com> References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com> Message-ID: You may have to contact RepBase via e-mail to find out how to get the libraries compatible with RepeatMasker 4.0.6 as it looks like they have removed the previous release from the website. 
The last release for 4.0.6 was ?> repeatmaskerlibraries-20160829.tar.gz ?Carson > On May 3, 2017, at 9:55 AM, Carson Holt wrote: > > You may want to use the previous version of both as the new version may still have hidden bugs. > > ?Carson > >> On May 3, 2017, at 9:53 AM, Carson Holt > wrote: >> >> RepBase and RepeatMasker have changed structure with the new 4.0.7 released two months ago. The new version and RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall). Or you have to use the previous version of RepeatMasker with the previous version of RepBase. >> >> ?Carson >> >> >> >>> On May 3, 2017, at 6:15 AM, ???Jim > wrote: >>> >>> Hi, I'm a newbie of maker. >>> I met some errors in Repeatmasker step. >>> >>> The error is here: >>> NCBIBlastSearchEngine::search: Error...compressed subject database (/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib) does not exist! >>> >>> I tried ncbi+blast 2.5.0 version and 2.6.0 version as the path to blast, both have the same error. >>> And when I use the command as "maker -R", which skips the repeatmasker step, the maker could work. >>> I checked the former similar errors reported by another user and he solved the problem by updating the RepBase. >>> So, >>> I deleted and re-installed the RepeatMasker, updated the RepBase, also installed RMblast. >>> The error is the same. >>> >>> I'm stuck in the problem now. >>> Would highly appreciate any help - thanks! >>> >>> Jinyuan Lu >>> Shanghai Jiao Tong University >>> No. 800 Dong Chuan Road,Minhang District, Shanghai, P.R. China >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... 
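
A small sketch of the check Daniel suggests in this thread ("ls -l" on the library file that RepeatMasker reports as missing), written so several paths can be checked at once. The function name and status labels are hypothetical. The only path taken from the thread is the is.lib one, and whether other files are required depends on the RepeatMasker and RepBase versions installed.

```python
from pathlib import Path


def check_library_files(paths):
    """Return (path, status) pairs; status is 'ok', 'empty' or 'missing'."""
    report = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            status = "missing"
        elif path.stat().st_size == 0:
            # A zero-byte library usually means an interrupted unpack or copy.
            status = "empty"
        else:
            status = "ok"
        report.append((str(path), status))
    return report


if __name__ == "__main__":
    # Path copied from the error message quoted in this thread.
    wanted = ["/home/softwares/RepeatMasker/Libraries/20170127/general/is.lib"]
    for path, status in check_library_files(wanted):
        print(status, path)
```

A "missing" result for is.lib would be consistent with Carson's explanation that the RepBase release and the RepeatMasker version do not match.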
URL:

From carsonhh at gmail.com Wed May 3 10:10:48 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:10:48 -0600
Subject: [maker-devel] How to explain the maker results?
In-Reply-To: <2017050323291810262239@cau.edu.cn>
References: <2017050323291810262239@cau.edu.cn>
Message-ID: <049F8AC8-7E16-4F05-B8B2-01CA7AB88751@gmail.com>

Use the merged gff3 to train snap, otherwise you won't have enough models.

Info on training can be found on the wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Also you can find additional detailed info by searching the mailing list archives -> http://groups.google.com/group/maker-devel

I'm not sure what you are asking with the last question. Alignment is not a function of training, and will not be affected by the hmm, but 100% coverage and identity is too strict a threshold even for data derived from the same species.

--Carson


> On May 3, 2017, at 9:29 AM, dcg at cau.edu.cn wrote:
>
> Dear sir:
> I've been using maker to do my genome annotation. However, I still have something I can't understand:
>
> 1. After assembly, I have many contigs. Firstly, I set est2genome=1 and protein2genome=1 , with my proteins, ESTs and RNA-seq.. Which way below is correct?
> 1.1 Each contig has its own gff. I just use its own maker_gff file to get a pyu.hmm(be used in snap practice), and then, train the single contig.
> 1.2 I merge all the maker_gff to produce a pyu.hmm(for snap) , and then, use this pyu.hmm to train all the contigs.
>
> 2. The aim of my project is to find new protein, so I need to guarantee the rigor of my annotation.
> I made a plan that the predicted protein should be successfully aligned to the Uniprot(reviewed protein, total number is about 30K) with 100% identity and coverage.
> However, if I choose method 1.2 as above:
> After the first step (est2genome=1 and protein2genome=1), about 1600 proteins can be 100% aligned to the Uniprot. After 2 rounds training(est2genome=0 and protein2genome=0), less proteins can be 100% aligned.
> Is my test method reasonable? Why the final results can't get more well aligned proteins?
> After training and fasta_merge, the results can be index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta, index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta, which is the final results?
>
>
> I'm looking forward to hearing from you. Thanks!
> Yours sincerely!
>
>
> Chao Chao
> dcg at cau.edu.cn
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nathan.ricks at gmail.com Wed May 3 10:19:57 2017
From: nathan.ricks at gmail.com (Nathan Ricks)
Date: Wed, 3 May 2017 10:19:57 -0600
Subject: [maker-devel] Post Processing of Annotations
Message-ID:

Hi,
I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
My question is: what format do these files need to be in?
Iprscan can produce files in a variety of formats: tsv, xml, gff3, html and SVG,
while blastp can produce tabular, pairwise, xml and a number of others.

Thanks

Nathan Ricks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed May 3 10:30:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 10:30:03 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Use blastp with the tab-delimited format option and the UniProt/Swiss-Prot database.
What additional filters you choose to set (e.g. an e-value limit) may vary, although I would recommend 1e-6 or lower.

--Carson


> On May 3, 2017, at 10:19 AM, Nathan Ricks wrote:
>
> Hi,
> I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
> My question is: what format do these files need to be in?
> Iprscan can produce files in a variety of formats: tsv, xml, gff3, html and SVG,
> while blastp can produce tabular, pairwise, xml and a number of others.
>
> Thanks
>
> Nathan Ricks
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From d.ence at ufl.edu Wed May 3 10:34:35 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Wed, 3 May 2017 16:34:35 +0000
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID:

Hi,

The iprscan output should be in tsv format, which is tab-separated, and the usage statement for maker_functional_gff says that the blastp output should be in "wu-blast -mformat 2", which I think is tabbed too.

~Daniel


> On May 3, 2017, at 12:19 PM, Nathan Ricks wrote:
>
> Hi,
> I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
> My question is: what format do these files need to be in?
> Iprscan can produce files in a variety of formats: tsv, xml, gff3, html and SVG,
> while blastp can produce tabular, pairwise, xml and a number of others.
>
> Thanks
>
> Nathan Ricks
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com Wed May 3 13:20:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 3 May 2017 13:20:31 -0600
Subject: [maker-devel] Post Processing of Annotations
In-Reply-To:
References:
Message-ID: <0EE0D7F6-5F28-46E7-9AB8-CED93DC811F6@gmail.com>

The maker_functional_gff and maker_functional_fasta scripts pull specific fields out of the UniProt fasta header, so they are tied to the format used by UniProt/Swiss-Prot. At one time I had modified them to also work with NR, but that was several years ago, so I don't know if it would still work.

--Carson


> On May 3, 2017, at 1:10 PM, Nathan Ricks wrote:
>
> Is it possible to make my own database from sequences that I have downloaded from NCBI instead of using UniProt/Swiss-Prot?
>
> On Wed, May 3, 2017 at 10:30 AM, Carson Holt wrote:
> Use blastp with the tab-delimited format option and the UniProt/Swiss-Prot database. What additional filters you choose to set (e.g. an e-value limit) may vary, although I would recommend 1e-6 or lower.
>
> --Carson
>
> > On May 3, 2017, at 10:19 AM, Nathan Ricks wrote:
> >
> > Hi,
> > I've been running your Maker pipeline, and I've reached the Post Processing of Annotations portion. In your online training you use the output.blastp and the output.iprscan files to help assign function.
> > My question is: what format do these files need to be in?
> > Iprscan can produce files in a variety of formats: tsv, xml, gff3, html and SVG,
> > while blastp can produce tabular, pairwise, xml and a number of others.
> >
> > Thanks
> >
> > Nathan Ricks
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mjfi2sb3 at gmail.com Thu May 4 00:37:52 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Thu, 04 May 2017 06:37:52 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
Message-ID:

Hi,

I am attempting to annotate a plant genome. I have a couple of questions:

*1) RNA-seq assembly*

a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score for most of the assemblies. I see in all of the threads that you recommend Trinity, but doesn't Trinity produce way too many transcripts (even after chucking out the "bad" ones using TransRate)?

b) During hint creation in MAKER, does it take into account that different transcripts have different read coverage (expression levels)? I guess my question is: should I filter out transcripts that have low read coverage?

*2) Repeat Masking*

I am following the advanced repeat library construction tutorial (
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced).
The initial steps find 15 sequences for the LTR and 159 for MITE. But when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty.

a) Are these numbers normal? I was expecting a lot more than 16 for the LTR.
b) I don't get any errors when I run CRL_Step4.pl, yet there is no output. What's going on?!

Many thanks,
/SB
--
____________________________
Sent from Inbox Mobile
-------------- next part --------------
An HTML attachment was scrubbed...
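
A rough sketch relating to the "Post Processing of Annotations" thread above, where Carson recommends tab-delimited blastp output against UniProt/Swiss-Prot with an e-value limit of 1e-6 or lower. It keeps the best-scoring hit per query below that limit. The 12-column layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) is an assumption: it is the default of BLAST+ tabular output (-outfmt 6), whereas Daniel notes that the maker_functional_gff usage statement mentions "wu-blast -mformat 2", so the column order should be checked against the actual file. The function name and the example rows are hypothetical.

```python
def best_hits(lines, max_evalue=1e-6):
    """Map each query ID to its best (subject, evalue, bitscore) hit.

    Assumes 12 tab-separated columns with the e-value in column 11 and the
    bit score in column 12. Hits with an e-value above max_evalue are ignored.
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular hit line
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows: two hits for one protein and one weak hit for another.
EXAMPLE_HITS = [
    "prot1\tsp|P00001|A_YEAST\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-120\t410",
    "prot1\tsp|P00002|B_YEAST\t60.0\t180\t70\t2\t5\t184\t3\t182\t1e-40\t150",
    "prot2\tsp|P00003|C_YEAST\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
]
```

Here best_hits(EXAMPLE_HITS) keeps only the first prot1 hit and drops prot2, whose single hit is above the e-value limit.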
URL:

From jim03ljy at 126.com Thu May 4 00:36:12 2017
From: jim03ljy at 126.com (=?GBK?B?wqy98NStSmlt?=)
Date: Thu, 4 May 2017 14:36:12 +0800 (CST)
Subject: [maker-devel] RepeatMasker: NCBIBlastSearchEngine::search: Error
In-Reply-To:
References: <50d2a447.a7ce.15bce3c7dcb.Coremail.jim03ljy@126.com> <92771056-564B-4953-B738-5A1B97FC71AF@gmail.com> <2E668C83-8884-430B-A764-CD0B44D03D19@gmail.com>
Message-ID: <12de56ed.5ded.15bd22c56c7.Coremail.jim03ljy@126.com>

Thanks a lot! Problem solved.
I matched RepeatMasker 4.0.7 with RepBase 20170127 and it worked!
Thanks!

----Jinyuan Lu

At 2017-05-04 00:04:20, "Carson Holt" wrote:

You may have to contact RepBase via e-mail to find out how to get the libraries compatible with RepeatMasker 4.0.6, as it looks like they have removed the previous release from the website. The last release for 4.0.6 was --> repeatmaskerlibraries-20160829.tar.gz

--Carson

On May 3, 2017, at 9:55 AM, Carson Holt wrote:

You may want to use the previous version of both, as the new version may still have hidden bugs.

--Carson

On May 3, 2017, at 9:53 AM, Carson Holt wrote:

RepBase and RepeatMasker changed structure with the new 4.0.7 release two months ago. The new version of RepBase is only compatible with the new version of RepeatMasker. You have to update both (complete reinstall), or you have to use the previous version of RepeatMasker with the previous version of RepBase.

--Carson

From dcg at cau.edu.cn Fri May 5 07:43:43 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Fri, 5 May 2017 21:43:43 +0800
Subject: [maker-devel] How to evaluate maker proteins' quality?
Message-ID: <2017050521434331108720@cau.edu.cn>

Dear sir:

Now that my MAKER run has finished, I should check the quality of my results.
The purpose of my annotation is to find some new proteins.
There are about 30K reviewed proteins for my species. If I want to see how many predicted proteins support the reviewed proteins, how should I do it? (Would blastp be OK? How should I set the threshold?)
I used UniProt, ESTs and RNA-seq to do my annotation. From my perspective, if a protein is reviewed and used to train SNAP/Augustus, we should get the same one back after several training rounds. So I planned to align the MAKER proteins to the UniProt proteins (which I used for the annotation). If the predicted proteins match UniProt with 100% identity and coverage, they can be considered to support the reviewed proteins. Is that correct?

If not, maybe I can evaluate my proteins only by AED value and protein domains?

I'm looking forward to your help. Thanks a lot!

Yours sincerely!

Chao Chao
dcg at cau.edu.cn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Sun May 7 18:31:31 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 18:31:31 -0600
Subject: [maker-devel] How to evaluate maker proteins' quality?
In-Reply-To: <2017050521434331108720@cau.edu.cn>
References: <2017050521434331108720@cau.edu.cn>
Message-ID: <51620CC3-43D9-47D5-B8B3-871F291D6518@gmail.com>

Because of small differences in the assemblies, individual variants, annotated reference proteins being partial, and potential assembly error, a 100% identity expectation is too high. About 90+% would be more reasonable for a same-species comparison.

AED gives a good correlation with protein confidence. A perfect zero score will not happen often, though, since the way alignment algorithms work leaves alignment errors around splice sites and short exons. Also, the evidence used is never perfect, so with AED lower values are better than higher values, but it cannot be used as an overly specific measurement (it is only correlative and not exact).

--Carson

From carsonhh at gmail.com Sun May 7 19:17:37 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 7 May 2017 19:17:37 -0600
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To:
References:
Message-ID: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>

Michael, can you answer the second question? (Michael wrote the protocol, so I CC'd him.)

With respect to the first question: expression level is not necessarily relevant to the annotation process (so no, MAKER does not look at read coverage). Instead we use the transcript assemblies to identify introns via splice-aware alignment (yes, it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip which avoids false merging of neighboring transcripts (this mostly occurs in fungi, where UTRs can overlap). Merging of transcripts will cause extra introns to be assigned as hints, as well as potential overextension of UTR during the final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If StringTie has a similar option, then it can be used as well.

Thanks,
Carson

From mcampbel at cshl.edu Sun May 7 19:24:27 2017
From: mcampbel at cshl.edu (Campbell, Michael)
Date: Mon, 8 May 2017 01:24:27 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>
Message-ID: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>

Hi SB,

I've added Ning Jiang to this email. She has put great effort into updating this protocol recently and will be able to address your questions better than I can.

Ning, would you mind helping out with this?

Thanks,
Mike

From jiangn at msu.edu Mon May 8 09:50:45 2017
From: jiangn at msu.edu (Jiang, Ning)
Date: Mon, 8 May 2017 15:50:45 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To: <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com>, <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
Message-ID:

Hi Salim,

I am sorry to learn about the issues. How many intact LTR elements you get depends on the quality of your genome assembly; however, 16 seems too low to me.

The inner and LTR sequence files should NOT be empty. Sometimes the issue is that the initial sequence names are long and complicated. If that's the case for your sequences, you might want to simplify your sequence names (using only letters and numbers) and try again.

We are working on an automatic pipeline for LTR collection; if everything goes smoothly, it should be available in two to three months.
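
The renaming suggested above (sequence names made of letters and numbers only) could be sketched roughly as follows. This is a minimal illustration and not part of the CRL protocol: the function name `simplify_fasta_headers` and the `seq` prefix are hypothetical, and the sketch assumes a plain FASTA file whose header lines start with `>`.

```python
def simplify_fasta_headers(lines, prefix="seq"):
    """Rename FASTA records to short IDs made of letters and digits only.

    Returns (renamed_lines, mapping), where mapping goes from each new ID
    back to the original header text so the names can be restored later.
    The prefix and numbering scheme are illustrative assumptions.
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}{count:06d}"
            mapping[new_id] = line[1:]   # remember the original header
            renamed.append(">" + new_id)
        else:
            renamed.append(line)         # sequence lines pass through unchanged
    return renamed, mapping


def rename_fasta_file(in_path, out_path):
    """Write a renamed copy of in_path to out_path and return the ID mapping."""
    with open(in_path) as handle:
        renamed, mapping = simplify_fasta_headers(handle)
    with open(out_path, "w") as out:
        out.write("\n".join(renamed) + "\n")
    return mapping
```

Keeping the returned mapping makes it possible to translate the simplified IDs back to the original sequence names once the repeat library has been built.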

Best wishes,

Ning

From mjfi2sb3 at gmail.com Mon May 8 10:41:51 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Mon, 08 May 2017 16:41:51 +0000
Subject: [maker-devel] advanced repeat masking library constructions & rna-seq assembly choices
In-Reply-To:
References: <18086AF2-01C3-4671-B974-C5FF36460618@gmail.com> <076B034E-8107-49CE-90C7-277AA4AB4ED3@cshl.edu>
Message-ID:

Thank you all for your responses.

Regards,
/SB
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From qwzhang0601 at gmail.com Wed May 10 09:48:01 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 10 May 2017 11:48:01 -0400
Subject: [maker-devel] want coding sequences
Message-ID:

Hello:

Thanks for development and maintenance of the tool "Maker2". We have used Maker2 to do genome annotation of a new rodent species. Now we are doing downstream analysis, which requires inputs of coding sequences from different species.

I found the outputs I got from Maker2 only include protein sequences and transcripts. Is there an easy way that I can get the coding sequences for our annotated genome?

Many thanks

Best
Quanwei

From munholl at uwindsor.ca Wed May 10 14:24:49 2017
From: munholl at uwindsor.ca (Seth Munholland)
Date: Wed, 10 May 2017 16:24:49 -0400
Subject: [maker-devel] MAKER only running 1 task
Message-ID:

Hello,

I'm running a MAKER annotation on an ubuntu cluster and my top screen shows the following:

Tasks: 831 total, 3 running, 826 sleeping, 0 stopped, 1 zombie

and my maker run (on a screen) shows:

...
total clusters:4 now processing 0
...processing 0 of 3
...processing 1 of 3
...processing 2 of 3
total clusters:4 now processing 0
flattening protein clusters
prepare section files
merging blast reports...
flattening protein clusters
prepare section files
merging blast reports...
flattening protein clusters
prepare section files

Of the 826, the vast majority of them are maker and only one of the running tasks is maker. Is this normal behaviour or has my maker run stopped processing?

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu May 11 09:58:59 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 09:58:59 -0600
Subject: [maker-devel] want coding sequences
In-Reply-To:
References:
Message-ID: <82DF4E49-8E78-45F6-8A78-01A45F908987@gmail.com>

Use the fasta_tool utility with --trim_maker_utr to get just the CDS part of each transcript.

--Carson

From carsonhh at gmail.com Thu May 11 10:03:04 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 10:03:04 -0600
Subject: [maker-devel] MAKER only running 1 task
In-Reply-To:
References:
Message-ID: <2119AF6E-3571-4D28-9D81-B24A513839E5@gmail.com>

It may be frozen, or if it is on the last contig it can be running a non-parallelizable step (the very last cluster merging step for each contig is not parallelizable). So on a large contig the very last step can take a little while, and if there are no other contigs, then there is no work to give to other processes to keep them busy in the meantime. So every process has to wait so they can all exit together once the last step is done. But as I said, this will only happen if you are on the last contig and it is large. Otherwise it is probably frozen somehow (look for any errors further up the log).
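
One way to act on the suggestion to look further up the log is sketched below. It is a minimal illustration that assumes the run's screen output was captured to a text file; the function name `find_suspect_lines` is hypothetical, and the keywords are guesses at what a problem line might contain, not a list of messages MAKER is documented to print.

```python
import re

# Assumed keywords only; adjust to whatever the captured log actually contains.
SUSPECT = re.compile(r"error|fail|died|killed", re.IGNORECASE)


def find_suspect_lines(lines, pattern=SUSPECT):
    """Return (line_number, text) pairs for log lines matching the pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path):
    """Scan a captured log file and return its suspect lines."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

An empty result does not prove the run is healthy; it only means none of the assumed keywords appeared in the captured output.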

--Carson

From mnaymik at tgen.org Thu May 11 12:29:46 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Thu, 11 May 2017 11:29:46 -0700
Subject: [maker-devel] Maker gene vs snap match in final GFF's
Message-ID:

In the final GFF annotations what is the difference between a 'gene' from maker and a 'match' from snap?

--
*This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.*

From carsonhh at gmail.com Thu May 11 12:33:55 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 May 2017 12:33:55 -0600
Subject: [maker-devel] Maker gene vs snap match in final GFF's
In-Reply-To:
References:
Message-ID: <8A77F384-9BAD-4BF4-BD1E-EDAE4E010612@gmail.com>

MAKER results can be the result of additional hints sent to SNAP, together with post-processing to add UTR and additional exons that have support from transcript evidence. MAKER results will also have support from either protein or EST/mRNA evidence.

A SNAP match is simply the raw ab initio call made by SNAP (no hints, no post-processing, and it may or may not have evidence supporting the structure). These matches are there just for reference purposes, so you know what SNAP will produce outside of MAKER given the underlying HMM.

--Carson

From d.ence at ufl.edu Thu May 11 12:35:00 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Thu, 11 May 2017 18:35:00 +0000
Subject: [maker-devel] Maker gene vs snap match in final GFF's
In-Reply-To:
References:
Message-ID: <2560F44E-3D8D-4E81-B7B9-621921408B61@mail.ufl.edu>

The two might have identical coordinates in some cases, but they are different kinds of features. The "match" is a product of an ab initio gene prediction algorithm, while the "gene" is supported by evidence and has passed through the MAKER polishing and filtering steps.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From nathan.ricks at gmail.com Fri May 12 15:05:45 2017 From: nathan.ricks at gmail.com (Nathan Ricks) Date: Fri, 12 May 2017 15:05:45 -0600 Subject: [maker-devel] Using mpich2 with Maker Message-ID: I've been using maker for some time now. However, I would like to speed up the process by using the mpich2 option. When use the command ./Build install, the following error is produced. Any help would be appreciated. Nathan Ricks [image: Inline image 1] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 170139 bytes Desc: not available URL: From yuejiaxing at gmail.com Mon May 15 05:28:47 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 15 May 2017 13:28:47 +0200 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker Message-ID: Hello, I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and run the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapped snoRNA genes were annotated by maker as the example below shows. I was wondering if this is expected. If not, what might have caused this problem and is there a way to work around. Thanks in advance! chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 chrIX maker snoRNA 4375 4563 . 
+ . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53 chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 Best, Jia-Xing -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Facult? de M?decine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Universit? 
Côte d'Azur (UCA) 28 Avenue de Valombrose 06107 NICE Cedex 2 France Personal website: http://www.iamphioxus.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon May 15 10:28:58 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 15 May 2017 12:28:58 -0400 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker In-Reply-To: References: Message-ID: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> Hi Jia-Xing, That has been my experience in the past as well. For the non-coding RNAs, tRNAscan-SE is very accurate, while snoscan seems to be quite sensitive but not very specific. Did you give it a "snoscan_meth" file? Giving it a snoscan_meth file will help with accuracy. The biggest gains in accuracy are from small RNA-seq data. In the paper where we used snoscan on maize, we didn't keep any snoRNA predictions that didn't have support from small RNA-seq data; in practical terms, we got rid of anything with an AED of 1. I hope this helps, Mike > On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote: > > Hello, > > I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and run the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes but I am not sure about snoRNAs. I found multiple overlapped snoRNA genes were annotated by maker as the example below shows. I was wondering if this is expected. If not, what might have caused this problem and is there a way to work around. Thanks in advance! > > chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49 > chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 > chrIX maker exon 4328 4416 . + . 
ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 > chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50 > chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0 > chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 > chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51 > chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 > chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 > chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52 > chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0 > chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 > chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53 > chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0 > chrIX maker exon 4375 4500 . + . 
ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 > > Best, > Jia-Xing > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Faculté de Médecine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuejiaxing at gmail.com Mon May 15 11:14:26 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Mon, 15 May 2017 19:14:26 +0200 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker In-Reply-To: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> Message-ID: Hi Michael, Many thanks for the information! I will specify the "snoscan_meth" file and give it another try then. I mainly want to use maker to annotate protein-coding genes and tRNAs. But it would be nice to have snoRNAs reasonably annotated as well. Thanks again and have a great day! Best, Jia-Xing On Mon, May 15, 2017 at 6:28 PM, Michael Campbell < michael.s.campbell1 at gmail.com> wrote: > Hi Jia-Xing, > > That has been my experience in the past as well. For the non-coding RNAs, > tRNAscan-SE is very accurate, while snoscan seems to be quite sensitive but > not very specific. Did you give it a "snoscan_meth" file? Giving it > a snoscan_meth file will help with accuracy. The biggest gains in accuracy > are from small RNA-seq data. 
In the paper where we used snoscan on maize we > didn?t keep any snoRNA predictions that didn?t have support from small > RNA-seq data, in practical terms we got rid of anything with a AED of 1. > > I hope this helps, > Mike > > On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote: > > Hello, > > I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and run > the annotation for a yeast (S. cerevisiae) genome. I think the annotation > went well with regard to tRNAs and protein-coding genes but I am not sure > about snoRNAs. I found multiple overlapped snoRNA genes were annotated by > maker as the example below shows. I was wondering if this is expected. If > not, what might have caused this problem and is there a way to work around. > Thanks in advance! > > chrIX maker gene 4328 4416 . + . > ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX- > noncoding-gene-0.49 > chrIX maker snoRNA 4328 4416 . + . > ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent= > snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX- > noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0 > chrIX maker exon 4328 4416 . + . > ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260; > Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1 > chrIX maker gene 4375 4563 . + . > ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX- > noncoding-gene-0.50 > chrIX maker snoRNA 4375 4563 . + . > ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent= > snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX- > noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1| > 0|0|-1|0|1|190|0 > chrIX maker exon 4375 4563 . + . > ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261; > Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1 > chrIX maker gene 4375 4461 . + . > ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX- > noncoding-gene-0.51 > chrIX maker snoRNA 4375 4461 . + . 
> ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent= > snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX- > noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0 > chrIX maker exon 4375 4461 . + . > ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262; > Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1 > chrIX maker gene 4375 4491 . + . > ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX- > noncoding-gene-0.52 > chrIX maker snoRNA 4375 4491 . + . > ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent= > snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX- > noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1| > 0|0|-1|0|1|118|0 > chrIX maker exon 4375 4491 . + . > ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263; > Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1 > chrIX maker gene 4375 4500 . + . > ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX- > noncoding-gene-0.53 > chrIX maker snoRNA 4375 4500 . + . > ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent= > snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX- > noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1| > 0|0|-1|0|1|127|0 > chrIX maker exon 4375 4500 . + . > ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264; > Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1 > > Best, > Jia-Xing > > > -- > Jia-Xing Yue > > Population Genomics and Complex Traits Group > Tour Pasteur 8eme etage > Facult? de M?decine > Institute for Research on Cancer and Aging, Nice (IRCAN) > CNRS UMR 7284 - INSERM U 1081 - Universit? C?te d?Azur (UCA) > 28 Avenue de Valombrose > 06107 NICE Cedex 2 > France > > Personal website: http://www.iamphioxus.org/ > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -- Jia-Xing Yue Population Genomics and Complex Traits Group Tour Pasteur 8eme etage Facult? 
de Médecine Institute for Research on Cancer and Aging, Nice (IRCAN) CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA) 28 Avenue de Valombrose 06107 NICE Cedex 2 France Personal website: http://www.iamphioxus.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 16 08:51:00 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 16 May 2017 08:51:00 -0600 Subject: [maker-devel] Using mpich2 with Maker In-Reply-To: References: Message-ID: <6C739B3B-D5D6-4263-A73C-4EA1762B1EE1@gmail.com> You probably need to reinstall Parse::RecDescent, Inline, Inline::C, or all of the above via CPAN (perl's module installer). The ones already installed on your system may have issues. If you do not have the ability to install modules system-wide, you can install them just for your user using local::lib and the bootstrapping instructions here --> http://search.cpan.org/~haarg/local-lib-2.000019/lib/local/lib.pm#The_bootstrapping_technique Then reinstall MAKER. --Carson > On May 12, 2017, at 3:05 PM, Nathan Ricks wrote: > > I've been using maker for some time now. However, I would like to speed up the process by using the mpich2 option. When I use the command ./Build install, the following error is produced. Any help would be appreciated. > > Nathan Ricks > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From salim.bougouffa at kaust.edu.sa Sun May 21 01:45:50 2017 From: salim.bougouffa at kaust.edu.sa (Salim Bougouffa) Date: Sun, 21 May 2017 10:45:50 +0300 Subject: [maker-devel] augustus exon calling ~ Message-ID: Hi Maker folks, I have several issues with a plant genome annotation that I am currently doing but perhaps the most recurrent issues are: 1/ CDSs that are missed where significant rna-seq evidence is there (figure artemis01) 2/ vice versa where one or two exons are added without rna-seq evidence/intron hints (figure artemis02) info about the runs: 1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating 2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?) 3/ evm=1 (seems to perform better than evm=0) 4/ repeatmasking (denovo + repbase) Best, /SB _______________________________________________________ Salim Bougouffa (PhD), Postdoctoral Fellow 4700 KAUST, CBRC, Blg3. Office4326-WS05, Thuwal, Jeddah, KSA, 23955-6900 (966) 012 808 2963 || salim.bougouff at kaust.edu.sa -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: artemis01.png Type: image/png Size: 166352 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: artemis02.png Type: image/png Size: 97140 bytes Desc: not available URL: From mjfi2sb3 at gmail.com Sun May 21 01:48:48 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Sun, 21 May 2017 07:48:48 +0000 Subject: [maker-devel] augustus exon calling ~ Message-ID: Hi Maker folks, I have several issues with a plant genome annotation that I am currently doing but perhaps the most recurrent issues are: 1/ CDSs that are missed where significant rna-seq evidence is there (figure artemis01) 2/ vice versa where one or two exons are added without rna-seq evidence/intron hints (figure artemis02) info about the runs: 1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating 2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?) 3/ evm=1 (seems to perform better than evm=0) 4/ repeatmasking (denovo + repbase) Best, /SB -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: artemis01.png Type: image/png Size: 166352 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: artemis02.png Type: image/png Size: 97140 bytes Desc: not available URL: From mjfi2sb3 at gmail.com Sun May 21 01:55:31 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Sun, 21 May 2017 07:55:31 +0000 Subject: [maker-devel] augustus exon calling ~ In-Reply-To: References: Message-ID: Hi, I should have mentioned a third scenario where an exon is not called fully by maker despite augustus getting it right (figure artemis03) [image: artemis03.png] On Sun, 21 May 2017 at 10:48 Salim Bougouffa wrote: > Hi Maker folks, > > I have several issues with a plant genome annotation that I am currently > doing but perhaps the most recurrent issues are: > > 1/ CDSs that are missed where significant rna-seq evidence is there > (figure artemis01) > 2/ vice versa where one or two exons are added without rna-seq > evidence/intron hints (figure artemis02) > > info about the runs: > 1/ using augustus with a pre-existing model for a related plant that has > high homology to the one I am annotating > 2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?) > 3/ evm=1 (seems to perform better than evm=0) > 4/ repeatmasking (denovo + repbase) > > Best, > /SB > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: artemis03.png Type: image/png Size: 145860 bytes Desc: not available URL: From admin at genome.arizona.edu Tue May 23 13:52:17 2017 From: admin at genome.arizona.edu (System Admin) Date: Tue, 23 May 2017 12:52:17 -0700 Subject: [maker-devel] Hyperthreading Message-ID: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node. With hyperthreading on, we have up to 256 total emulated cores available. Which is the optimal scenario? 1. Use '-n 256' 2. 
Use '-n 128' with hyperthreading still on 3. Use '-n 128' with hyperthreading turned off Thanks From carsonhh at gmail.com Tue May 23 14:19:29 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 14:19:29 -0600 Subject: [maker-devel] Hyperthreading In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: MAKER is more of a pipeline. It will launch external tools on as many CPUs as you give it with the mpiexec command. I've found that many of the tools used get a boost with hyperthreading even though optimizations are not explicitly built into their code. The short answer is you would have to try it both ways. I doubt there will be much more than a 10-15% difference in runtime. You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. --Carson > On May 23, 2017, at 1:52 PM, System Admin wrote: > > We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node. > With hyperthreading on, we have up to 256 total emulated cores available. > > Which is the optimal scenario? > 1. Use '-n 256' > 2. Use '-n 128' with hyperthreading still on > 3. 
Use '-n 128' with hyperthreading turned off > > Thanks > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From admin at genome.arizona.edu Tue May 23 14:31:48 2017 From: admin at genome.arizona.edu (admin at genome.arizona.edu) Date: Tue, 23 May 2017 13:31:48 -0700 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: Carson Holt wrote on 05/23/2017 01:19 PM: > You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). From carsonhh at gmail.com Tue May 23 14:38:42 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 14:38:42 -0600 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com> Also make sure cpus in the control file is set to 1 when using MPI. Otherwise it will tell each program it calls to try and use more CPUs per call. --Carson > On May 23, 2017, at 2:31 PM, admin at genome.arizona.edu wrote: > > Carson Holt wrote on 05/23/2017 01:19 PM: >> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). 
Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. > > Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mmokrejs at gmail.com Tue May 23 14:45:21 2017 From: mmokrejs at gmail.com (Martin MOKREJŠ) Date: Tue, 23 May 2017 22:45:21 +0200 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <30ae31f7-264e-44d0-30bc-a010de3e54a7@gmail.com> admin at genome.arizona.edu wrote: > Carson Holt wrote on 05/23/2017 01:19 PM: >> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead. Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. > > Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). Hi, the high load could be caused by disk IO or other reasons. The only proof is to run top, htop or similar and check that the processes are in the *running* state ("R" is displayed in the status column). There could be "S" (sleep) when a task is waiting for data input or output, and "D" (disk) could also be shown when waiting for disk IO (unlike network IO).
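The state check described above can also be scripted. The following is only a rough Python sketch, not part of MAKER: it assumes a Linux procps-style `ps` that accepts `-eo stat=`, and the function names `tally_states` and `read_ps_states` are made up for this illustration.

```python
import subprocess
from collections import Counter


def tally_states(stat_fields):
    """Count processes by the first letter of their ps STAT field.

    "R" = running/runnable, "S" = sleeping, "D" = uninterruptible
    sleep (usually waiting on disk IO).
    """
    counts = Counter()
    for field in stat_fields:
        field = field.strip()
        if field:
            # Only the first character is the state; the rest are modifiers.
            counts[field[0]] += 1
    return counts


def read_ps_states():
    """Return one STAT string per process (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "stat="],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Running `tally_states(read_ps_states())` on a compute node while maker is active gives a quick picture: many "D" entries next to a high load average would point at disk IO rather than CPU work.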
Martin From mmokrejs at gmail.com Tue May 23 14:51:18 2017 From: mmokrejs at gmail.com (Martin MOKREJŠ) Date: Tue, 23 May 2017 22:51:18 +0200 Subject: [maker-devel] Hyperthreading In-Reply-To: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <54fbcfd6-ba28-e570-cfab-a6d83620f747@gmail.com> System Admin wrote: > We are using maker in a cluster with mpich. Currently hyperthreading is on and we use 'mpiexec -n ' to start maker. Our machinelist file for mpich specifies the total emulated cores for each node. > With hyperthreading on, we have up to 256 total emulated cores available. > > Which is the optimal scenario? > 1. Use '-n 256' > 2. Use '-n 128' with hyperthreading still on > 3. Use '-n 128' with hyperthreading turned off Go for 3, but make sure to disable *hyperthreading* in the kernel of the machines as well. I also disable the multicore scheduler (which should again help if there are more long-running processes than physical cores available and if some of them would benefit from sharing a cache). We do not have such jobs; hmmer and blast mostly access data from memory, so the CPU cache is not very relevant for these. Hyperthreading only helps if jobs are lousy, waiting for some input/output etc., and in that case *it helps* if another process can be executed on the CPU core (hopefully one not having the same bottleneck). This generally helps only in bad situations. You are after a good setup, so disable hyperthreading in the kernel, launch only as many jobs as there are physical CPU cores, and monitor performance. If jobs are starving, resolve the issue.
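To match the job count to physical cores as suggested above, the physical core count of a node can be derived from /proc/cpuinfo. This is a hedged Python sketch, not something MAKER or mpich provides: it assumes the x86 Linux layout in which each processor block carries "physical id" and "core id" fields (other architectures may omit them), and `count_physical_cores` is a hypothetical helper name.

```python
def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo text.

    Hyperthreaded siblings share the same pair, so they are counted once.
    Returns 0 if the fields are absent (e.g. on some non-x86 systems).
    """
    cores = set()
    physical_id = None
    core_id = None
    # A trailing empty line makes sure the last block is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = None
            core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)
```

On a node this could be called as `count_physical_cores(open("/proc/cpuinfo").read())`. The result is per node, so the per-node values would have to be summed over the cluster before choosing the process count for mpiexec.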
Martin From admin at genome.arizona.edu Tue May 23 14:57:56 2017 From: admin at genome.arizona.edu (admin at genome.arizona.edu) Date: Tue, 23 May 2017 13:57:56 -0700 Subject: [maker-devel] Hyperthreading In-Reply-To: <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com> References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> <47000CA1-0F27-40B2-8BB7-3289C2010853@gmail.com> Message-ID: Carson Holt wrote on 05/23/2017 01:38 PM: > Also make sure cpus in the control file is set to 1 when using MPI. > Otherwise it will tell each program it calls to try and use more > CPUs per call. Yes, we are using cpus=1 in the control file. Thanks From carsonhh at gmail.com Tue May 23 15:03:17 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 15:03:17 -0600 Subject: [maker-devel] Hyperthreading In-Reply-To: References: <66d4d95b-fc19-233c-af19-787c2af73e25@genome.arizona.edu> Message-ID: <054B76C7-FAC0-4B9B-A6D1-29A8202E35B3@gmail.com> One last thing to check if using CentOS or RedHat. I've seen it happen on a handful of clusters where transparent hugepages can create odd load issues and very high sys CPU usage under top (not just with maker but with BWA, GATK, and other programs that can have larger memory footprints). If using CentOS or RedHat, you may want to disable defrag for hugepages. You do this on CentOS 6 to disable it (the process is similar on CentOS 7 and RedHat but you may have to google it) -->
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
--Carson > On May 23, 2017, at 2:31 PM, admin at genome.arizona.edu wrote: > > Carson Holt wrote on 05/23/2017 01:19 PM: >> You can pull back to 128 if you find that you are running low on RAM or have a high IO burden (both of which will double if you go from 128 to 256 even though CPU isn't really doubling). Also, MAKER per-job performance plateaus at around 200 processes due to communication overhead.
Above that threshold it is often useful to divide datasets into multiple separate jobs that can run simultaneously. > > Yes, with '-n 192' we found the load on the cluster will initially go up to 360-380 but then continually decreases until maker is finished. Memory usage was very low during the processing (under 20%). > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue May 23 15:34:15 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 15:34:15 -0600 Subject: [maker-devel] augustus exon calling ~ In-Reply-To: References: Message-ID: EVM works extremely well when evidence closely matches the predictions and there are no assembly anomalies affecting the ORF. Otherwise, EVM performs very poorly. Also, I would not set unmask=1. It adds noise to the calls. Note that in all cases given, gene models are from Augustus (MAKER doesn't make predictions). MAKER just provides hints that Augustus can use for the second call set. Hints boost the score a model gets whenever a feature matches the hint. What you see as Augustus match/match_part features are just references to what Augustus calls without hints. So if I tell Augustus there is probably an exon/intron at location X, then any model that includes that exon/intron will bump up its score, thus causing Augustus to keep models that match the hints and report those over models that don't match. However, if there is an issue with the evidence (e.g. a falsely merged mRNA-seq assembly), or an issue with the assembly (a base change generates an early stop codon or causes a frameshift), then Augustus may choose to truncate or skip an exon in order to capture the bonus from downstream hints. So it is unlikely that there is a workable model that captures the exact intron/exon structure, because it breaks the ORF at some point.
So Augustus instead produces the best model it can to capture as many hint bonuses as it can. That being said, look for any odd hint sources like very poor protein or transcript evidence alignments. Eliminating bad hints will improve performance (for example, if using mRNA-seq assemblies, Trinity has a jaccard_clip option which helps avoid false merging of transcript evidence). Or if an organism you used for protein evidence constantly produces bad protein alignments, then you may want to drop it completely from evidence. Finally, training Augustus on the genome being annotated will help improve performance (note that just because a species is closely related in evolutionary space does not mean that its HMMs will perform well; it's a common fallacy about ab initio prediction discussed in the SNAP paper). Also try adding another gene predictor like SNAP to see if it hurts or helps. --Carson > On May 21, 2017, at 1:48 AM, Salim Bougouffa wrote: > > Hi Maker folks, > > I have several issues with a plant genome annotation that I am currently doing but perhaps the most recurrent issues are: > > 1/ CDSs that are missed where significant rna-seq evidence is there (figure artemis01) > 2/ vice versa where one or two exons are added without rna-seq evidence/intron hints (figure artemis02) > > info about the runs: > 1/ using augustus with a pre-existing model for a related plant that has high homology to the one I am annotating > 2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?) > 3/ evm=1 (seems to perform better than evm=0) > 4/ repeatmasking (denovo + repbase) > > Best, > /SB > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nathan.ricks at gmail.com Tue May 23 15:23:49 2017 From: nathan.ricks at gmail.com (Nathan Ricks) Date: Tue, 23 May 2017 15:23:49 -0600 Subject: [maker-devel] maker_functional_gff Message-ID: I've been working with maker and trying to use the maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up, and just continues for a long time. Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363. Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363. Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367. Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367. ^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368. Nathan Ricks -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.ence at ufl.edu Tue May 23 15:39:32 2017 From: d.ence at ufl.edu (Ence,daniel) Date: Tue, 23 May 2017 21:39:32 +0000 Subject: [maker-devel] maker_functional_gff In-Reply-To: References: Message-ID: <9388406A-302C-4B19-9F35-D56C06CC9582@mail.ufl.edu> Hi Nathan, can you send the command line that you're using and that is giving the error? Thanks, Daniel Ence > On May 23, 2017, at 5:23 PM, Nathan Ricks wrote: > > I've been working with maker and trying to use the maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up, and just continues for a long time. > Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363. > Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363. > Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367. 
> Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367. > ^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368. > > > Nathan Ricks > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue May 23 15:44:05 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 23 May 2017 15:44:05 -0600 Subject: [maker-devel] maker_functional_gff In-Reply-To: References: Message-ID: <25B003D7-6A19-4486-B21F-71070F00A580@gmail.com> The blast report you gave it is in the wrong format, it is partial/truncated, or you provided the files in the wrong order. Basically, it received an empty line from the file at some point. The blast report must be in tabular format, which is "wu-blast -mformat 2" or "ncbi-blast -outfmt 6". Also, the script only supports blast results against UniProt/Swiss-prot. --Carson > On May 23, 2017, at 3:23 PM, Nathan Ricks wrote: > > I've been working with maker and trying to use the maker_functional_gff to create an annotated .gff file. However, whenever I run the command, the following pops up, and just continues for a long time. > Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58363. > Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58363. > Use of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58367. > Use of uninitialized value $qid in hash element at ./maker_functional_gff line 170, <$IN> line 58367. > ^CUse of uninitialized value $qid in hash element at ./maker_functional_gff line 165, <$IN> line 58368. 
> > > Nathan Ricks > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From yuejiaxing at gmail.com Fri May 26 03:28:48 2017 From: yuejiaxing at gmail.com (Jia-Xing Yue) Date: Fri, 26 May 2017 11:28:48 +0200 Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker In-Reply-To: References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> Message-ID: Hi Michael, This is a follow-up for the snoscan issue. I found the snoscan_meth option seems have been removed in the current maker_opts.ctl template file (v2.31.9). This option used to be there according to this post ( https://www.biostars.org/p/217240/). I manually specified this option in my maker_opts.ctl file but I don't think maker has correctly recognized this option: STATUS: Parsing control files... WARNING: Invalid option 'snoscan_meth' in control file maker_opts.ctl ... Do you know is there a way to work around this problem? Thanks! Best, Jia-Xing On Mon, May 15, 2017 at 7:14 PM, Jia-Xing Yue wrote: > Hi Michael, > > Many thanks for the information! I will specify the "snoscan_meth" file > and give it another try then. I majorly want to use maker to annotate > protein-coding genes and tRNAs. But it would be nice to have snoRNA > reasonably annotated as well. > Thanks gain and have a great day! > > Best, > Jia-Xing > > > On Mon, May 15, 2017 at 6:28 PM, Michael Campbell < > michael.s.campbell1 at gmail.com> wrote: > >> Hi Jia-Xing, >> >> That has been my experience in the past as well. For the non-coding RNAs >> tRNA-scan is very accurate while snoscan seems to be quite sensitive but >> very specific. Did you give it a ?snoscan_meth? file? Giving it >> a snoscan_meth file will help with accuracy. The biggest gains in accuracy >> are from small RNA-seq data. 
>> In the paper where we used snoscan on maize we didn't keep any snoRNA predictions that didn't have support from small RNA-seq data; in practical terms we got rid of anything with an AED of 1.
>>
>> I hope this helps,
>> Mike
>>
>> On May 15, 2017, at 7:28 AM, Jia-Xing Yue wrote:
>>
>> Hello,
>>
>> I configured snoscan (v.0.9.1) for my maker installation (v2.31.9) and ran the annotation for a yeast (S. cerevisiae) genome. I think the annotation went well with regard to tRNAs and protein-coding genes, but I am not sure about snoRNAs. I found that multiple overlapping snoRNA genes were annotated by maker, as the example below shows. I was wondering whether this is expected. If not, what might have caused this problem, and is there a way to work around it? Thanks in advance!
>>
>> chrIX maker gene 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49
>> chrIX maker snoRNA 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.49;Name=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|90|0
>> chrIX maker exon 4328 4416 . + . ID=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1:exon:12260;Parent=snoscan-chrIX-noncoding-gene-0.49-snoRNA-1
>> chrIX maker gene 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50
>> chrIX maker snoRNA 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.50;Name=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|190|0
>> chrIX maker exon 4375 4563 . + . ID=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1:exon:12261;Parent=snoscan-chrIX-noncoding-gene-0.50-snoRNA-1
>> chrIX maker gene 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51
>> chrIX maker snoRNA 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.51;Name=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|88|0
>> chrIX maker exon 4375 4461 . + . ID=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1:exon:12262;Parent=snoscan-chrIX-noncoding-gene-0.51-snoRNA-1
>> chrIX maker gene 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52
>> chrIX maker snoRNA 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.52;Name=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|118|0
>> chrIX maker exon 4375 4491 . + . ID=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1:exon:12263;Parent=snoscan-chrIX-noncoding-gene-0.52-snoRNA-1
>> chrIX maker gene 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53
>> chrIX maker snoRNA 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;Parent=snoscan-chrIX-noncoding-gene-0.53;Name=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|127|0
>> chrIX maker exon 4375 4500 . + . ID=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1:exon:12264;Parent=snoscan-chrIX-noncoding-gene-0.53-snoRNA-1
>>
>> Best,
>> Jia-Xing

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Fri May 26 07:54:44 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 26 May 2017 09:54:44 -0400
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To:
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com>
Message-ID: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>

Hi Jia-Xing,

v2.31.9 may not have had that option. I know that it is in v3.00.0, so your best option may be to update.

Thanks,
Mike
-------------- next part --------------
An HTML attachment was scrubbed...
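One way to see in advance which options a given MAKER install will reject is to compare your control file against a template generated by that same install (for example with `maker -CTL`). The following is only a rough sketch, assuming control files use the usual `key=value #comment` layout; the function names, sample option lines, and file names are hypothetical and are not part of MAKER.

```python
def parse_ctl(text):
    """Parse control-file text (key=value #comment) into a dict of options."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def unknown_options(template_text, user_text):
    """Return option names used in user_text that the template does not define."""
    known = parse_ctl(template_text)
    used = parse_ctl(user_text)
    return sorted(key for key in used if key not in known)


# Hypothetical excerpts: a template from the installed version and an edited copy.
template = """\
#-----Genome (illustrative excerpt, not a real template)
genome= #genome sequence
trna=0 #find tRNAs, 1 = yes, 0 = no
"""
edited = """\
genome=yeast.fa
trna=1
snoscan_meth=meth_sites.txt
"""

unknown = unknown_options(template, edited)
print(unknown)
```

Any name this reports is one the installed version's template does not define, which is consistent with the "Invalid option" warning quoted above and with the advice to update.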
URL:

From yuejiaxing at gmail.com Fri May 26 08:20:03 2017
From: yuejiaxing at gmail.com (Jia-Xing Yue)
Date: Fri, 26 May 2017 16:20:03 +0200
Subject: [maker-devel] multiple overlapped snoRNA genes got annotated by maker
In-Reply-To: <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>
References: <0AC3F89F-28EB-4E2A-ADDE-2DF8BD625416@gmail.com> <6461FDD0-BE78-403A-9FEF-E71C3D24F2CA@gmail.com>
Message-ID:

I see. Thanks Michael!

Best,
Jia-Xing

--
Jia-Xing Yue

Population Genomics and Complex Traits Group
Tour Pasteur 8eme etage
Faculté de Médecine
Institute for Research on Cancer and Aging, Nice (IRCAN)
CNRS UMR 7284 - INSERM U 1081 - Université Côte d'Azur (UCA)
28 Avenue de Valombrose
06107 NICE Cedex 2
France

Personal website: http://www.iamphioxus.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
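The filtering Mike describes in this thread (discarding snoRNA predictions with an AED of 1) can be sketched roughly as below. This is not part of MAKER. It assumes tab-separated GFF3 carrying `_AED`, `ID` and `Parent` attributes like the output quoted in the thread, and a single `Parent` per feature. The function names, the shortened IDs, and the second gene with `_AED=0.20` are hypothetical.

```python
def parse_attrs(column9):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(f.split("=", 1) for f in column9.split(";") if "=" in f)


def drop_unsupported_snornas(gff_lines):
    """Remove snoRNA models with _AED of 1, plus their gene and exon lines."""
    lines = [line.rstrip("\n") for line in gff_lines]

    # Pass 1: record the IDs of unsupported snoRNAs and of their parent genes.
    # Note: a gene is dropped as soon as one of its snoRNAs is flagged.
    flagged = set()
    for line in lines:
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "snoRNA":
            continue
        attrs = parse_attrs(cols[8])
        if float(attrs.get("_AED", "0")) >= 1.0:
            for key in ("ID", "Parent"):
                if key in attrs:
                    flagged.add(attrs[key])

    # Pass 2: keep every line that is neither flagged nor a child of a flagged ID.
    kept = []
    for line in lines:
        cols = line.split("\t")
        if len(cols) == 9:
            attrs = parse_attrs(cols[8])
            if attrs.get("ID") in flagged or attrs.get("Parent") in flagged:
                continue
        kept.append(line)  # comments and directives pass through unchanged
    return kept


# Hypothetical, shortened records: g49 has no evidence support, g60 does.
example = [
    "chrIX\tmaker\tgene\t4328\t4416\t.\t+\t.\tID=g49;Name=g49",
    "chrIX\tmaker\tsnoRNA\t4328\t4416\t.\t+\t.\tID=g49-snoRNA-1;Parent=g49;_AED=1.00",
    "chrIX\tmaker\texon\t4328\t4416\t.\t+\t.\tID=g49-snoRNA-1:exon:1;Parent=g49-snoRNA-1",
    "chrIX\tmaker\tgene\t9000\t9100\t.\t+\t.\tID=g60;Name=g60",
    "chrIX\tmaker\tsnoRNA\t9000\t9100\t.\t+\t.\tID=g60-snoRNA-1;Parent=g60;_AED=0.20",
    "chrIX\tmaker\texon\t9000\t9100\t.\t+\t.\tID=g60-snoRNA-1:exon:2;Parent=g60-snoRNA-1",
]

kept = drop_unsupported_snornas(example)
for line in kept:
    print(line)
```

Without small RNA-seq evidence every snoscan model ends up with an AED of 1, as in the output quoted in the thread, so this filter would remove all of them; it is only informative once such evidence has been supplied to the run.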