From elyssa_garza at yahoo.com Fri Dec 11 11:43:32 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Fri, 11 Dec 2015 11:43:32 -0600
Subject: [maker-devel] First time using maker- Train or not to train?
Message-ID: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>

Hello,

I have recently begun running Maker. I am currently trying to annotate my Caulanthus genome (~372Mb), a relative of Arabidopsis. I am unsure about the parameters I have chosen for my first run in Maker, which include:

genome=CAB_assembly.fasta (1044 contigs)
est=Representative_transcript_loci.fasta (assembled transcripts between 200 and 20000 bp long)
protein=TAIR10pep.fasta (Arabidopsis proteins)

Repeat masking
model_org=arabidopsis
rmlib=list of Brassicaceae and common plant repeats
repeat_protein=te_proteins.fasta

Gene Prediction
snaphmm=A.thaliana.hmm
augustus_species=arabidopsis
est2genome=1

I have run a sample file of scaffolds, as well as the entire genome. In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator. I noticed that my AED values are all 1. Is this bad? What should I try next?

I am also unsure how to do the training, and whether it should be done in my case.

Can anyone advise me on these issues?

-Elyssa

From ole.toerresen at gmail.com Mon Dec 14 13:21:11 2015
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Mon, 14 Dec 2015 20:21:11 +0100
Subject: [maker-devel] Error with maker_functional_gff
Message-ID:

Hi,

I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:

Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.

Line 108947 in the input gff is this:

LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;

It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.

Any help to trace this down is really appreciated. Do you need any other information?

Thank you.

Sincerely,

Ole Kristian Tørresen

From dence at genetics.utah.edu Wed Dec 16 12:07:07 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 16 Dec 2015 18:07:07 +0000
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
Message-ID:

Hi Elyssa,

Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to gene models, which is not what you want for a final gene set. That being said, since your gene models are basically the unmodified alignments, I'm surprised that all of them have an AED of 1, since that means that they're not supported by any of the evidence (either EST or protein).

Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here:
http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors

Augustus can also be trained, but the process is much more involved.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote:

Hello, I have recently begun running Maker. I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis.
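Daniel's remark that an AED of 1 means a model has no evidence support can be checked directly on a merged GFF3 file. The following is a minimal sketch, assuming tab-separated GFF3 in which mRNA features carry an `_AED=` attribute (as in the MAKER records quoted elsewhere in this thread). The function names are hypothetical and are not part of MAKER.

```python
import re

# Match the _AED attribute only; the leading ';' (or start of string)
# keeps the pattern from matching the separate _eAED attribute.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines):
    """Yield the _AED value of every mRNA feature in GFF3 text lines."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only well-formed mRNA records are considered
        match = AED_RE.search(cols[8])
        if match:
            yield float(match.group(1))


def aed_summary(gff_lines):
    """Count mRNA models and how many of them have an AED of 1."""
    values = list(aed_values(gff_lines))
    n_aed_1 = sum(1 for value in values if value >= 1.0)
    fraction = n_aed_1 / len(values) if values else 0.0
    return {"n_mrna": len(values), "n_aed_1": n_aed_1, "fraction_aed_1": fraction}
```

To use it, open the merged GFF3 file and pass the file object to `aed_summary`. A fraction close to 1 would match the situation described in the original question.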
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From dence at genetics.utah.edu Wed Dec 16 13:27:00 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 16 Dec 2015 19:27:00 +0000
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References:
Message-ID: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>

Hi Ole, can you send a line for a gene feature that does work?

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Wed Dec 16 13:37:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 12:37:14 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).

It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input; it only works on blast results against Uniprot/Swiss-prot. The script expects a very specific header format in both the report and the protein fasta, and if it doesn't see it, then it is missing certain pieces of needed information.

Thanks,
Carson

> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>
> Hi Ole, can you send a line for a gene feature that does work?
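One way to act on Carson's diagnosis is to collect the BLAST subjects for the failing query and look at their FASTA headers. The sketch below assumes tabular BLAST output with the query in column 1 and the subject ID in column 2. It also assumes, without confirmation from the maker_functional_gff source, that a usable header has the usual UniProt shape `>sp|ACCESSION|ENTRY description OS=organism ... GN=gene`. The regular expression and function names are hypothetical.

```python
import re

# Assumed shape of a "complete" UniProt header: db|accession|entry name,
# a free-text description, an OS= (organism) field and a GN= (gene) field.
# This is an illustration, not the pattern used by maker_functional_gff.
HEADER_RE = re.compile(r"^>(?:sp|tr)\|[^|\s]+\|\S+\s+.+?\s+OS=.+?\s+GN=\S+")


def subjects_for(query_id, blast_lines):
    """Return subject IDs hit by query_id, in order of first appearance."""
    seen = []
    for line in blast_lines:
        cols = line.split()
        if len(cols) >= 2 and cols[0] == query_id and cols[1] not in seen:
            seen.append(cols[1])
    return seen


def suspicious_headers(query_id, blast_lines, fasta_lines):
    """Return (subject_id, header) pairs that do not match HEADER_RE.

    The header is None when the subject is missing from the FASTA file.
    """
    headers = {}
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                headers[parts[0]] = line.rstrip("\n")
    flagged = []
    for subject_id in subjects_for(query_id, blast_lines):
        header = headers.get(subject_id)
        if header is None or not HEADER_RE.match(header):
            flagged.append((subject_id, header))
    return flagged
```

Calling `suspicious_headers("GAMO_00029233-RA", blast_file, fasta_file)` with the two open files would list the hits whose headers deviate from the assumed pattern. Whether any such deviation is the actual cause of the error is something only the script's own parsing code can confirm.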
From ole.toerresen at gmail.com Wed Dec 16 14:53:25 2015
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Wed, 16 Dec 2015 21:53:25 +0100
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Daniel,
this is the previous gene, before maker_functional_gff:

LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47;

After:

LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649816 13651318 . - .
ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;

Carson, I saw that, but I did use Uniprot/Swiss-prot. A snippet of the blast output used as input here:

GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285
GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284
GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270
GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269
GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264
GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261
GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255
GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226
GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218
GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201
GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201
GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103
GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1
GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275
GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107

521 genes had function added before maker_functional_gff choked on this particular gene, GAMO_00029233.

Thank you.

Ole

From carsonhh at gmail.com Wed Dec 16 14:55:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 13:55:14 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Find the hit for GAMO_00029233 and then pull its header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header.

--Carson

From xvazquezc at gmail.com Wed Dec 16 17:41:48 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Thu, 17 Dec 2015 10:41:48 +1100
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
Message-ID:

Hi Daniel,

Have you guys heard about BUSCO? It's kind of a replacement for CEGMA, which was based on a rather limited set of genes (according to its developers, we should stop using it). BUSCO not only produces a more thorough completeness profile but also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, using the --long option is similar to a training and retraining step in the old training method.

I recently used it for training Augustus for my fungal genomes and it works well. Unfortunately, it may not apply in this case, as they don't have the plant profile dataset ready yet. You may request early access to it, though.

I used to use the CEGMA output plus the webAugustus training service, which is a bit more tedious but not that complicated. I copy below what I had in my old protocol; nonetheless, I would recommend that any other user not dealing with plant genomes use BUSCO instead:

Augustus gff files are a bit different from CEGMA ones.
Get the CEGMA output and run the following script:

cegma2zff output.cegma.gff > augustus.gff

Upload the genome file (e.g. contigs.fa from velvet) and the "training gene structure file" (augustus.gff) to
http://bioinf.uni-greifswald.de/webaugustus/training/create

Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species).

Re-training

From Maker's output, follow the same initial instructions as for SNAP training detailed in the Maker tutorial. In the directory that contains the MYGENOME.maker.output/ folder:

mkdir snap
cd snap
gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log
maker2zff -n MYGENOME.all.gff

The option -n is not included in the original tutorial, but without it you may end up with empty genome.ann and genome.dna files.

From this point we generate training files for both SNAP and Augustus:

fathom genome.ann genome.dna -categorize 1000
fathom uni.ann uni.dna -export 1000 -plus
forge export.ann export.dna

For Augustus, we need the script "zff2augustus_gbk.pl". This will take the export.dna generated by fathom and generate a *.gb file that will be used as the "training gene structure file" in a new training submission in WebAugustus. Remember to give it a new name in the submission, e.g. MYGENOME_v2, or Maker won't see the difference (same name):

perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb

Xabier

On 17 December 2015 at 05:07, Daniel Ence wrote:

> Hi Elyssa,
>
> Setting est2genome=1 tells MAKER to promote all of the est2genome
> alignments to a gene model, which is not what you want for a final gene
> set.
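The re-training steps in the protocol above can be collected into a small dry-run helper that prints the commands for review before anything is executed. This is only a sketch: the commands and options are copied from the protocol, while the function names and the `genome_name` parameter are hypothetical. It assumes the commands are meant to be run from inside the snap/ directory.

```python
import shlex


def retraining_commands(genome_name="MYGENOME"):
    """Return the re-training steps as argv lists; nothing is executed."""
    # Path is relative to the snap/ directory created in the protocol.
    log = f"../{genome_name}.maker.output/{genome_name}_master_datastore_index.log"
    return [
        ["gff3_merge", "-d", log],
        ["maker2zff", "-n", f"{genome_name}.all.gff"],
        ["fathom", "genome.ann", "genome.dna", "-categorize", "1000"],
        ["fathom", "uni.ann", "uni.dna", "-export", "1000", "-plus"],
        ["forge", "export.ann", "export.dna"],
    ]


def as_shell(commands):
    """Render argv lists as one shell-quoted command per line."""
    return "\n".join(shlex.join(argv) for argv in commands)
```

Printing `as_shell(retraining_commands("MYGENOME"))` shows the five commands in order. To run them instead, each argv list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence.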
That being said, since your gene models are basically the unmodified > alignments, I?m surprised that all of them have an AED of 1, since that > means that they?re not supported by any of the evidence (either est or > protein). > > Did you get gene models from snap or augustus? You can gather those with > the fasta_merge script. Those should be a good starting place for training > ab initio predictors. Instructions for training snap can be found here: > http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors > > Augustus can also be trained but is much more involved. > > ~Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote: > > Hello, > > I have recently begun running Maker. I am currently trying to annotate my > Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about > the parameters I have chosen for my first run in maker, which include: > > genome=CAB_assembly.fasta (1044 contigs) > est=Representative_transcript_loci.fasta (assembled transcripts btw > 200-20000bp long) > protein=TAIR10pep.fasta (Arabidopsis proteins) > ? > *Repeat masking* > model_org=arabidopsis > rmlib=list of Brassicaceae and common plant repeats > repeat_protein=te_proteins.fasta > *Gene Prediction* > snaphmm=A.thaliana.hmm > augustus_species=arabidopsis > est2genome=1 > > I have run a sample file of scaffolds, as well as the entire genome. > In the sample file of scaffolds, I gff3merged the gffs and then ran > evaluator. I noticed that my AED are all 1. Is this bad? What should I > try next? > > I am also unsure on how to train files and if this should be done in my > case. > > Can anyone advise me on these issues? 
>
> -Elyssa
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Dec 16 18:13:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 17:13:29 -0700
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
Message-ID:

Yes. BUSCO is awesome. Also, they have presentations this year at PAG in both the "Next Generation Genome Annotation and Analysis" and "Computational Gene Discovery" workshops.

--Carson

> On Dec 16, 2015, at 4:41 PM, Xabier Vázquez Campos wrote:
>
> Hi Daniel,
>
> Have you guys heard about BUSCO? It's kind of a replacement for CEGMA, which was based on a rather limited set of genes (according to its developers, we should stop using it). BUSCO not only produces a more thorough completeness profile, it also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, using the --long option is similar to a training and retraining step in the old training method.
>
> I recently used it to train Augustus for my fungal genomes and it works well. Unfortunately, it may not apply in this case, as they don't have the plant profile dataset ready yet.
You may request early access to it though > > I used to use the CEGMA output plus the webAugustus training service, a bit more tedious but not that complicated. I copy below what I had in my old protocol, nonetheless I would recommend any other user not dealing with plant genomes to use BUSCO instead: > > Augustus gff files are a bit different from CEGMA ones. Get the CEGMA output and run the following script: > cegma2zff output.cegma.gff > augustus.gff > > Upload the genome file (e.g. contigs.fa from velvet) and the "training gene structure file" (augustus.gff) to http://bioinf.uni-greifswald.de/webaugustus/training/create > > Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species). > > Re-training > > From Maker's output, follow the the same initial instructions as for SNAP training detailed in the Maker tutorial: > In the directory that contains MYGENOME.maker.output/ folder: > mkdir snap > cd snap > gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log > maker2zff -n MYGENOME.all.gff > The option -n is not included in the original tutorial but you may end with empty genome.ann and genome.dna files. > From this point we generate training files for both SNAP and Augustus: > > fathom genome.ann genome.dna -categorize 1000 > fathom uni.ann uni.dna -export 1000 -plus > forge export.ann export.dna > > For Augustus, we need the script "zff2augustus_gbk.pl ". This will take the export.dna generated by fathom and generate a *.gb file that will be used as "training gene structure file" in a new training submission in WebAugustus, but remember to give it a new name in the submission, e.g. 
MYGENOME_v2, or Maker won't see the difference (same name): > perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb > > Xabier > > On 17 December 2015 at 05:07, Daniel Ence > wrote: > Hi Elyssa, > > Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set. That being said, since your gene models are basically the unmodified alignments, I?m surprised that all of them have an AED of 1, since that means that they?re not supported by any of the evidence (either est or protein). > > Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here: > http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors > > Augustus can also be trained but is much more involved. > > ~Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Dec 11, 2015, at 10:43 AM, Elyssa Garza > wrote: >> >> Hello, >> >> I have recently begun running Maker. I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about the parameters I have chosen for my first run in maker, which include: >> >> genome=CAB_assembly.fasta (1044 contigs) >> est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long) >> protein=TAIR10pep.fasta (Arabidopsis proteins) >> ? >> Repeat masking >> model_org=arabidopsis >> rmlib=list of Brassicaceae and common plant repeats >> repeat_protein=te_proteins.fasta >> Gene Prediction >> snaphmm=A.thaliana.hmm >> augustus_species=arabidopsis >> est2genome=1 >> >> I have run a sample file of scaffolds, as well as the entire genome. >> In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator. 
I noticed that my AED are all 1. Is this bad? What should I try next? >> >> I am also unsure on how to train files and if this should be done in my case. >> >> Can anyone advise me on these issues? >> >> -Elyssa >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Xabier V?zquez-Campos, PhD > Research Associate > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ole.toerresen at gmail.com Thu Dec 17 00:32:26 2015 From: ole.toerresen at gmail.com (=?UTF-8?Q?Ole_Kristian_T=C3=B8rresen?=) Date: Thu, 17 Dec 2015 07:32:26 +0100 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: Here's the hits for GAMO_00029233 >sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1 >sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1 >sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1 >sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1 >sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3 >sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3 >sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. 
japonica GN=SGT1 PE=1 SV=1
>sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1

The bovine entry (SGT1_BOVIN) is the first hit. I can't really see anything different about it.

I don't know Perl that well. Do you have some code I can use to debug this? In line 58 the script tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash has no entry for that key, or the key itself is empty. If I could print each ID as it is found, maybe I could find a pattern. And/or print each blast entry when the blast hash is created.

Thank you.

Ole

On 16 December 2015 at 21:55, Carson Holt wrote:

> Find the hit for GAMO_00029233 and then pull its header line out of the
> Uniprot fasta file. There may be an unexpected formatting difference in
> that header.
>
> --Carson
>
>
>
> On Dec 16, 2015, at 1:53 PM, Ole Kristian Tørresen <
> ole.toerresen at gmail.com> wrote:
>
> Daniel,
> this is the previous gene, before maker_functional_gff:
> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
> LG08 maker exon 13653175 13653212 . - .
> ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . > ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . > ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . > ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . > ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . > ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . > ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - > . 
ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker gene 13786695 13806565 . - . > ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > LG08 maker mRNA 13786695 13806565 . - . > > ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; > > After : > LG08 maker gene 13648888 13656687 . - . > > ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar > to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker mRNA 13648888 13656687 . - . > > ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar > to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker exon 13648888 13648944 . - . > ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . > ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . > ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . > ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . > ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . > ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . > ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . > ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . > ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . > ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . > ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . 
> ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . > ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > > Carson, I saw that, but I did use Uniprot/Swiss-prot. 
A snap of the > blast-output used as input here: > GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 > 81 348 33 307 2e-92 285 > GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 > 76 347 33 308 4e-92 284 > GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 > 44 351 13 316 2e-86 270 > GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 > 44 351 13 316 3e-86 269 > GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 > 44 351 13 316 5e-84 264 > GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 > 44 351 13 316 8e-83 261 > GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 > 60 351 31 317 1e-80 255 > GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 > 32 351 39 371 6e-69 226 > GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 > 29 351 46 366 8e-66 218 > GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 > 53 351 34 345 2e-59 201 > GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 > 34 351 20 348 2e-59 201 > GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 > 142 351 27 237 3e-24 103 > GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 > 113 337 1 222 5e-22 97.1 > GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 > 5 268 17 310 5e-89 275 > GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 > 5 268 16 308 5e-86 268 > GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 > 5 268 16 308 8e-86 267 > GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 > 5 268 16 337 1e-80 254 > GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 > 10 268 16 337 5e-36 137 > GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 > 9 268 11 328 3e-35 135 > GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 > 24 268 26 320 7e-35 134 > GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 > 138 268 196 357 5e-25 107 > > 521 genes have had added function before maker_functional_gff choked > particular gene GAMO_00029233. > > Thank you. 
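
One way to follow up on the header-format explanation in this thread is to cross-check the first BLAST hit of each query against the headers of the protein FASTA. The sketch below is a diagnostic only and does not reproduce the logic of maker_functional_gff. It assumes 12-column tabular BLAST output like the rows shown above and UniProt/Swiss-Prot style headers (`>sp|ACC|NAME description OS=... GN=...`). The function names are hypothetical, and treating a missing GN= or OS= field as suspicious is an assumption based on the Note format in the annotated GFF ("Similar to Tmbim1: Protein lifeguard 3 (Mus musculus)").

```python
import re


def parse_uniprot_headers(fasta_lines):
    """Map each FASTA sequence ID to the GN= and OS= values in its header."""
    headers = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        seq_id, _, rest = line[1:].strip().partition(" ")
        gene = re.search(r"\bGN=(\S+)", rest)
        # The organism name runs until the next two-letter KEY= field or end of line.
        organism = re.search(r"\bOS=(.+?)(?=\s+[A-Z]{2}=|$)", rest)
        headers[seq_id] = {
            "gene": gene.group(1) if gene else None,
            "organism": organism.group(1) if organism else None,
        }
    return headers


def first_hits(blast_lines):
    """Return {query_id: subject_id}, keeping the first row seen for each query."""
    hits = {}
    for line in blast_lines:
        if line.startswith("#"):
            continue  # comment line
        fields = line.split()
        if len(fields) < 12:
            continue  # not a complete tabular row
        hits.setdefault(fields[0], fields[1])
    return hits


def find_problem_hits(blast_lines, fasta_lines):
    """List (query, subject, reason) for first hits that look worth inspecting."""
    headers = parse_uniprot_headers(fasta_lines)
    problems = []
    for query, subject in first_hits(blast_lines).items():
        info = headers.get(subject)
        if info is None:
            problems.append((query, subject, "subject ID not found in FASTA headers"))
        elif info["gene"] is None:
            problems.append((query, subject, "header has no GN= field"))
        elif info["organism"] is None:
            problems.append((query, subject, "header has no OS= field"))
    return problems
```

To use it, pass two open file handles, for example `find_problem_hits(open("blast.out"), open("uniprot_sprot.fasta"))` (both file names are placeholders), and look at any query it reports near the point where the script stops.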
>
> Ole
>
>
> On 16 December 2015 at 20:37, Carson Holt wrote:
>
>> I've seen this exact same error before (
>> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ
>> ).
>>
>> It is caused by the ID from the blast report and input protein
>> fasta. maker_functional_gff is not a generic script that can work on any
>> input, it only works on blast results against Uniprot/Swiss-prot. The
>> script is expecting a very specific header format in both the report and
>> the protein fasta and if it doesn't see it, then it is missing certain
>> pieces of needed information.
>>
>> Thanks,
>> Carson
>>
>> On Dec 16, 2015, at 12:27 PM, Daniel Ence
>> wrote:
>>
>> Hi Ole, can you send a line for a gene feature that does work?
>>
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>>
>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen <
>> ole.toerresen at gmail.com> wrote:
>>
>> Hi,
>> I'm trying to update my annotation with some functional annotations
>> with maker_functional_gff, but get this annoying error:
>> Can't use string ("") as a HASH ref while "strict refs" in use at
>> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58,
>> <$IN> line 108947.
>>
>> Line 108947 in the input gff is this:
>>
>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>
>> It seems like the regexp in line 55 in the maker_functional_gff script
>> doesn't pick up the ID, but I can't see any difference between that line
>> and other similar lines.
>>
>> Any help to trace down this is really appreciated. Do you need any other
>> information?
>>
>> Thank you.
>> >> Sincerely, >> >> Ole Kristian T?rresen >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From huazhong at nmsu.edu Thu Dec 17 12:03:56 2015 From: huazhong at nmsu.edu (Hua Zhong) Date: Thu, 17 Dec 2015 18:03:56 +0000 Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2 Message-ID: Hello, we are using maker (2.31.8) with mvapich2, but the program terminates with a segmentation fault while setting up GFF3 output and fasta chunks. We really have no idea what the problem was. Below is the error message: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks [fpga04.cluster:mpi_rank_111][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_107][error_sighandler] Caught error: Segmentation fault (signal 11) Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached [fpga04.cluster:mpi_rank_113][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_115][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_105][error_sighandler] Caught error: Segmentation fault (signal 11) 
[fpga04.cluster:mpi_rank_108][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_110][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_104][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_106][error_sighandler] Caught error: Segmentation fault (signal 11)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Best regards,

Hua
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From elyssa_garza at yahoo.com Thu Dec 17 16:29:56 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Thu, 17 Dec 2015 22:29:56 +0000 (UTC)
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References:
Message-ID: <802013873.330112.1450391396060.JavaMail.yahoo@mail.yahoo.com>

Hi Daniel,

I used the pre-trained Arabidopsis models from SNAP and Augustus for this first run of maker. Do you think it would be wise to reuse my previous run (shown at the start of this thread), or should I do a new run with the following parameters to use for training?

genome=CAB_assembly.fasta
est=RTLs.fa
altest=Brassica_oleracea.fasta
protein=Arabidopsis_proteins.fasta
est2genome=0
protein2genome=0
SNAP=A.thaliana
Augustus=arabidopsis
model_org=arabidopsis
rmlib=Brassicaceae_repeats.fasta
repeat_protein=te_proteins.fasta

At what point would I use est2genome=1? Also, for this plant genome, is it better to use model_org=arabidopsis or model_org=all? I am also considering using RepeatModeler to create a custom repeat library, but I am not sure it is necessary with all of the repeat information I am putting in already.

Any advice is helpful.

Thanks,
-Elyssa

On Wednesday, December 16, 2015 12:07 PM, Daniel Ence wrote:

Hi Elyssa,

Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set.
That being said, since your gene models are basically the unmodified alignments, I'm surprised that all of them have an AED of 1, since that means that they're not supported by any of the evidence (either est or protein).

Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here:
http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors

Augustus can also be trained but is much more involved.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote:

Hello,

I have recently begun running Maker. I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about the parameters I have chosen for my first run in maker, which include:

genome=CAB_assembly.fasta (1044 contigs)
est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long)
protein=TAIR10pep.fasta (Arabidopsis proteins)

Repeat masking
model_org=arabidopsis
rmlib=list of Brassicaceae and common plant repeats
repeat_protein=te_proteins.fasta

Gene Prediction
snaphmm=A.thaliana.hmm
augustus_species=arabidopsis
est2genome=1

I have run a sample file of scaffolds, as well as the entire genome.
In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator. I noticed that my AED are all 1. Is this bad? What should I try next?

I am also unsure on how to train files and if this should be done in my case.

Can anyone advise me on these issues?
-Elyssa
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Dec 17 16:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Dec 2015 15:37:43 -0700
Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2
In-Reply-To:
References:
Message-ID: <417397FD-0BFD-46E6-972F-4792C42FBAC7@gmail.com>

MAKER does not work with mvapich2. You must use either OpenMPI or MPICH2.

The following is from the INSTALL instructions that come with MAKER:

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.

2. Give the path to 'mpicc'. Make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.

3. Give the path to the folder containing 'mpi.h'. Make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.

4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.
Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.

Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

> On Dec 17, 2015, at 11:03 AM, Hua Zhong wrote:
>
> Hello,
> we are using maker (2.31.8) with mvapich2, but the program terminates with a segmentation fault while setting up GFF3 output and fasta chunks. We really have no idea what the problem was.
>
> Below is the error message:
>
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> [fpga04.cluster:mpi_rank_111][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_107][error_sighandler] Caught error: Segmentation fault (signal 11)
> Perl exited with active threads:
> 1 running and unjoined
> 0 finished and unjoined
> 0 running and detached
> [fpga04.cluster:mpi_rank_113][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_115][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_105][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_108][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_110][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_104][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_106][error_sighandler] Caught error: Segmentation fault (signal 11)
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Best regards,
>
> Hua
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From elyssa_garza at yahoo.com Mon Dec 28 14:21:40 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Mon, 28 Dec 2015 14:21:40 -0600
Subject: [maker-devel] getting AED scores
Message-ID: <8611B3D7-76C4-4F37-972E-91055D752D47@yahoo.com>

pred_stats=0 #report AED and QI statistics for all predictions as well as models

I recently finished a run of maker on my genome and would like to look at the AED score. I usually load the resulting files into CLCbio to see the AED. However, I noticed that pred_stats was an option available in the GMOD 2014 tutorial. I tried using this option and I receive the following warning:

WARNING: Invalid option 'pred_stats' in control file maker_opts.ctl

Is there a separate script I can use to get these statistics?

-Elyssa
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Dec 29 19:43:06 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Dec 2015 18:43:06 -0700
Subject: [maker-devel] maker-devel post from elyssa_garza@yahoo.com requires approval
In-Reply-To:
References:
Message-ID: <3B52FCE6-09AA-48E0-93EF-9D1F8ED2EF0A@gmail.com>

It means you have a really old MAKER installation on your system that predates the 'pred_stats' option. You just need to update.

Thanks,
Carson

>
> From: Elyssa Garza
> Subject: getting AED scores
> Date: December 28, 2015 at 1:21:40 PM MST
> To: maker-devel at yandell-lab.org
>
>
> pred_stats=0 #report AED and QI statistics for all predictions as well as models
> I recently finished a run of maker on my genome and would like to look at the AED score. I usually load the resulting files into CLCbio to see the AED.
However, I noticed that pred_stats was an option available in the GMOD 2014 tutorial. I tried using this option and I receive the following warning:
>
>
> WARNING: Invalid option 'pred_stats' in control file maker_opts.ctl
>
> Is there a separate script I can use to get these statistics?
>
> -Elyssa
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Wed Dec 16 12:27:00 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 16 Dec 2015 19:27:00 +0000
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References:
Message-ID: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>

Hi Ole, can you send a line for a gene feature that does work?

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:

Hi,
I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.

Line 108947 in the input gff is this:

LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;

It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.

Any help to trace down this is really appreciated. Do you need any other information?

Thank you.

Sincerely,
Ole Kristian Tørresen

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Dec 16 12:37:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 12:37:14 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).

It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn't see it, then it is missing certain pieces of needed information.

Thanks,
Carson

> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>
> Hi Ole, can you send a line for a gene feature that does work?
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>
>> Hi,
>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>> Line 108947 in the input gff is this:
>>
>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>
>> Any help to trace down this is really appreciated. Do you need any other information?
>>
>> Thank you.
>>
>> Sincerely,
>>
>> Ole Kristian Tørresen
>>
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
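
One way to look for the kind of header mismatch described in the reply above is sketched here in Python. It is only a rough diagnostic under stated assumptions, not the logic of maker_functional_gff itself: the expected layout (`>sp|ACCESSION|ENTRY_NAME name OS=organism ... GN=gene ...`) follows UniProt's usual FASTA convention, and treating a missing `GN=` field as suspicious is a guess about what the script might need. The function names and the sample records are made up for illustration.

```python
import re

# Assumed Swiss-Prot FASTA header layout (UniProt's usual convention):
#   >sp|ACCESSION|ENTRY_NAME Protein name OS=Organism ... GN=Gene PE=n SV=n
HEADER_RE = re.compile(r"^>(sp|tr)\|[^|\s]+\|\S+\s+\S.*?\s+OS=\S")
GENE_RE = re.compile(r"\sGN=\S+")


def check_header(header):
    """Return a list of oddities in one FASTA header line (empty if none)."""
    if HEADER_RE.match(header) is None:
        return ["not in 'sp|ACC|ENTRY name OS=...' form"]
    if GENE_RE.search(header) is None:
        # Assumption: a downstream script may rely on the gene name field.
        return ["no GN= field"]
    return []


def subject_ids(blast_lines, query):
    """Collect subject IDs (column 2 of tabular BLAST output) for one query."""
    rows = (line.split() for line in blast_lines)
    return {row[1] for row in rows if len(row) > 1 and row[0] == query}


def suspicious_headers(fasta_lines, wanted):
    """Yield (header, problems) for wanted subjects whose header looks unusual."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line.rstrip("\n")
        fields = header[1:].split()
        if fields and fields[0] in wanted:
            problems = check_header(header)
            if problems:
                yield header, problems


if __name__ == "__main__":
    # Made-up records, for illustration only.
    blast = [
        "geneA-RA\tsp|ACC0001|DEMO1_TEST\t53.18",
        "geneA-RA\tsp|ACC0002|DEMO2_TEST\t51.51",
    ]
    fasta = [
        ">sp|ACC0001|DEMO1_TEST Example protein OS=Examplus testus GN=demo PE=1 SV=1",
        "MAAA",
        ">sp|ACC0002|DEMO2_TEST Example protein OS=Examplus testus PE=1 SV=1",
        "MCCC",
    ]
    hits = subject_ids(blast, "geneA-RA")
    for header, problems in suspicious_headers(fasta, hits):
        print(header, "->", ", ".join(problems))
```

With real data, the two lists would be replaced by open file handles for the tabular BLAST report and the Uniprot fasta file, and the query would be the transcript ID of the gene at which the script stops.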
URL: From ole.toerresen at gmail.com Wed Dec 16 13:53:25 2015 From: ole.toerresen at gmail.com (=?UTF-8?Q?Ole_Kristian_T=C3=B8rresen?=) Date: Wed, 16 Dec 2015 21:53:25 +0100 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: Daniel, this is the previous gene, before maker_functional_gff: LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325; LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45; LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; LG08 maker CDS 13656667 13656687 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; After : LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); LG08 maker mRNA 13648888 13656687 . - . 
ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653587 13653641 . 
- 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; Carson, I saw that, but I did use Uniprot/Swiss-prot. A snap of the blast-output used as input here: GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285 GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284 GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270 GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269 GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264 GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261 GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255 GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226 GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218 GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201 GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201 GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103 GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1 GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 
268 17 310 5e-89 275 GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268 GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267 GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254 GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137 GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135 GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134 GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107 521 genes have had added function before maker_functional_gff choked particular gene GAMO_00029233. Thank you. Ole On 16 December 2015 at 20:37, Carson Holt wrote: > I?ve seen this exact same error before ( > https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ > ). > > It is caused by the ID from the blast report and input protein > fasta. maker_functional_gff is not a generic script that can work on any > input, it only works on blast results against Uniprot/Swiss-prot. The > script is expecting a very specific header format in both the report and > the protein fasta and if it doesn?t see it, then it is missing certain > pieces of needed information. > > Thanks, > Carson > > On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote: > > Hi Ole, can you send a line for a gene feature that does work? > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On Dec 14, 2015, at 12:21 PM, Ole Kristian T?rresen < > ole.toerresen at gmail.com> wrote: > > Hi, > I'm trying to update my annotation with some functional annotations > with maker_functional_gff, but get this annoying error: > Can't use string ("") as a HASH ref while "strict refs" in use at > /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, > <$IN> line 108947. 
> > Line 108947 in the input gff is this: > > LG08 maker gene 13786695 13806565 . - . > ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > > It seems like the regexp in line 55 in the maker_functional_gff script > doesn't pick up the ID, but I can't see any difference between that line > and other similar lines. > > Any help to trace down this is really appreciated. Do you need any other > information? > > Thank you. > > Sincerely, > > Ole Kristian T?rresen > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Dec 16 13:55:14 2015 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 16 Dec 2015 13:55:14 -0700 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: Find the hit for GAMO_00029233 and then pull it?s header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header. ?Carson > On Dec 16, 2015, at 1:53 PM, Ole Kristian T?rresen wrote: > > Daniel, > this is the previous gene, before maker_functional_gff: > LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325; > LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45; > LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . 
ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; > > After : > LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . 
ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > > Carson, I saw that, but I did use Uniprot/Swiss-prot. 
A snap of the blast-output used as input here:
> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285
> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284
> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270
> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269
> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264
> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261
> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255
> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226
> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218
> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201
> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201
> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103
> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1
> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275
> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
> GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107
>
> 521 genes have had added function before maker_functional_gff choked particular gene GAMO_00029233.
>
> Thank you.
>
> Ole
>
>
> On 16 December 2015 at 20:37, Carson Holt wrote:
> I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).
>
> It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn't see it, then it is missing certain pieces of needed information.
>
> Thanks,
> Carson
>
>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>>
>> Hi Ole, can you send a line for a gene feature that does work?
>>
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>>
>>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>>
>>> Hi,
>>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>>> Line 108947 in the input gff is this:
>>>
>>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>>
>>> Any help to trace down this is really appreciated. Do you need any other information?
>>>
>>> Thank you.
>>>
>>> Sincerely,
>>>
>>> Ole Kristian Tørresen
>>>
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xvazquezc at gmail.com Wed Dec 16 16:41:48 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Thu, 17 Dec 2015 10:41:48 +1100
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
Message-ID:

Hi Daniel,

Have you guys heard about BUSCO? It's kind of a replacement for CEGMA, which was based on a rather limited set of genes (according to its developers, we should stop using it). BUSCO not only produces a more thorough completeness profile but also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, using the --long option is similar to a training and retraining step in the old training method. I recently used it for training Augustus for my fungal genomes and it works well.

Unfortunately, it may not apply in this case, as they don't have the plant profile dataset ready yet. You may request early access to it, though.

I used to use the CEGMA output plus the webAugustus training service, which is a bit more tedious but not that complicated. I copy below what I had in my old protocol; nonetheless, I would recommend that any other user not dealing with plant genomes use BUSCO instead:

Augustus gff files are a bit different from CEGMA ones.
Get the CEGMA output and run the following script:

    cegma2zff output.cegma.gff > augustus.gff

Upload the genome file (e.g. contigs.fa from velvet) and the "training gene structure file" (augustus.gff) to http://bioinf.uni-greifswald.de/webaugustus/training/create

Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species).

Re-training

From Maker's output, follow the same initial instructions as for SNAP training detailed in the Maker tutorial. In the directory that contains the MYGENOME.maker.output/ folder:

    mkdir snap
    cd snap
    gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log
    maker2zff -n MYGENOME.all.gff

The option -n is not included in the original tutorial, but without it you may end up with empty genome.ann and genome.dna files.
From this point we generate training files for both SNAP and Augustus:

    fathom genome.ann genome.dna -categorize 1000
    fathom uni.ann uni.dna -export 1000 -plus
    forge export.ann export.dna

For Augustus, we need the script "zff2augustus_gbk.pl". This will take the export.dna generated by fathom and generate a *.gb file that will be used as the "training gene structure file" in a new training submission in WebAugustus. Remember to give it a new name in the submission, e.g. MYGENOME_v2, or Maker won't see the difference (same name):

    perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb

Xabier

On 17 December 2015 at 05:07, Daniel Ence wrote:

> Hi Elyssa,
>
> Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set.
That being said, since your gene models are basically the unmodified > alignments, I?m surprised that all of them have an AED of 1, since that > means that they?re not supported by any of the evidence (either est or > protein). > > Did you get gene models from snap or augustus? You can gather those with > the fasta_merge script. Those should be a good starting place for training > ab initio predictors. Instructions for training snap can be found here: > http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors > > Augustus can also be trained but is much more involved. > > ~Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote: > > Hello, > > I have recently begun running Maker. I am currently trying to annotate my > Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about > the parameters I have chosen for my first run in maker, which include: > > genome=CAB_assembly.fasta (1044 contigs) > est=Representative_transcript_loci.fasta (assembled transcripts btw > 200-20000bp long) > protein=TAIR10pep.fasta (Arabidopsis proteins) > ? > *Repeat masking* > model_org=arabidopsis > rmlib=list of Brassicaceae and common plant repeats > repeat_protein=te_proteins.fasta > *Gene Prediction* > snaphmm=A.thaliana.hmm > augustus_species=arabidopsis > est2genome=1 > > I have run a sample file of scaffolds, as well as the entire genome. > In the sample file of scaffolds, I gff3merged the gffs and then ran > evaluator. I noticed that my AED are all 1. Is this bad? What should I > try next? > > I am also unsure on how to train files and if this should be done in my > case. > > Can anyone advise me on these issues? 
>
> -Elyssa
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Dec 16 17:13:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 17:13:29 -0700
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
Message-ID:

Yes. BUSCO is awesome. Also they have presentations this year at PAG in both the "Next Generation Genome Annotation and Analysis" and "Computational Gene Discovery" workshops.

--Carson

> On Dec 16, 2015, at 4:41 PM, Xabier Vázquez Campos wrote:
>
> Hi Daniel,
>
> Have you guys heard about BUSCO ? It's kind of a replacement for CEGMA, which was based in a rather limited set of genes (according to their devels we should stop using). BUSCO does not only produces a more thorough completeness profile but it also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, if you use the --long option it is similar to a training and retraining step in the old training method.
>
> I recently used it for training Augustus for my fungal genomes and it works well. Unfortunately, it may not apply for this case as they don't have the plant profile dataset ready yet.
You may request early access to it though > > I used to use the CEGMA output plus the webAugustus training service, a bit more tedious but not that complicated. I copy below what I had in my old protocol, nonetheless I would recommend any other user not dealing with plant genomes to use BUSCO instead: > > Augustus gff files are a bit different from CEGMA ones. Get the CEGMA output and run the following script: > cegma2zff output.cegma.gff > augustus.gff > > Upload the genome file (e.g. contigs.fa from velvet) and the "training gene structure file" (augustus.gff) to http://bioinf.uni-greifswald.de/webaugustus/training/create > > Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species). > > Re-training > > From Maker's output, follow the the same initial instructions as for SNAP training detailed in the Maker tutorial: > In the directory that contains MYGENOME.maker.output/ folder: > mkdir snap > cd snap > gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log > maker2zff -n MYGENOME.all.gff > The option -n is not included in the original tutorial but you may end with empty genome.ann and genome.dna files. > From this point we generate training files for both SNAP and Augustus: > > fathom genome.ann genome.dna -categorize 1000 > fathom uni.ann uni.dna -export 1000 -plus > forge export.ann export.dna > > For Augustus, we need the script "zff2augustus_gbk.pl ". This will take the export.dna generated by fathom and generate a *.gb file that will be used as "training gene structure file" in a new training submission in WebAugustus, but remember to give it a new name in the submission, e.g. 
MYGENOME_v2, or Maker won't see the difference (same name): > perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb > > Xabier > > On 17 December 2015 at 05:07, Daniel Ence > wrote: > Hi Elyssa, > > Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set. That being said, since your gene models are basically the unmodified alignments, I?m surprised that all of them have an AED of 1, since that means that they?re not supported by any of the evidence (either est or protein). > > Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here: > http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors > > Augustus can also be trained but is much more involved. > > ~Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Dec 11, 2015, at 10:43 AM, Elyssa Garza > wrote: >> >> Hello, >> >> I have recently begun running Maker. I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about the parameters I have chosen for my first run in maker, which include: >> >> genome=CAB_assembly.fasta (1044 contigs) >> est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long) >> protein=TAIR10pep.fasta (Arabidopsis proteins) >> ? >> Repeat masking >> model_org=arabidopsis >> rmlib=list of Brassicaceae and common plant repeats >> repeat_protein=te_proteins.fasta >> Gene Prediction >> snaphmm=A.thaliana.hmm >> augustus_species=arabidopsis >> est2genome=1 >> >> I have run a sample file of scaffolds, as well as the entire genome. >> In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator. 
I noticed that my AED are all 1. Is this bad? What should I try next? >> >> I am also unsure on how to train files and if this should be done in my case. >> >> Can anyone advise me on these issues? >> >> -Elyssa >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Xabier V?zquez-Campos, PhD > Research Associate > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ole.toerresen at gmail.com Wed Dec 16 23:32:26 2015 From: ole.toerresen at gmail.com (=?UTF-8?Q?Ole_Kristian_T=C3=B8rresen?=) Date: Thu, 17 Dec 2015 07:32:26 +0100 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: Here's the hits for GAMO_00029233 >sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1 >sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1 >sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1 >sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1 >sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3 >sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3 >sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. 
japonica GN=SGT1 PE=1 SV=1
>sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1

The bovine entry (SGT1_BOVIN) is the first hit. I can't really see anything different about it.

I don't know Perl that well. Do you have some code which I can use to debug this? In line 58 it tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash has no entry for that key, or the key itself is empty. If I could print each ID as it is found, maybe I could find a pattern. And/or print each blast entry when the blast hash is created.

Thank you.

Ole

On 16 December 2015 at 21:55, Carson Holt wrote:

> Find the hit for GAMO_00029233 and then pull its header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header.
>
> --Carson
>
> On Dec 16, 2015, at 1:53 PM, Ole Kristian Tørresen <ole.toerresen at gmail.com> wrote:
>
> Daniel,
> this is the previous gene, before maker_functional_gff:
> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
> LG08 maker exon 13653175 13653212 . - .
> ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . > ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . > ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . > ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . > ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . > ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . > ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - > . 
ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker gene 13786695 13806565 . - . > ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > LG08 maker mRNA 13786695 13806565 . - . > > ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; > > After : > LG08 maker gene 13648888 13656687 . - . > > ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar > to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker mRNA 13648888 13656687 . - . > > ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar > to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker exon 13648888 13648944 . - . > ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . > ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . > ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . > ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . > ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . > ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . > ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . > ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . > ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . > ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . > ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . 
> ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . > ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > > Carson, I saw that, but I did use Uniprot/Swiss-prot. 
A snap of the > blast-output used as input here: > GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 > 81 348 33 307 2e-92 285 > GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 > 76 347 33 308 4e-92 284 > GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 > 44 351 13 316 2e-86 270 > GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 > 44 351 13 316 3e-86 269 > GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 > 44 351 13 316 5e-84 264 > GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 > 44 351 13 316 8e-83 261 > GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 > 60 351 31 317 1e-80 255 > GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 > 32 351 39 371 6e-69 226 > GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 > 29 351 46 366 8e-66 218 > GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 > 53 351 34 345 2e-59 201 > GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 > 34 351 20 348 2e-59 201 > GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 > 142 351 27 237 3e-24 103 > GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 > 113 337 1 222 5e-22 97.1 > GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 > 5 268 17 310 5e-89 275 > GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 > 5 268 16 308 5e-86 268 > GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 > 5 268 16 308 8e-86 267 > GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 > 5 268 16 337 1e-80 254 > GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 > 10 268 16 337 5e-36 137 > GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 > 9 268 11 328 3e-35 135 > GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 > 24 268 26 320 7e-35 134 > GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 > 138 268 196 357 5e-25 107 > > 521 genes have had added function before maker_functional_gff choked > particular gene GAMO_00029233. > > Thank you. 
> > Ole > > > On 16 December 2015 at 20:37, Carson Holt wrote: > >> I?ve seen this exact same error before ( >> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ >> ). >> >> It is caused by the ID from the blast report and input protein >> fasta. maker_functional_gff is not a generic script that can work on any >> input, it only works on blast results against Uniprot/Swiss-prot. The >> script is expecting a very specific header format in both the report and >> the protein fasta and if it doesn?t see it, then it is missing certain >> pieces of needed information. >> >> Thanks, >> Carson >> >> On Dec 16, 2015, at 12:27 PM, Daniel Ence >> wrote: >> >> Hi Ole, can you send a line for a gene feature that does work? >> >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On Dec 14, 2015, at 12:21 PM, Ole Kristian T?rresen < >> ole.toerresen at gmail.com> wrote: >> >> Hi, >> I'm trying to update my annotation with some functional annotations >> with maker_functional_gff, but get this annoying error: >> Can't use string ("") as a HASH ref while "strict refs" in use at >> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, >> <$IN> line 108947. >> >> Line 108947 in the input gff is this: >> >> LG08 maker gene 13786695 13806565 . - . >> ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >> >> It seems like the regexp in line 55 in the maker_functional_gff script >> doesn't pick up the ID, but I can't see any difference between that line >> and other similar lines. >> >> Any help to trace down this is really appreciated. Do you need any other >> information? >> >> Thank you. 
>> >> Sincerely, >> >> Ole Kristian T?rresen >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From huazhong at nmsu.edu Thu Dec 17 11:03:56 2015 From: huazhong at nmsu.edu (Hua Zhong) Date: Thu, 17 Dec 2015 18:03:56 +0000 Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2 Message-ID: Hello, we are using maker (2.31.8) with mvapich2, but the program terminates with a segmentation fault while setting up GFF3 output and fasta chunks. We really have no idea what the problem was. Below is the error message: +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks [fpga04.cluster:mpi_rank_111][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_107][error_sighandler] Caught error: Segmentation fault (signal 11) Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached [fpga04.cluster:mpi_rank_113][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_115][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11) [fpga04.cluster:mpi_rank_105][error_sighandler] Caught error: Segmentation fault (signal 11) 
[fpga04.cluster:mpi_rank_108][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_110][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_104][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_106][error_sighandler] Caught error: Segmentation fault (signal 11)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Best regards,

Hua
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From elyssa_garza at yahoo.com Thu Dec 17 15:29:56 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Thu, 17 Dec 2015 22:29:56 +0000 (UTC)
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References:
Message-ID: <802013873.330112.1450391396060.JavaMail.yahoo@mail.yahoo.com>

Hi Daniel,

I used the pre-trained models of Arabidopsis from SNAP and Augustus for this first run of maker. Do you think it would be wise to use the run I used previously (shown at the start of the topic) or should I make a new run with the following parameters to use for training?

genome=CAB_assembly.fasta
est=RTLs.fa
altest=Brassica_oleracea.fasta
protein=Arabidopsis_proteins.fasta
est2genome=0
protein2genome=0
SNAP=A.thaliana
Augustus=arabidopsis
model_org=arabidopsis
rmlib=Brassicaceae_repeats.fasta
repeat_protein=te_proteins.fasta

At what point would I use est2genome=1? Also for this plant genome, is it better to use model_org=arabidopsis or model_org=all? I am also considering using RepeatModeler to create a custom repeat library, but I am not sure it is necessary with all of the repeat information I am putting in already.

Any advice is helpful.

Thanks,
-Elyssa

On Wednesday, December 16, 2015 12:07 PM, Daniel Ence wrote:

Hi Elyssa,

Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set.
That being said, since your gene models are basically the unmodified alignments, I?m surprised that all of them have an AED of 1, since that means that they?re not supported by any of the evidence (either est or protein).? Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here:http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors Augustus can also be trained but is much more involved. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote: Hello, I have recently begun running Maker. ?I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis. ?I am unsure about the parameters I have chosen for my first run in maker, which include: genome=CAB_assembly.fasta (1044 contigs)est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long)protein=TAIR10pep.fasta (Arabidopsis proteins)?Repeat maskingmodel_org=arabidopsisrmlib=list of Brassicaceae and common plant repeatsrepeat_protein=te_proteins.fastaGene Predictionsnaphmm=A.thaliana.hmmaugustus_species=arabidopsisest2genome=1 I have run a sample file of scaffolds, as well as the entire genome.In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator. ?I noticed that my AED are all 1. ?Is this bad? ?What should I try next? I am also unsure on how to train files and if this should be done in my case. Can anyone advise me on these issues? 
-Elyssa
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Dec 17 15:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Dec 2015 15:37:43 -0700
Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2
In-Reply-To:
References:
Message-ID: <417397FD-0BFD-46E6-972F-4792C42FBAC7@gmail.com>

MAKER does not work with mvapich2. You must use either OpenMPI or MPICH2.

The following is from the INSTALL instructions that come with MAKER:

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.

2. Give path to 'mpicc'. Note: make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.

3. Give path to the folder containing 'mpi.h'. Note: make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.

4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.
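
The OpenMPI environment settings from the INSTALL text above could be collected into a ~/.bash_profile fragment along these lines. This is only a sketch: the OPENMPI_LIB variable and its default path are placeholders introduced here (not MAKER or OpenMPI settings), and the file-existence check is an added safeguard rather than part of the INSTALL instructions.

```shell
# Sketch of a ~/.bash_profile fragment for running MAKER under OpenMPI.
# OPENMPI_LIB is a hypothetical helper variable; point it at the lib
# directory of the OpenMPI installation MAKER was built against.
OPENMPI_LIB="${OPENMPI_LIB:-/location/of/openmpi/lib}"

# Preload libmpi.so only if it is really there, so that a wrong path
# does not trigger loader warnings for every command in the session.
if [ -f "${OPENMPI_LIB}/libmpi.so" ]; then
    export LD_PRELOAD="${OPENMPI_LIB}/libmpi.so"
else
    echo "libmpi.so not found in ${OPENMPI_LIB}; LD_PRELOAD left unchanged" >&2
fi

# Turn off the nonfatal fork warnings mentioned in the INSTALL notes.
export OMPI_MCA_mpi_warn_on_fork=0
```

The same LD_PRELOAD value has to be in effect both when installing MAKER and when running it, which is why a login profile is a convenient place for it.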
Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.

Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

> On Dec 17, 2015, at 11:03 AM, Hua Zhong wrote:
>
> Hello,
> we are using maker (2.31.8) with mvapich2, but the program terminates with a segmentation fault while setting up GFF3 output and fasta chunks. We really have no idea what the problem was.
>
> Below is the error message:
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> [fpga04.cluster:mpi_rank_111][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_107][error_sighandler] Caught error: Segmentation fault (signal 11)
> Perl exited with active threads:
> 1 running and unjoined
> 0 finished and unjoined
> 0 running and detached
> [fpga04.cluster:mpi_rank_113][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_115][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_105][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_108][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_110][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_104][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_106][error_sighandler] Caught error: Segmentation fault (signal 11)
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Best regards,
>
> Hua
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From elyssa_garza at yahoo.com Mon Dec 28 13:21:40 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Mon, 28 Dec 2015 14:21:40 -0600
Subject: [maker-devel] getting AED scores
Message-ID: <8611B3D7-76C4-4F37-972E-91055D752D47@yahoo.com>

pred_stats=0 #report AED and QI statistics for all predictions as well as models

I recently finished a run of maker on my genome and would like to look at the AED score. I usually load the resulting files into CLCbio to see the AED. However, I noticed that pred_stats was an option available in the GMOD 2014 tutorial. I tried using this option and I receive the following warning:

WARNING: Invalid option 'pred_stats' in control file maker_opts.ctl

Is there a separate script I can use to get these statistics?

-Elyssa
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Dec 29 18:43:06 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Dec 2015 18:43:06 -0700
Subject: [maker-devel] maker-devel post from elyssa_garza@yahoo.com requires approval
In-Reply-To:
References:
Message-ID: <3B52FCE6-09AA-48E0-93EF-9D1F8ED2EF0A@gmail.com>

It means you have a really old MAKER installation on your system that predates the 'pred_stats' option. You just need to update.

Thanks,
Carson

> From: Elyssa Garza
> Subject: getting AED scores
> Date: December 28, 2015 at 1:21:40 PM MST
> To: maker-devel at yandell-lab.org
>
> pred_stats=0 #report AED and QI statistics for all predictions as well as models
>
> I recently finished a run of maker on my genome and would like to look at the AED score. I usually load the resulting files into CLCbio to see the AED.
> However, I noticed that pred_stats was an option available in the GMOD 2014 tutorial. I tried using this option and I receive the following warning:
>
> WARNING: Invalid option 'pred_stats' in control file maker_opts.ctl
>
> Is there a separate script I can use to get these statistics?
>
> -Elyssa

From dence at genetics.utah.edu Wed Dec 16 12:27:00 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 16 Dec 2015 19:27:00 +0000
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References:
Message-ID: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>

Hi Ole, can you send a line for a gene feature that does work?
Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:

Hi,
I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
Line 108947 in the input gff is this:

LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.

Any help to trace down this is really appreciated. Do you need any other information?

Thank you.

Sincerely,

Ole Kristian Tørresen

From carsonhh at gmail.com Wed Dec 16 12:37:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 12:37:14 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).

It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn't see it, then it is missing certain pieces of needed information.

Thanks,
Carson

> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>
> Hi Ole, can you send a line for a gene feature that does work?
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>
>> Hi,
>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>> Line 108947 in the input gff is this:
>>
>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>
>> Any help to trace down this is really appreciated. Do you need any other information?
>>
>> Thank you.
>>
>> Sincerely,
>>
>> Ole Kristian Tørresen
From ole.toerresen at gmail.com Wed Dec 16 13:53:25 2015
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Wed, 16 Dec 2015 21:53:25 +0100
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Daniel,
this is the previous gene, before maker_functional_gff:
LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47;

After:
LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;

Carson, I saw that, but I did use Uniprot/Swiss-prot. A snippet of the BLAST output used as input is here:
GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285
GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284
GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270
GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269
GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264
GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261
GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255
GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226
GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218
GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201
GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201
GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103
GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1
GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275
GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107

521 genes had a function added before maker_functional_gff choked on this particular gene, GAMO_00029233.

Thank you.

Ole

On 16 December 2015 at 20:37, Carson Holt wrote:

> I've seen this exact same error before (
> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ
> ).
>
> It is caused by the ID from the blast report and input protein
> fasta. maker_functional_gff is not a generic script that can work on any
> input, it only works on blast results against Uniprot/Swiss-prot. The
> script is expecting a very specific header format in both the report and
> the protein fasta and if it doesn't see it, then it is missing certain
> pieces of needed information.
>
> Thanks,
> Carson
>
> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>
> Hi Ole, can you send a line for a gene feature that does work?
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen <
> ole.toerresen at gmail.com> wrote:
>
> Hi,
> I'm trying to update my annotation with some functional annotations
> with maker_functional_gff, but get this annoying error:
> Can't use string ("") as a HASH ref while "strict refs" in use at
> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58,
> <$IN> line 108947.
> > Line 108947 in the input gff is this: > > LG08 maker gene 13786695 13806565 . - . > ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > > It seems like the regexp in line 55 in the maker_functional_gff script > doesn't pick up the ID, but I can't see any difference between that line > and other similar lines. > > Any help to trace down this is really appreciated. Do you need any other > information? > > Thank you. > > Sincerely, > > Ole Kristian T?rresen > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Dec 16 13:55:14 2015 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 16 Dec 2015 13:55:14 -0700 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: Find the hit for GAMO_00029233 and then pull it?s header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header. ?Carson > On Dec 16, 2015, at 1:53 PM, Ole Kristian T?rresen wrote: > > Daniel, > this is the previous gene, before maker_functional_gff: > LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325; > LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45; > LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . 
ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; > > After : > LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . 
ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > > Carson, I saw that, but I did use Uniprot/Swiss-prot. 
A snap of the blast-output used as input here: > GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285 > GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284 > GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270 > GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269 > GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264 > GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261 > GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255 > GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226 > GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218 > GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201 > GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201 > GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103 > GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1 > GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275 > GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268 > GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267 > GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254 > GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137 > GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135 > GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134 > GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107 > > 521 genes have had added function before maker_functional_gff choked particular gene GAMO_00029233. > > Thank you. 
> > Ole > > > On 16 December 2015 at 20:37, Carson Holt > wrote: > I?ve seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ ). > > It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn?t see it, then it is missing certain pieces of needed information. > > Thanks, > Carson > >> On Dec 16, 2015, at 12:27 PM, Daniel Ence > wrote: >> >> Hi Ole, can you send a line for a gene feature that does work? >> >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >>> On Dec 14, 2015, at 12:21 PM, Ole Kristian T?rresen > wrote: >>> >>> Hi, >>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error: >>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947. >>> Line 108947 in the input gff is this: >>> >>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines. >>> >>> Any help to trace down this is really appreciated. Do you need any other information? >>> >>> Thank you. 
>>> >>> Sincerely, >>> >>> Ole Kristian T?rresen >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Wed Dec 16 16:41:48 2015 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=) Date: Thu, 17 Dec 2015 10:41:48 +1100 Subject: [maker-devel] First time using maker- Train or not to train? In-Reply-To: References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com> Message-ID: Hi Daniel, Have you guys heard about BUSCO ? It's kind of a replacement for CEGMA, which was based in a rather limited set of genes (according to their devels we should stop using). BUSCO does not only produces a more thorough completeness profile but it also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, if you use the --long option it is similar to a training and retraining step in the old training method. I recently used it for training Augustus for my fungal genomes and it works well. Unfortunately, it may not apply for this case as they don't have the plant profile dataset ready yet. You may request early access to it though I used to use the CEGMA output plus the webAugustus training service, a bit more tedious but not that complicated. I copy below what I had in my old protocol, nonetheless I would recommend any other user not dealing with plant genomes to use BUSCO instead: Augustus gff files are a bit different from CEGMA ones. 
> Get the CEGMA output and run the following script:
> cegma2zff output.cegma.gff > augustus.gff
>
> Upload the genome file (e.g. contigs.fa from velvet) and the "training gene structure file" (augustus.gff) to http://bioinf.uni-greifswald.de/webaugustus/training/create
>
> Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species).
>
> Re-training
>
> From Maker's output, follow the same initial instructions as for SNAP training detailed in the Maker tutorial:
> In the directory that contains the MYGENOME.maker.output/ folder:
> mkdir snap
> cd snap
> gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log
> maker2zff -n MYGENOME.all.gff
> The -n option is not included in the original tutorial, but without it you may end up with empty genome.ann and genome.dna files.
> From this point we generate training files for both SNAP and Augustus:
>
> fathom genome.ann genome.dna -categorize 1000
> fathom uni.ann uni.dna -export 1000 -plus
> forge export.ann export.dna
>
> For Augustus, we need the script "zff2augustus_gbk.pl". This will take the export.dna generated by fathom and generate a *.gb file that will be used as the "training gene structure file" in a new training submission in WebAugustus, but remember to give it a new name in the submission, e.g. MYGENOME_v2, or Maker won't see the difference (same name):
> perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb
>

Xabier

On 17 December 2015 at 05:07, Daniel Ence wrote:
> Hi Elyssa,
>
> Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set.
> That being said, since your gene models are basically the unmodified alignments, I'm surprised that all of them have an AED of 1, since that means that they're not supported by any of the evidence (either est or protein).
>
> Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here:
> http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors
>
> Augustus can also be trained but is much more involved.
>
> ~Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote:
>
> Hello,
>
> I have recently begun running Maker. I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about the parameters I have chosen for my first run in maker, which include:
>
> genome=CAB_assembly.fasta (1044 contigs)
> est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long)
> protein=TAIR10pep.fasta (Arabidopsis proteins)
>
> *Repeat masking*
> model_org=arabidopsis
> rmlib=list of Brassicaceae and common plant repeats
> repeat_protein=te_proteins.fasta
> *Gene Prediction*
> snaphmm=A.thaliana.hmm
> augustus_species=arabidopsis
> est2genome=1
>
> I have run a sample file of scaffolds, as well as the entire genome.
> In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator. I noticed that my AED are all 1. Is this bad? What should I try next?
>
> I am also unsure on how to train files and if this should be done in my case.
>
> Can anyone advise me on these issues?
>
> -Elyssa

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Wed Dec 16 17:13:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 17:13:29 -0700
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
Message-ID:

Yes. BUSCO is awesome. Also they have presentations this year at PAG in both the "Next Generation Genome Annotation and Analysis" and "Computational Gene Discovery" workshops.

--Carson

> On Dec 16, 2015, at 4:41 PM, Xabier Vázquez Campos wrote:
>
> Hi Daniel,
>
> Have you guys heard about BUSCO? It's kind of a replacement for CEGMA, which was based on a rather limited set of genes (according to its developers, we should stop using it). BUSCO not only produces a more thorough completeness profile but also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, using the --long option is similar to a training and retraining step in the old training method.
>
> I recently used it for training Augustus for my fungal genomes and it works well. Unfortunately, it may not apply in this case as they don't have the plant profile dataset ready yet.
> You may request early access to it, though.
>
> I used to use the CEGMA output plus the WebAugustus training service, which is a bit more tedious but not that complicated. I copy below what I had in my old protocol; nonetheless, I would recommend that any other user not dealing with plant genomes use BUSCO instead:
>
> Augustus gff files are a bit different from CEGMA ones. Get the CEGMA output and run the following script:
> cegma2zff output.cegma.gff > augustus.gff
>
> Upload the genome file (e.g. contigs.fa from velvet) and the "training gene structure file" (augustus.gff) to http://bioinf.uni-greifswald.de/webaugustus/training/create
>
> Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species).
>
> Re-training
>
> From Maker's output, follow the same initial instructions as for SNAP training detailed in the Maker tutorial:
> In the directory that contains the MYGENOME.maker.output/ folder:
> mkdir snap
> cd snap
> gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log
> maker2zff -n MYGENOME.all.gff
> The -n option is not included in the original tutorial, but without it you may end up with empty genome.ann and genome.dna files.
> From this point we generate training files for both SNAP and Augustus:
>
> fathom genome.ann genome.dna -categorize 1000
> fathom uni.ann uni.dna -export 1000 -plus
> forge export.ann export.dna
>
> For Augustus, we need the script "zff2augustus_gbk.pl". This will take the export.dna generated by fathom and generate a *.gb file that will be used as the "training gene structure file" in a new training submission in WebAugustus, but remember to give it a new name in the submission, e.g.
> MYGENOME_v2, or Maker won't see the difference (same name):
> perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb
>
> Xabier
>
> On 17 December 2015 at 05:07, Daniel Ence wrote:
> Hi Elyssa,
>
> Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set. That being said, since your gene models are basically the unmodified alignments, I'm surprised that all of them have an AED of 1, since that means that they're not supported by any of the evidence (either est or protein).
>
> Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here:
> http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors
>
> Augustus can also be trained but is much more involved.
>
> ~Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote:
>>
>> Hello,
>>
>> I have recently begun running Maker. I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about the parameters I have chosen for my first run in maker, which include:
>>
>> genome=CAB_assembly.fasta (1044 contigs)
>> est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long)
>> protein=TAIR10pep.fasta (Arabidopsis proteins)
>>
>> Repeat masking
>> model_org=arabidopsis
>> rmlib=list of Brassicaceae and common plant repeats
>> repeat_protein=te_proteins.fasta
>> Gene Prediction
>> snaphmm=A.thaliana.hmm
>> augustus_species=arabidopsis
>> est2genome=1
>>
>> I have run a sample file of scaffolds, as well as the entire genome.
>> In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator.
>> I noticed that my AED are all 1. Is this bad? What should I try next?
>>
>> I am also unsure on how to train files and if this should be done in my case.
>>
>> Can anyone advise me on these issues?
>>
>> -Elyssa
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From ole.toerresen at gmail.com Wed Dec 16 23:32:26 2015
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Thu, 17 Dec 2015 07:32:26 +0100
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Here are the hits for GAMO_00029233:

>sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1
>sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1
>sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1
>sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1
>sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3
>sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3
>sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp.
japonica GN=SGT1 PE=1 SV=1
>sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1

The bovine entry (SGT1_BOVIN) is the first hit. I can't really see anything different about it.

I don't know Perl that well. Do you have some code which I can use to debug this? In line 58 it tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash has no entry for that key, or the key itself is empty. If I could print each ID as it is found, maybe I can find a pattern. And/or print each blast entry when the blast hash is created.

Thank you.

Ole

On 16 December 2015 at 21:55, Carson Holt wrote:
> Find the hit for GAMO_00029233 and then pull its header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header.
>
> --Carson
>
>
>
> On Dec 16, 2015, at 1:53 PM, Ole Kristian Tørresen <ole.toerresen at gmail.com> wrote:
>
> Daniel,
> this is the previous gene, before maker_functional_gff:
> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
> LG08 maker exon 13653175 13653212 . - .
> ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . > ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . > ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . > ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . > ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . > ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . > ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - > . 
ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker gene 13786695 13806565 . - . > ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > LG08 maker mRNA 13786695 13806565 . - . > > ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; > > After : > LG08 maker gene 13648888 13656687 . - . > > ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar > to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker mRNA 13648888 13656687 . - . > > ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar > to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker exon 13648888 13648944 . - . > ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . > ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . > ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . > ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . > ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . > ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . > ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . > ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . > ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . > ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . > ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . 
> ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . > ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 > ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - > . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > > Carson, I saw that, but I did use Uniprot/Swiss-prot. 
> A snippet of the BLAST output used as input here:
> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285
> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284
> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270
> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269
> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264
> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261
> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255
> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226
> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218
> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201
> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201
> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103
> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1
> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275
> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
> GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107
>
> 521 genes had a function added before maker_functional_gff choked on this particular gene, GAMO_00029233.
>
> Thank you.
>
> Ole
>
>
> On 16 December 2015 at 20:37, Carson Holt wrote:
>
>> I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).
>>
>> It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn't see it, then it is missing certain pieces of needed information.
>>
>> Thanks,
>> Carson
>>
>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>>
>> Hi Ole, can you send a line for a gene feature that does work?
>>
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>>
>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen <ole.toerresen at gmail.com> wrote:
>>
>> Hi,
>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>>
>> Line 108947 in the input gff is this:
>>
>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>
>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>
>> Any help to trace down this is really appreciated. Do you need any other information?
>>
>> Thank you.
>>
>> Sincerely,
>>
>> Ole Kristian Tørresen

From huazhong at nmsu.edu Thu Dec 17 11:03:56 2015
From: huazhong at nmsu.edu (Hua Zhong)
Date: Thu, 17 Dec 2015 18:03:56 +0000
Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2
Message-ID:

Hello,
we are using maker (2.31.8) with mvapich2, but the program terminates with a segmentation fault while setting up GFF3 output and fasta chunks. We really have no idea what the problem was.

Below is the error message:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
[fpga04.cluster:mpi_rank_111][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_107][error_sighandler] Caught error: Segmentation fault (signal 11)
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
[fpga04.cluster:mpi_rank_113][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_115][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_105][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_108][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_110][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_104][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_106][error_sighandler] Caught error: Segmentation fault (signal 11)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Best regards,

Hua

From elyssa_garza at yahoo.com Thu Dec 17 15:29:56 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Thu, 17 Dec 2015 22:29:56 +0000 (UTC)
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References:
Message-ID: <802013873.330112.1450391396060.JavaMail.yahoo@mail.yahoo.com>

Hi Daniel,

I used the pre-trained models of Arabidopsis from SNAP and Augustus for this first run of maker. Do you think it would be wise to use the run I used previously (shown at the start of the topic) or should I make a new run with the following parameters to use for training?

genome=CAB_assembly.fasta
est=RTLs.fa
altest=Brassica_oleracea.fasta
protein=Arabidopsis_proteins.fasta
est2genome=0
protein2genome=0
SNAP=A.thaliana
Augustus=arabidopsis
model_org=arabidopsis
rmlib=Brassicaceae_repeats.fasta
repeat_protein=te_proteins.fasta

At what point would I use est2genome=1? Also for this plant genome, is it better to use model_org=arabidopsis or model_org=all? I am also considering using RepeatModeler to create a custom repeat library, but I am not sure it is necessary with all of the repeat information I am putting in already.

Any advice is helpful.
Thanks,
-Elyssa

On Wednesday, December 16, 2015 12:07 PM, Daniel Ence wrote:

Hi Elyssa,

Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set.
That being said, since your gene models are basically the unmodified alignments, I'm surprised that all of them have an AED of 1, since that means that they're not supported by any of the evidence (either est or protein).

Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here:
http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors

Augustus can also be trained but is much more involved.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Dec 11, 2015, at 10:43 AM, Elyssa Garza wrote:

Hello,

I have recently begun running Maker. I am currently trying to annotate my Caulanthus Genome (~372Mb); a relative to Arabidopsis. I am unsure about the parameters I have chosen for my first run in maker, which include:

genome=CAB_assembly.fasta (1044 contigs)
est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long)
protein=TAIR10pep.fasta (Arabidopsis proteins)

Repeat masking
model_org=arabidopsis
rmlib=list of Brassicaceae and common plant repeats
repeat_protein=te_proteins.fasta
Gene Prediction
snaphmm=A.thaliana.hmm
augustus_species=arabidopsis
est2genome=1

I have run a sample file of scaffolds, as well as the entire genome.
In the sample file of scaffolds, I gff3merged the gffs and then ran evaluator. I noticed that my AED are all 1. Is this bad? What should I try next?

I am also unsure on how to train files and if this should be done in my case.

Can anyone advise me on these issues?
-Elyssa

From carsonhh at gmail.com Thu Dec 17 15:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Dec 2015 15:37:43 -0700
Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2
In-Reply-To:
References:
Message-ID: <417397FD-0BFD-46E6-972F-4792C42FBAC7@gmail.com>

MAKER does not work with mvapich2. You must use either OpenMPI or MPICH2. The following is from the INSTALL instructions that come with MAKER -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.
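
A minimal sketch, in Python, of a pre-flight check for the two OpenMPI settings named in the INSTALL excerpt above (LD_PRELOAD pointing at libmpi.so, and OMPI_MCA_mpi_warn_on_fork=0). The function name check_openmpi_env and its messages are made up for illustration and are not part of MAKER. It only inspects environment variables; it cannot tell which MPI flavor 'mpicc' or 'mpi.h' belong to.

```python
import os


def check_openmpi_env(env=None):
    """Return a list of problems found in the OpenMPI-related environment settings."""
    env = os.environ if env is None else env
    problems = []

    # LD_PRELOAD may hold several libraries separated by spaces or colons.
    preload = env.get("LD_PRELOAD", "")
    libs = preload.replace(":", " ").split()
    mpi_libs = [p for p in libs if os.path.basename(p).startswith("libmpi.so")]
    if not mpi_libs:
        problems.append("LD_PRELOAD does not name libmpi.so")
    for lib in mpi_libs:
        if not os.path.isfile(lib):
            problems.append(f"LD_PRELOAD entry not found on disk: {lib}")

    # Optional setting: it only silences nonfatal fork warnings.
    if env.get("OMPI_MCA_mpi_warn_on_fork") != "0":
        problems.append("OMPI_MCA_mpi_warn_on_fork is not set to 0")
    return problems


if __name__ == "__main__":
    # Report on the current shell environment before launching mpiexec.
    for problem in check_openmpi_env():
        print("warning:", problem)
```

Running it in the same shell (or job script) that will launch MAKER shows whether the settings from ~/.bash_profile actually reached that environment.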
Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.
Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

> On Dec 17, 2015, at 11:03 AM, Hua Zhong wrote:
>
> Hello,
> we are using maker (2.31.8) with mvapich2, but the program terminates with a segmentation fault while setting up GFF3 output and fasta chunks. We really have no idea what the problem was.
>
> Below is the error message:
>
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> setting up GFF3 output and fasta chunks
> [fpga04.cluster:mpi_rank_111][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_107][error_sighandler] Caught error: Segmentation fault (signal 11)
> Perl exited with active threads:
> 1 running and unjoined
> 0 finished and unjoined
> 0 running and detached
> [fpga04.cluster:mpi_rank_113][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_115][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_105][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_108][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_110][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_104][error_sighandler] Caught error: Segmentation fault (signal 11)
> [fpga04.cluster:mpi_rank_106][error_sighandler] Caught error: Segmentation fault (signal 11)
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Best regards,
>
> Hua
>

From elyssa_garza at yahoo.com Mon Dec 28 13:21:40 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Mon, 28 Dec 2015 14:21:40 -0600
Subject: [maker-devel] getting AED scores
Message-ID: <8611B3D7-76C4-4F37-972E-91055D752D47@yahoo.com>

pred_stats=0 #report AED and QI statistics for all predictions as well as models

I recently finished a run of maker on my genome and would like to look at the AED score. I usually load the resulting files into CLCbio to see the AED. However, I noticed that pred_stats was an option available in the GMOD 2014 tutorial. I tried using this option and I receive the following warning:

WARNING: Invalid option 'pred_stats' in control file maker_opts.ctl

Is there a separate script I can use to get these statistics?

-Elyssa

From carsonhh at gmail.com Tue Dec 29 18:43:06 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Dec 2015 18:43:06 -0700
Subject: [maker-devel] maker-devel post from elyssa_garza@yahoo.com requires approval
In-Reply-To:
References:
Message-ID: <3B52FCE6-09AA-48E0-93EF-9D1F8ED2EF0A@gmail.com>

It means you have a really old MAKER installation on your system that predates the 'pred_stats' option. You just need to update.

Thanks,
Carson

>
> From: Elyssa Garza
> Subject: getting AED scores
> Date: December 28, 2015 at 1:21:40 PM MST
> To: maker-devel at yandell-lab.org
>
>
> pred_stats=0 #report AED and QI statistics for all predictions as well as models
> I recently finished a run of maker on my genome and would like to look at the AED score. I usually load the resulting files into CLCbio to see the AED.
However, I noticed that pred_stats was an option available in the GMOD 2014 tutorial. I tried using this option and I receive the following warning:
>
>
> WARNING: Invalid option 'pred_stats' in control file maker_opts.ctl
>
> Is there a separate script I can use to get these statistics?
>
> -Elyssa
>
-------------- next part --------------
An HTML attachment was scrubbed...
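On the question of reading AED scores without a separate viewer: one option is to pull them straight out of the merged GFF3. The following is only a minimal sketch, not a script that ships with MAKER. It assumes a tab-separated, nine-column GFF3 in which the `_AED=` attribute sits on the mRNA lines (as in the MAKER GFF3 excerpts elsewhere in this archive); the function names `aed_scores` and `fraction_at_or_below` are hypothetical.

```python
import re

# Attribute patterns. The leading (?:^|;) keeps _AED from matching inside _eAED.
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")
AED_RE = re.compile(r"(?:^|;)_AED=([0-9]*\.?[0-9]+)")


def aed_scores(gff_lines):
    """Return [(mRNA ID, AED), ...] for mRNA features that carry an _AED attribute."""
    scores = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # comment and directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # not a feature line, or not a transcript
        id_match = ID_RE.search(cols[8])
        aed_match = AED_RE.search(cols[8])
        if id_match and aed_match:
            scores.append((id_match.group(1), float(aed_match.group(1))))
    return scores


def fraction_at_or_below(scores, threshold):
    """Fraction of transcripts whose AED is <= threshold (0.0 if there are none)."""
    if not scores:
        return 0.0
    return sum(1 for _, aed in scores if aed <= threshold) / len(scores)
```

Passing an open file handle for the merged annotation (for example the MYGENOME.all.gff produced by gff3_merge) to `aed_scores` gives a list that can be sorted or summarized; `fraction_at_or_below(scores, 0.5)` then gives a rough idea of how many transcripts have reasonable evidence support.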
URL:
From dence at genetics.utah.edu Wed Dec 16 12:27:00 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 16 Dec 2015 19:27:00 +0000
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References:
Message-ID: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>

Hi Ole, can you send a line for a gene feature that does work?

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:

Hi,
I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
Line 108947 in the input gff is this:

LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.

Any help to trace down this is really appreciated. Do you need any other information?

Thank you.

Sincerely,

Ole Kristian Tørresen
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Wed Dec 16 12:37:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 12:37:14 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).

It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input; it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta, and if it doesn't see it, then it is missing certain pieces of needed information.

Thanks,
Carson

> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>
> Hi Ole, can you send a line for a gene feature that does work?
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>
>> Hi,
>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>> Line 108947 in the input gff is this:
>>
>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>
>> Any help to trace down this is really appreciated. Do you need any other information?
>>
>> Thank you.
>>
>> Sincerely,
>>
>> Ole Kristian Tørresen
>>
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
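The header dependence described in the message above can be checked before running the script. The sketch below is an illustration under stated assumptions, not the logic of maker_functional_gff itself. It assumes UniProt-style fasta headers of the form `>sp|ACCESSION|ENTRY description OS=organism GN=gene ...` and a tabular BLAST report with the query in column 1 and the subject ID in column 2. Whether a missing `GN=` field actually matters to maker_functional_gff is a guess; the function names and status strings are hypothetical.

```python
import re

# Assumed UniProt-style header:
#   >sp|ACCESSION|ENTRY description OS=organism [OX=taxid] [GN=gene] PE=n SV=n
HEADER_RE = re.compile(
    r"^>(?P<id>(?:sp|tr)\|[^|\s]+\|\S+)\s+(?P<desc>.+?)\s+OS=(?P<os>.+?)(?=\s+[A-Z]{2}=|$)"
)
GN_RE = re.compile(r"\sGN=(\S+)")


def parse_header(header):
    """Split one fasta header into id/desc/os/gn, or return None if it does not fit."""
    header = header.strip()
    match = HEADER_RE.match(header)
    if match is None:
        return None
    gene = GN_RE.search(header)
    return {
        "id": match.group("id"),
        "desc": match.group("desc"),
        "os": match.group("os"),
        "gn": gene.group(1) if gene else None,
    }


def check_hits(blast_lines, fasta_lines, query):
    """Return [(subject ID, status), ...] for every tabular BLAST hit of `query`."""
    headers = {}
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare ">" with no ID
                headers[fields[0]] = line
    report = []
    for line in blast_lines:
        cols = line.split()
        if len(cols) < 2 or cols[0] != query:
            continue  # blank line or a different query
        raw = headers.get(cols[1])
        if raw is None:
            status = "not in fasta"
        else:
            parsed = parse_header(raw)
            if parsed is None:
                status = "unexpected header format"
            elif parsed["gn"] is None:
                status = "no GN= field"
            else:
                status = "ok"
        report.append((cols[1], status))
    return report
```

Calling `check_hits` with the BLAST report, the Swiss-Prot fasta, and the transcript ID of the gene where the script stops (GAMO_00029233-RA in this thread) lists each hit with a status, so any header that deviates from the others stands out.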
URL: From ole.toerresen at gmail.com Wed Dec 16 13:53:25 2015 From: ole.toerresen at gmail.com (=?UTF-8?Q?Ole_Kristian_T=C3=B8rresen?=) Date: Wed, 16 Dec 2015 21:53:25 +0100 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: Daniel, this is the previous gene, before maker_functional_gff: LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325; LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45; LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; LG08 maker CDS 13656667 13656687 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; After : LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); LG08 maker mRNA 13648888 13656687 . - . 
ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653587 13653641 . 
- 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; Carson, I saw that, but I did use Uniprot/Swiss-prot. A snap of the blast-output used as input here: GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285 GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284 GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270 GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269 GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264 GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261 GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255 GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226 GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218 GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201 GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201 GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103 GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1 GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 
268 17 310 5e-89 275
GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107

521 genes had a function added before maker_functional_gff choked on gene GAMO_00029233.

Thank you.

Ole

On 16 December 2015 at 20:37, Carson Holt wrote:
> I've seen this exact same error before (
> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ
> ).
>
> It is caused by the ID from the blast report and input protein
> fasta. maker_functional_gff is not a generic script that can work on any
> input, it only works on blast results against Uniprot/Swiss-prot. The
> script is expecting a very specific header format in both the report and
> the protein fasta and if it doesn't see it, then it is missing certain
> pieces of needed information.
>
> Thanks,
> Carson
>
> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>
> Hi Ole, can you send a line for a gene feature that does work?
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen <
> ole.toerresen at gmail.com> wrote:
>
> Hi,
> I'm trying to update my annotation with some functional annotations
> with maker_functional_gff, but get this annoying error:
> Can't use string ("") as a HASH ref while "strict refs" in use at
> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58,
> <$IN> line 108947.
>
> Line 108947 in the input gff is this:
>
> LG08 maker gene 13786695 13806565 . - .
> ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>
> It seems like the regexp in line 55 in the maker_functional_gff script
> doesn't pick up the ID, but I can't see any difference between that line
> and other similar lines.
>
> Any help to trace down this is really appreciated. Do you need any other
> information?
>
> Thank you.
>
> Sincerely,
>
> Ole Kristian Tørresen
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Wed Dec 16 13:55:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 13:55:14 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Find the hit for GAMO_00029233 and then pull its header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header.

--Carson

> On Dec 16, 2015, at 1:53 PM, Ole Kristian Tørresen wrote:
>
> Daniel,
> this is the previous gene, before maker_functional_gff:
> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
> LG08 maker exon 13649295 13649577 . - .
ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; > LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; > > After : > LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); > LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; > LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; > LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; > LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; > LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; > LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; > LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; > LG08 maker exon 13653587 13653641 . - . 
ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; > LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; > LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; > LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; > LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; > LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; > LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; > > Carson, I saw that, but I did use Uniprot/Swiss-prot. 
A snap of the blast-output used as input here: > GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285 > GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284 > GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270 > GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269 > GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264 > GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261 > GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255 > GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226 > GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218 > GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201 > GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201 > GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103 > GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1 > GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275 > GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268 > GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267 > GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254 > GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137 > GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135 > GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134 > GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107 > > 521 genes have had added function before maker_functional_gff choked particular gene GAMO_00029233. > > Thank you. 
>
> Ole
>
>
> On 16 December 2015 at 20:37, Carson Holt wrote:
> I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).
>
> It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn't see it, then it is missing certain pieces of needed information.
>
> Thanks,
> Carson
>
>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>>
>> Hi Ole, can you send a line for a gene feature that does work?
>>
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>>
>>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>>
>>> Hi,
>>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>>> Line 108947 in the input gff is this:
>>>
>>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>>
>>> Any help to trace down this is really appreciated. Do you need any other information?
>>>
>>> Thank you.
>>> >>> Sincerely, >>> >>> Ole Kristian T?rresen >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Wed Dec 16 16:41:48 2015 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=) Date: Thu, 17 Dec 2015 10:41:48 +1100 Subject: [maker-devel] First time using maker- Train or not to train? In-Reply-To: References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com> Message-ID: Hi Daniel, Have you guys heard about BUSCO ? It's kind of a replacement for CEGMA, which was based in a rather limited set of genes (according to their devels we should stop using). BUSCO does not only produces a more thorough completeness profile but it also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, if you use the --long option it is similar to a training and retraining step in the old training method. I recently used it for training Augustus for my fungal genomes and it works well. Unfortunately, it may not apply for this case as they don't have the plant profile dataset ready yet. You may request early access to it though I used to use the CEGMA output plus the webAugustus training service, a bit more tedious but not that complicated. I copy below what I had in my old protocol, nonetheless I would recommend any other user not dealing with plant genomes to use BUSCO instead: Augustus gff files are a bit different from CEGMA ones. 
Get the CEGMA output and run the following script:

    cegma2zff output.cegma.gff > augustus.gff

Upload the genome file (e.g. contigs.fa from velvet) and the "training gene structure file" (augustus.gff) to
http://bioinf.uni-greifswald.de/webaugustus/training/create

Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species).

Re-training

From Maker's output, follow the same initial instructions as for SNAP training detailed in the Maker tutorial. In the directory that contains the MYGENOME.maker.output/ folder:

    mkdir snap
    cd snap
    gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log
    maker2zff -n MYGENOME.all.gff

The -n option is not included in the original tutorial, but without it you may end up with empty genome.ann and genome.dna files.

From this point we generate training files for both SNAP and Augustus:

    fathom genome.ann genome.dna -categorize 1000
    fathom uni.ann uni.dna -export 1000 -plus
    forge export.ann export.dna

For Augustus, we need the script "zff2augustus_gbk.pl". This will take the export.dna generated by fathom and generate a *.gb file that will be used as the "training gene structure file" in a new training submission in WebAugustus. Remember to give it a new name in the submission, e.g. MYGENOME_v2, or Maker won't see the difference (same name):

    perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb

Xabier
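The SNAP re-training steps from the protocol in this thread can be strung together as a small shell script. This is only a minimal sketch, not part of MAKER or SNAP: the run helper, the DRY_RUN switch and the BASE variable are illustrative names introduced here, and MYGENOME is the placeholder base name used in the protocol. With DRY_RUN=1 (the default in this sketch) the commands are only printed, so the sequence can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch of the re-training command sequence; names below are illustrative.
BASE="${BASE:-MYGENOME}"     # placeholder genome base name from the protocol
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands, 0 = run them

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

snap_retrain() {
    # Work in a fresh snap/ directory next to BASE.maker.output/.
    run mkdir -p snap || return 1
    run cd snap || return 1
    # Merge the per-contig GFF3 files listed in the datastore index.
    run gff3_merge -d "../$BASE.maker.output/${BASE}_master_datastore_index.log" || return 1
    # Convert to ZFF; -n is the option recommended in the protocol.
    run maker2zff -n "$BASE.all.gff" || return 1
    # Build the SNAP training files.
    run fathom genome.ann genome.dna -categorize 1000 || return 1
    run fathom uni.ann uni.dna -export 1000 -plus || return 1
    run forge export.ann export.dna || return 1
}

snap_retrain
```

Setting DRY_RUN=0 and BASE to the real base name would execute the same sequence, assuming gff3_merge, maker2zff, fathom and forge are on the PATH.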
-- 
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Wed Dec 16 17:13:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Dec 2015 17:13:29 -0700
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References: <084E7DB7-0A91-458E-B590-58BB6CC42E70@yahoo.com>
Message-ID:

Yes. BUSCO is awesome. Also, they have presentations this year at PAG in both the "Next Generation Genome Annotation and Analysis" and "Computational Gene Discovery" workshops.

--Carson
From ole.toerresen at gmail.com Wed Dec 16 23:32:26 2015
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Thu, 17 Dec 2015 07:32:26 +0100
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Here are the hits for GAMO_00029233:

>sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1
>sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1
>sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1
>sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1
>sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3
>sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3
>sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. japonica GN=SGT1 PE=1 SV=1
>sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1

The bovine entry is the first hit. I can't really see anything different about it. I don't know Perl that well. Do you have some code I can use to debug this? In line 58 it tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash entry that the key points to is empty, or the key itself is empty. If I could print each ID as it is found, maybe I could find a pattern. And/or print each blast entry when the blast hash is created.

Thank you.

Ole
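One way to look for a header that deviates from the rest is to check every UniProt/Swiss-Prot header for the fields seen in the hits above. The following is a minimal sketch, assuming the fields of interest are the `sp|ACCESSION|NAME` identifier plus `OS=` and `GN=`; which fields maker_functional_gff actually requires is an assumption here and is not confirmed anywhere in this thread. check_sp_headers is a name made up for illustration.

```shell
#!/bin/sh
# Read FASTA header lines on stdin and report those that lack an expected field.
# The list of "expected" fields is an assumption, not taken from the MAKER source.
check_sp_headers() {
    awk '
        /^>/ {
            missing = ""
            # Identifier of the form >sp|ACCESSION|ENTRY_NAME followed by a space
            if ($0 !~ /^>sp\|[^|]+\|[^ ]+ /) missing = missing " id"
            # Organism name field
            if ($0 !~ / OS=/)                missing = missing " OS"
            # Gene name field
            if ($0 !~ / GN=/)                missing = missing " GN"
            if (missing != "") print "missing" missing ": " $0
        }
    '
}
```

It could be run over the headers of the protein FASTA that was used as the BLAST database, for example `grep '^>' uniprot_sprot.fasta | check_sp_headers` (the file name is a placeholder). No output means every header carries all three fields.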
From huazhong at nmsu.edu Thu Dec 17 11:03:56 2015
From: huazhong at nmsu.edu (Hua Zhong)
Date: Thu, 17 Dec 2015 18:03:56 +0000
Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2
Message-ID:

Hello,

We are using maker (2.31.8) with mvapich2, but the program terminates with a segmentation fault while setting up GFF3 output and fasta chunks. We really have no idea what the problem was.

Below is the error message:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
[fpga04.cluster:mpi_rank_111][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_107][error_sighandler] Caught error: Segmentation fault (signal 11)
Perl exited with active threads:
    1 running and unjoined
    0 finished and unjoined
    0 running and detached
[fpga04.cluster:mpi_rank_113][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_115][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_114][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_105][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_108][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_110][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_104][error_sighandler] Caught error: Segmentation fault (signal 11)
[fpga04.cluster:mpi_rank_106][error_sighandler] Caught error: Segmentation fault (signal 11)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Best regards,

Hua

From elyssa_garza at yahoo.com Thu Dec 17 15:29:56 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Thu, 17 Dec 2015 22:29:56 +0000 (UTC)
Subject: [maker-devel] First time using maker- Train or not to train?
In-Reply-To:
References:
Message-ID: <802013873.330112.1450391396060.JavaMail.yahoo@mail.yahoo.com>

Hi Daniel,

I used the pre-trained Arabidopsis models from SNAP and Augustus for this first run of maker. Do you think it would be wise to use the run I did previously (shown at the start of the topic), or should I make a new run with the following parameters to use for training?

genome=CAB_assembly.fasta
est=RTLs.fa
altest=Brassica_oleracea.fasta
protein=Arabidopsis_proteins.fasta
est2genome=0
protein2genome=0
SNAP=A.thaliana
Augustus=arabidopsis
model_org=arabidopsis
rmlib=Brassicaceae_repeats.fasta
repeat_protein=te_proteins.fasta

At what point would I use est2genome=1? Also, for this plant genome, is it better to use model_org=arabidopsis or model_org=all? I am also considering using RepeatModeler to create a custom repeat library, but I am not sure it is necessary with all of the repeat information I am already putting in.

Any advice is helpful.

Thanks,
-Elyssa
From carsonhh at gmail.com Thu Dec 17 15:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Dec 2015 15:37:43 -0700
Subject: [maker-devel] maker 2.31.8 segmentation fault when setting up GFF3 output and fasta chunks with mvapich2
In-Reply-To:
References:
Message-ID: <417397FD-0BFD-46E6-972F-4792C42FBAC7@gmail.com>

MAKER does not work with mvapich2. You must use either OpenMPI or MPICH2.

The following is from the INSTALL instructions that come with MAKER:

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.
Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.

Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson
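The two OpenMPI environment settings mentioned in the INSTALL notes could be produced by a small helper before they are pasted into ~/.bash_profile. This is a minimal sketch under stated assumptions: openmpi_profile_lines is a hypothetical name, and the directory you pass in must be wherever libmpi.so actually lives on your system (the /location/of/openmpi/lib path in the notes is itself a placeholder).

```shell
#!/bin/sh
# Print the export lines for ~/.bash_profile, given the OpenMPI lib directory.
# Refuses to print anything if libmpi.so is not found there.
openmpi_profile_lines() {
    libdir="$1"
    lib="$libdir/libmpi.so"
    if [ ! -e "$lib" ]; then
        echo "libmpi.so not found in $libdir" >&2
        return 1
    fi
    # Preload OpenMPI's shared library, as the INSTALL notes ask.
    echo "export LD_PRELOAD=$lib"
    # Silence the nonfatal fork warnings mentioned in the notes.
    echo "export OMPI_MCA_mpi_warn_on_fork=0"
}
```

For example, `openmpi_profile_lines /location/of/openmpi/lib >> ~/.bash_profile` would append both lines only when the library is really present, which guards against pointing LD_PRELOAD at a path from a different MPI flavor by mistake.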
From elyssa_garza at yahoo.com Mon Dec 28 13:21:40 2015
From: elyssa_garza at yahoo.com (Elyssa Garza)
Date: Mon, 28 Dec 2015 14:21:40 -0600
Subject: [maker-devel] getting AED scores
Message-ID: <8611B3D7-76C4-4F37-972E-91055D752D47@yahoo.com>

pred_stats=0 #report AED and QI statistics for all predictions as well as models

I recently finished a run of maker on my genome and would like to look at the AED scores. I usually load the resulting files into CLCbio to see the AED. However, I noticed that pred_stats was an option available in the GMOD 2014 tutorial. I tried using this option and received the following warning:

WARNING: Invalid option 'pred_stats' in control file maker_opts.ctl

Is there a separate script I can use to get these statistics?

-Elyssa

From carsonhh at gmail.com Tue Dec 29 18:43:06 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Dec 2015 18:43:06 -0700
Subject: [maker-devel] maker-devel post from elyssa_garza@yahoo.com requires approval
In-Reply-To:
References:
Message-ID: <3B52FCE6-09AA-48E0-93EF-9D1F8ED2EF0A@gmail.com>

It means you have a really old MAKER installation on your system that predates the 'pred_stats' option. You just need to update.

Thanks,
Carson
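For a quick look at AED values without loading the annotation into a viewer, the `_AED=` attribute that MAKER writes on each mRNA line of its GFF3 output can be pulled out with awk. This is a minimal sketch, not a MAKER script: aed_from_gff is a name made up here, and it assumes a tab-separated GFF3 file with the attributes in column 9.

```shell
#!/bin/sh
# Print "mRNA_ID<TAB>AED" for every mRNA feature that carries an _AED attribute.
# Reads the files given as arguments, or stdin when none are given.
aed_from_gff() {
    awk -F'\t' '
        $3 == "mRNA" {
            id = ""; aed = ""
            # Column 9 holds semicolon-separated key=value attributes.
            n = split($9, attr, ";")
            for (i = 1; i <= n; i++) {
                if (attr[i] ~ /^ID=/)   id  = substr(attr[i], 4)
                if (attr[i] ~ /^_AED=/) aed = substr(attr[i], 6)
            }
            if (aed != "") print id "\t" aed
        }
    ' "$@"
}
```

Something like `aed_from_gff MYGENOME.all.gff | sort -k2,2n` (file name is a placeholder) would then list the transcripts from best to worst supported; a column of 1.00 values would reproduce the "all AED are 1" situation discussed earlier in the thread.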