<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi Chia-Yi,</div><div class=""><br class=""></div><div class="">I’m glad to see you found a way around the issue you were seeing. Another solution may be to split your input genome into several pieces and run each one as a separate job.</div><div class=""><br class=""></div><div class="">Just out of curiosity, could you send me the results of these two commands?</div><div class=""><br class=""></div><div class="">df -h /tmp</div><div class="">df -h &lt;directory_where_you_are_running_maker&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">A GFFDB.pm lock failure generally means either that your working directory is network mounted and MAKER can’t detect it, or that /tmp is tmpfs. Both can cause SQLite failures.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Carson</div><div class=""><br class=""></div><br class=""><div><blockquote type="cite" class=""><div class="">On Sep 8, 2015, at 9:46 AM, Cheng, Chia-Yi <<a href="mailto:ccheng@jcvi.org" class="">ccheng@jcvi.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; font-size: 14px; font-family: Calibri, sans-serif;" class=""><div class=""><div class="">Hi Carson,</div><div class=""><br class=""></div><div class="">Thank you for the suggestions. For my previous runs, I had set TMP to a non-NFS location and used 4 or 8 CPUs for MPI. The MPI log file shows a consistent error, DBD::SQLite::db selectcol_arrayref failed: database is locked at maker-2.31.8/bin/../lib/GFFDB.pm line 525./, which may be associated with the IO error you pointed out. This is likely caused by the MPI setup at our institute. My teammate Vivek therefore suggested running without MPI. The run took about a day, compared to ~6 hours with MPI, but it did not produce any errors and the AED scores from two runs were identical. The command for the successful runs was: maker -R -quiet -TMP /tmp -fix_nucleotides</div><div class=""><br class=""></div><div class="">It looks like this approach has resolved the issue. Please feel free to post this update to the Google group. 
Again, thank you for your help.</div><div class=""><br class=""></div><div class="">Best,</div><div class="">Chia-Yi</div></div><div class=""><br class=""></div><div class=""><br class=""></div><span id="OLK_SRC_BODY_SECTION" class=""><div style="font-family: Calibri; font-size: 11pt; text-align: left; border-width: 1pt medium medium; border-style: solid none none; padding: 3pt 0in 0in; border-top-color: rgb(181, 196, 223);" class=""><span style="font-weight:bold" class="">From: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com" class="">carsonhh@gmail.com</a>><br class=""><span style="font-weight:bold" class="">Date: </span> Friday, September 4, 2015 at 2:43 PM<br class=""><span style="font-weight:bold" class="">To: </span> Cheng Chia-Yi <<a href="mailto:ccheng@jcvi.org" class="">ccheng@jcvi.org</a>><br class=""><span style="font-weight:bold" class="">Subject: </span> Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not?<br class=""></div><div class=""><br class=""></div><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi Chia-Yi,</div><div class=""><br class=""></div>
I think I found the issue based on the differences between the GFF3 files. MAKER stores data in a number of intermediate files as it progresses (organized in regional chunks). It looks like you had an IO error in one of the runs, and one of these files
was likely empty (note the attached image: in the circled region all EST/mRNA data just drops out, and this only happens in one of the files). It didn’t kill the job (NFS errors rarely do; as one of its optimizations, NFS always returns success and assumes the operation
will complete eventually). You can run again with the MAKER -a option to rebuild the data output.
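<div class=""><br class=""></div><div class="">A drop-out like the one described above can also be looked for without a genome browser. The following is only a minimal sketch, not part of MAKER: the function names and the 100 kb window size are assumptions made for illustration. It counts GFF3 features per source and per window in each run, stopping at the ##FASTA section, and reports the (source, window) pairs that appear in one run but not in the other.</div>

```python
from collections import Counter


def evidence_counts(gff_lines, window=100000):
    """Count features per (source, window index) in one GFF3 file.

    Hypothetical helper: reading stops at the ##FASTA section, and
    comment, blank, and malformed lines are skipped.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        source, start = cols[1], int(cols[3])
        counts[(source, (start - 1) // window)] += 1
    return counts


def dropped_evidence(counts_a, counts_b):
    """Return the (source, window) keys present in only one of the two runs."""
    return sorted(set(counts_a) ^ set(counts_b))
```

<div class="">Passing the two per-contig GFF3 files (opened as text) through evidence_counts and then dropped_evidence gives a short list of regions to inspect. A source that is missing from a whole window in only one run is consistent with an empty intermediate file, although it does not prove one.</div>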
<div class=""><br class=""></div><div class="">Make sure your TMP= environment variable is not pointing to an NFS mounted location (that would exacerbate issues). You also may need to scale back the number of CPUs you are running using MPI in order to reduce the IO burden.
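<div class=""><br class=""></div><div class="">The TMP check above can be scripted on Linux. This is a sketch under stated assumptions rather than anything MAKER does itself: the function names are hypothetical, the mount table is assumed to be in the /proc/mounts format (device, mount point, filesystem type, ...), and the set of filesystem types to flag is an illustrative choice.</div>

```python
import os

# Assumption: filesystem types worth flagging for a TMP or working directory.
RISKY_FS_TYPES = {"nfs", "nfs4", "tmpfs"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is the content of a /proc/mounts style table. Mount points
    with escaped spaces are not handled in this sketch.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mount_point, fstype = fields[1], fields[2]
            prefix = mount_point.rstrip("/") + "/"
            # Later entries win ties, matching the most recent mount.
            if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type


def is_risky(fstype):
    """True when the filesystem type is in the flagged set."""
    return fstype in RISKY_FS_TYPES
```

<div class="">On a Linux node, reading /proc/mounts and calling fs_type_for on the TMP directory and on the MAKER working directory shows which filesystem each one actually lives on.</div>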
<div class=""><br class=""></div><div class="">Thanks,</div><div class="">Carson</div><div class=""><div class=""><br class=""></div><div class=""><br class=""><div class=""><span id="cid:328D506C-D864-4AC5-BFC3-8347CFF6FBE3@uconnect.utah.edu"><Screen Shot 2015-09-04 at 11.17.09 AM.png></span><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Sep 4, 2015, at 9:06 AM, Cheng, Chia-Yi <<a href="mailto:ccheng@jcvi.org" class="">ccheng@jcvi.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; font-size: 14px; font-family: Calibri, sans-serif;" class=""><div class="">Hi Carson,</div><div class=""><br class=""></div><div class="">Thank you for clarifying. The two MAKER-generated GFF files can now be downloaded from iPlant:</div><div class=""><br class=""></div><div class=""><a href="http://de.iplantcollaborative.org/dl/d/0C9CBD8F-9B6E-40F1-A2FA-4F7AC7AAE4B5/Chr1.gff.20150831" class="">http://de.iplantcollaborative.org/dl/d/0C9CBD8F-9B6E-40F1-A2FA-4F7AC7AAE4B5/Chr1.gff.20150831</a></div><div class=""><a href="http://de.iplantcollaborative.org/dl/d/4C73FD9D-BE7E-4937-84D5-1D7F32196B67/Chr1.gff.repeat_20150831" class="">http://de.iplantcollaborative.org/dl/d/4C73FD9D-BE7E-4937-84D5-1D7F32196B67/Chr1.gff.repeat_20150831</a></div><div class=""><br class=""></div><div class="">The control files for these two runs and a list of 818 models with different AED scores are attached to this email.</div><div class=""><br class=""></div><div class="">Please let me know if you need any other information. 
Thank you so much for your help.</div><div class=""><br class=""></div><div class="">Best,</div><div class="">Chia-Yi</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><span id="OLK_SRC_BODY_SECTION" class=""><div style="font-family: Calibri; font-size: 11pt; text-align: left; border-width: 1pt medium medium; border-style: solid none none; padding: 3pt 0in 0in; border-top-color: rgb(181, 196, 223);" class=""><span style="font-weight:bold" class="">From: </span>Carson Holt <<a href="mailto:carsonhh@gmail.com" class="">carsonhh@gmail.com</a>><br class=""><span style="font-weight:bold" class="">Date: </span>Thursday, September 3, 2015 at 6:40 PM<br class=""><span style="font-weight:bold" class="">To: </span>Cheng Chia-Yi <<a href="mailto:ccheng@jcvi.org" class="">ccheng@jcvi.org</a>><br class=""><span style="font-weight:bold" class="">Subject: </span>Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not?<br class=""></div><div class=""><br class=""></div><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi Chia-Yi,</div><div class=""><br class=""></div>
What I really need are the MAKER-produced GFF3 outputs from both runs (the individual contig files with the FASTA at the end). Just Chr1 is sufficient.
<div class=""><br class=""></div><div class="">Thanks,</div><div class="">Carson<br class=""><div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Aug 31, 2015, at 10:20 AM, Cheng, Chia-Yi <<a href="mailto:ccheng@jcvi.org" class="">ccheng@jcvi.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; font-size: 14px; font-family: Calibri, sans-serif;" class=""><div class="">Hi Carson,</div><div class=""><br class=""></div><div class="">Please find the 1142 gene models with different AED from both runs. Due to the size, please download the annotated GFF3 and fasta files from iPlant,</div><div class=""><a href="http://de.iplantcollaborative.org/dl/d/2C1901E6-7F52-4264-9CB7-AB72CEF6BD67/TAIR10.protein_coding_loci_27415.gff" class="">http://de.iplantcollaborative.org/dl/d/2C1901E6-7F52-4264-9CB7-AB72CEF6BD67/TAIR10.protein_coding_loci_27415.gff</a></div><div class=""><a href="http://de.iplantcollaborative.org/dl/d/44A6AD38-E408-4DB7-AC32-6689D3D1AC7A/TAIR10.protein_coding_loci_27415.fasta" class="">http://de.iplantcollaborative.org/dl/d/44A6AD38-E408-4DB7-AC32-6689D3D1AC7A/TAIR10.protein_coding_loci_27415.fasta</a></div><div class=""><br class=""></div><div class="">The single_exon= was set to zero in both sets. The two runs have used identical control files which were also attached. 
I thought single_exon= only mattered for generating annotation and didn’t realize it would also affect AED calculation.</div><div class=""><br class=""></div><div class="">Thank you.</div><div class=""><br class=""></div><div class="">Chia-Yi</div><div class=""><br class=""></div><span id="OLK_SRC_BODY_SECTION" class=""><div style="font-family: Calibri; font-size: 11pt; text-align: left; border-width: 1pt medium medium; border-style: solid none none; padding: 3pt 0in 0in; border-top-color: rgb(181, 196, 223);" class=""><span style="font-weight:bold" class="">From: </span>Carson Holt <<a href="mailto:carsonhh@gmail.com" class="">carsonhh@gmail.com</a>><br class=""><span style="font-weight:bold" class="">Date: </span>Monday, August 31, 2015 at 11:08 AM<br class=""><span style="font-weight:bold" class="">To: </span>Cheng Chia-Yi <<a href="mailto:ccheng@jcvi.org" class="">ccheng@jcvi.org</a>><br class=""><span style="font-weight:bold" class="">Cc: </span>"<a href="mailto:maker-devel@yandell-lab.org" class="">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org" class="">maker-devel@yandell-lab.org</a>><br class=""><span style="font-weight:bold" class="">Subject: </span>Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not?<br class=""></div><div class=""><br class=""></div><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
I would have to see the actual GFF3 files (the full files, including the FASTA at the end). Give me both GFF3 files and the coordinates of the gene in question. My first guess is that you had the single_exon= filter set to different values on each run. The gene in question
is an unspliced single exon gene (based on the QI), your primary piece of evidence appears to be a single exon EST, and the only value that changes in the QI is the exon overlap. Single exon evidence will be ignored by default for the AED calculation unless
you have single_exon set to 1.
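<div class=""><br class=""></div><div class="">The QI comparison described above can be done mechanically. The sketch below splits two QI strings on the pipe character and reports the fields that differ. The field names are an assumption based on one reading of the MAKER documentation for the nine QI columns and should be checked against your MAKER version; the function names are hypothetical.</div>

```python
# Assumed meaning of the nine QI columns (verify against the MAKER docs).
QI_FIELDS = (
    "five_prime_utr_length",
    "splice_sites_confirmed_by_est",
    "exons_overlapping_est",
    "exons_overlapping_est_or_protein",
    "splice_sites_confirmed_by_ab_initio",
    "exons_overlapping_ab_initio",
    "exon_count",
    "three_prime_utr_length",
    "protein_length",
)


def parse_qi(qi):
    """Split a QI string such as 0|-1|0|1|-1|0|1|0|344 into named floats."""
    values = [float(v) for v in qi.split("|")]
    if len(values) != len(QI_FIELDS):
        raise ValueError("expected %d QI fields, got %d" % (len(QI_FIELDS), len(values)))
    return dict(zip(QI_FIELDS, values))


def qi_differences(qi_a, qi_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = parse_qi(qi_a), parse_qi(qi_b)
    return {name: (a[name], b[name]) for name in QI_FIELDS if a[name] != b[name]}
```

<div class="">Applied to the two QI strings quoted in the original report further down this thread, only the fourth column differs, which is the exon overlap value mentioned above.</div>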
<div class=""><br class=""></div><div class="">Thanks,</div><div class="">Carson</div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Aug 31, 2015, at 8:47 AM, Cheng, Chia-Yi <<a href="mailto:ccheng@jcvi.org" class="">ccheng@jcvi.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; font-size: 14px; font-family: Calibri, sans-serif;" class=""><div class=""><div class="">Hello MAKER team,</div><div class=""><br class=""></div><div class="">We at JCVI have been using MAKER (2.31.8) to calculate the AED of Arabidopsis gene models. We provided the annotation set as ‘model_gff’ with evidence file in ‘protein_gff’ and ‘est_gff’. All the other settings were default. One issue I’ve noticed
was that the AED scores did not seem to be deterministic. When I compare the AED scores from two runs using identical control files, ~1,000 (out of 35,385) gene models had different AED scores. The difference between two sets of AED scores could range from
0.01 to 1.00.</div><div class=""><br class=""></div><div class="">I looked into several gene models with larger differences, i.e. AED = 0.00 in run 1 and AED = 1.00 in run 2, and noticed a disagreement in the QI:</div><div class=""><br class=""></div><div class="">Run 1: _AED=0.00;_eAED=-0.00;_QI=0|-1|0|1|-1|0|1|0|344</div><div class="">Run 2: _AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|0|344</div><div class=""><br class=""></div><div class="">The discrepancy in the 4th column seemed to suggest the evidence file was not used properly in run 2. I’m not sure what may have caused this, as both runs used the same input. A snapshot of the evidence files is pasted at the end of the email
in case needed. </div><div class=""><br class=""></div><div class="">Please let me know if more info is needed. Any help is appreciated. Thank you.</div><div class=""><br class=""></div><div class="">Chia-Yi</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">RNA-seq evidence file:</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>assembler-aerial2_pasa<span class="Apple-tab-span" style="white-space:pre"></span>cDNA_match<span class="Apple-tab-span" style="white-space:pre"></span>3624<span class="Apple-tab-span" style="white-space:pre"></span>5927<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>+<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>ID=aerial2_align_161343;Target=asmbl_1
1082 1234 +%2Casmbl_1 692 1081 +%2Casmbl_1 572 691 +%2Casmbl_1 1 290 +%2Casmbl_1 291 571 +%2Casmbl_1 1235 1723 +</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>assembler-aerial2_pasa<span class="Apple-tab-span" style="white-space:pre"></span>match_part<span class="Apple-tab-span" style="white-space:pre"></span>3624<span class="Apple-tab-span" style="white-space:pre"></span>3913<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>+<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>ID=aerial2_align_161343-1;Parent=aerial2_align_161343</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>assembler-aerial2_pasa<span class="Apple-tab-span" style="white-space:pre"></span>match_part<span class="Apple-tab-span" style="white-space:pre"></span>3996<span class="Apple-tab-span" style="white-space:pre"></span>4276<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>+<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>ID=aerial2_align_161343-2;Parent=aerial2_align_161343</div><div class=""><br class=""></div><div class="">EST evidence file:</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>est2genome<span class="Apple-tab-span" style="white-space:pre"></span>expressed_sequence_match<span class="Apple-tab-span" style="white-space:pre"></span>5470<span class="Apple-tab-span" style="white-space:pre"></span>5899<span class="Apple-tab-span" style="white-space:pre"></span>2150<span class="Apple-tab-span" style="white-space:pre"></span>-<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" 
style="white-space:pre"></span>ID=Chr1:hit:213:3.2.0.0;Name=gi|19829901|gb|AV795918|RAFL08-19-M04</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>est2genome<span class="Apple-tab-span" style="white-space:pre"></span>match_part<span class="Apple-tab-span" style="white-space:pre"></span>5470<span class="Apple-tab-span" style="white-space:pre"></span>5899<span class="Apple-tab-span" style="white-space:pre"></span>2150<span class="Apple-tab-span" style="white-space:pre"></span>-<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>ID=Chr1:hsp:500:3.2.0.0;Parent=Chr1:hit:213:3.2.0.0;Target=gi|19829901|gb|AV795918|RAFL08-19-M04
2 431 +;Gap=M430</div><div class=""><br class=""></div><div class="">Protein evidence file:</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>protein2genome<span class="Apple-tab-span" style="white-space:pre"></span>protein_match<span class="Apple-tab-span" style="white-space:pre"></span>3760<span class="Apple-tab-span" style="white-space:pre"></span>5284<span class="Apple-tab-span" style="white-space:pre"></span>727<span class="Apple-tab-span" style="white-space:pre"></span>+<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>ID=Chr1:hit:202:3.10.0.0;Name=UniRef90_M4EWW1</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>protein2genome<span class="Apple-tab-span" style="white-space:pre"></span>match_part<span class="Apple-tab-span" style="white-space:pre"></span>3760<span class="Apple-tab-span" style="white-space:pre"></span>3913<span class="Apple-tab-span" style="white-space:pre"></span>727<span class="Apple-tab-span" style="white-space:pre"></span>+<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>ID=Chr1:hsp:488:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1
1 50;Gap=M31 D1 M19 F1</div><div class="">Chr1<span class="Apple-tab-span" style="white-space:pre"> </span>protein2genome<span class="Apple-tab-span" style="white-space:pre"></span>match_part<span class="Apple-tab-span" style="white-space:pre"></span>3996<span class="Apple-tab-span" style="white-space:pre"></span>4276<span class="Apple-tab-span" style="white-space:pre"></span>727<span class="Apple-tab-span" style="white-space:pre"></span>+<span class="Apple-tab-span" style="white-space:pre"></span>.<span class="Apple-tab-span" style="white-space:pre"></span>ID=Chr1:hsp:489:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1
51 144;Gap=R1 M23 D1 M28 D1 M36 I2 M5</div></div><div class=""><br class=""></div></div>
_______________________________________________<br class="">
maker-devel mailing list<br class=""><a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class=""><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br class=""></div></blockquote></div><br class=""></div></div></div></span></div><span id="cid:2E4D9B66-D642-4E49-AAD5-943EE2864B9C@genetics.utah.edu" class=""><maker_bopts.ctl></span><span id="cid:4E61EB1D-87AC-43CC-8913-E5DE18E94E93@genetics.utah.edu" class=""><maker_exe.ctl></span><span id="cid:121E675A-D397-4515-A9FE-C89AEBE53788@genetics.utah.edu" class=""><maker_opts.ctl></span><span id="cid:0650656C-F756-448C-861E-FC19B3DBF2CE@genetics.utah.edu" class=""><1142_models.diff_AED.gff></span></div></blockquote></div><br class=""></div></div></div></div></span></div><span id="cid:1227C338-1547-4B46-8224-7BE59D9D4D8D@uconnect.utah.edu" class=""><818.diff_AED.20150831></span><span id="cid:1050B7CB-36FD-419D-A2A6-23852AC61ABD@uconnect.utah.edu" class=""><maker_bopts.ctl></span><span id="cid:5F1A2FB3-92BF-4332-AFEE-A6BCA70E4C69@uconnect.utah.edu" class=""><maker_exe.ctl></span><span id="cid:946A6D75-4AAD-4BB2-9006-FA859D708896@uconnect.utah.edu" class=""><maker_opts.ctl></span></div></blockquote></div><br class=""></div></div></div></div></div></div></span></div>
<span id="cid:328D506C-D864-4AC5-BFC3-8347CFF6FBE3@uconnect.utah.edu"><Screen Shot 2015-09-04 at 11.17.09 AM.png></span></div></blockquote></div><br class=""></body></html>