From carsonhh at gmail.com Mon Jan 5 20:59:23 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 5 Jan 2015 19:59:23 -0700
Subject: [maker-devel] some problems using MAKER
In-Reply-To:
References:
Message-ID: <08B46BBA-522B-43BC-9E82-57F641E0127D@gmail.com>

I'd have to see the two GFF3 files you are using for your comparison. However, one thing that comes to mind is that you may be unfamiliar with eval's output. Eval reports several levels of strictness: gene, transcript, exon, and base pair. If you are using the gene-level strictness in the report, for example, then a single base pair difference in any of the transcripts will cause the entire gene to be counted as a mismatch. You really should only use the base-pair-level SN/SP (sensitivity/specificity) from the eval report for your comparison. In the most extreme case the exon-level SN/SP may be used, but in general no gold standard dataset is considered perfect enough to use the gene-level SN/SP (or usually even the exon-level SN/SP).

--Carson

> On Dec 31, 2014, at 6:48 PM, 赵越 wrote:
>
> Hi all,
>
> Recently I've been using MAKER to annotate a single chromosome of rice as a pilot experiment, and I'm running into some problems. After the annotation, when I run eval to compare my result with the gold standard, the gene sensitivity & specificity is only around 20%. And after I added the GFF3 file MAKER produced and ran MAKER again, the result was worse than 20%.
>
> My input is a Trinity-processed RNA-seq file and a protein file. I chose SNAP, Augustus and GeneMark as ab initio predictors.
>
> I paste my maker_opts.ctl here:
>
> #-----Genome (these are always required)
> genome=chr12.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=chr12.gff #MAKER derived GFF3 file
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closely related species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=Osativa_193_peptide.fa #protein sequence file in fasta format (i.e. from multiple organisms)
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=Rice #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>
> #-----Gene Prediction
> snaphmm=rice #SNAP HMM file
> gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file
> augustus_species=arabidopsis #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
>
> Could you help me? Thank you!!!
>
> --
> Yue Zhao (Jerry)
> Bachelor Candidate of Plant Biotechnology
> Researcher in UCLA-CSST program
> Shanghai Jiao Tong University, Shanghai
> jerryzhaosjtu at gmail.com
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From jerryzhaosjtu at gmail.com Wed Jan 7 05:16:45 2015
From: jerryzhaosjtu at gmail.com (赵越)
Date: Wed, 7 Jan 2015 19:16:45 +0800
Subject: [maker-devel] using MAKER with MPI
Message-ID:

Greetings,

Can I use mpirun instead of mpiexec? Thank you!!
--
Yue Zhao (Jerry)

From carsonhh at gmail.com Wed Jan 7 10:13:50 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Jan 2015 09:13:50 -0700
Subject: [maker-devel] using MAKER with MPI
In-Reply-To:
References:
Message-ID:

Yes, they are interchangeable. In fact, in OpenMPI both mpiexec and mpirun are soft links to the exact same executable -> orterun

Just remember that MAKER works with the MPICH2/3 and OpenMPI flavors of MPI, but not with MVAPICH2.

Also, if using MPICH, make sure to enable shared libraries during installation (this is not the default).

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best to just add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

If jobs hang or freeze when using OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.

Example:
mpiexec -mca btl ^openib -n 20 maker

--Carson
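The launch recipe above (LD_PRELOAD pointing at libmpi.so, plus `-mca btl ^openib` when jobs hang) can be scripted so it is applied the same way every time. The following is a minimal Python sketch, not part of MAKER: the helper name `build_maker_mpi_command` is hypothetical, and the libmpi.so path is only the example path from the message above.

```python
import os

# Example path from the message above; adjust to where libmpi.so lives locally.
DEFAULT_LIBMPI = "/usr/local/openmpi/lib/libmpi.so"


def build_maker_mpi_command(n_procs, libmpi=DEFAULT_LIBMPI,
                            disable_openib=True, maker_args=()):
    """Return (argv, env) for launching MAKER under OpenMPI (hypothetical helper)."""
    if n_procs < 1:
        raise ValueError("n_procs must be a positive integer")
    # mpiexec and mpirun are the same executable under OpenMPI.
    argv = ["mpiexec"]
    if disable_openib:
        # Exclude the openib byte-transfer layer, the workaround for hanging jobs.
        argv += ["-mca", "btl", "^openib"]
    argv += ["-n", str(n_procs), "maker", *maker_args]
    # LD_PRELOAD must point at OpenMPI's libmpi.so whenever MAKER runs.
    env = dict(os.environ, LD_PRELOAD=libmpi)
    return argv, env


if __name__ == "__main__":
    argv, env = build_maker_mpi_command(20)
    print(" ".join(argv))  # mpiexec -mca btl ^openib -n 20 maker
    print(env["LD_PRELOAD"])
    # To launch for real: subprocess.run(argv, env=env, check=True)
```

Because mpirun and mpiexec are the same binary under OpenMPI, swapping the first element of `argv` would make no difference there.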
From carsonhh at gmail.com Thu Jan 8 09:47:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 8 Jan 2015 08:47:29 -0700
Subject: [maker-devel] MAKER mpi running wrong
In-Reply-To:
References:
Message-ID: <13241A86-804F-4674-A8FD-CA90026CF4AF@gmail.com>

When running large jobs under MPI, semi-random issues can arise, as well as tuning issues where hardware configuration, IO performance, buffer sizes, etc. all play a role. Using one of the NSF flagship clusters from XSEDE, for example, I can run on over 2000 CPUs without issue. But the IT specialists with XSEDE have also spent a lot of time tuning MPI by enabling and disabling certain options for their hardware and network configuration (the IT specialists for the XSEDE project are actually the developers of many of the MPI flavors available, so they wrote MPI to work really well on this specific cluster). On other clusters I can't go over 200 CPUs on a single job. Or on another XSEDE cluster I can run on exactly 1424 CPUs; if I increase by a single CPU, the job always fails.

For these kinds of issues you would have to delve into some of the more obscure parameters of OpenMPI via trial and error (http://www.open-mpi.org/doc/). What happens under the hood in OpenMPI is that different buffer sizes and network communication strategies are triggered as the number of nodes increases, so you can often identify a specific CPU count that is stable, where going one over that number causes a failure. You then look in the documentation for a parameter that matches that trigger value and alter it higher or lower. Or, if you can identify the stable CPU count, just submit multiple jobs at exactly that CPU count.

--Carson

> On Jan 8, 2015, at 8:27 AM, 赵越 wrote:
>
> Hi Carson,
>
> After using the flag in your example, the warning after running MAKER was gone, yet after running with MPI on 512 threads for 2 hours, MAKER 'Exited with exit code 1'. The stdout info is as follows:
>
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> SIGTERM received
> Perl exited with active threads:
> 1 running and unjoined
> 0 finished and unjoined
> 0 running and detached
>
> Also, my job submission is like:
>
> #BSUB -J maker_mpi
> #BSUB -n 512
> #BSUB -R "span[ptile=16]"
> module purge && module load gcc/4.9.1 openmpi/gcc/1.6.5
> mpiexec -mca btl ^openib -n 512 perl /lustre/home/clswcc/yzhao/MAKER/maker/bin/maker -fix_nucleotides
>
>
> Could you help me find out where it is going wrong? The stdout at first is normal, as follows:
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_datastore
>
> To access files for individual sequences use the datastore index:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_master_datastore_index.log
>
> STATUS: Now running MAKER...
>
>
> Regards,
> yue
>
> --
> Yue Zhao (Jerry)
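The trial-and-error search for a stable CPU count can be made systematic if one assumes a single threshold: every count at or below some T runs cleanly and every count above it fails. Under that assumption (an editorial one; as noted above, failures can also be semi-random, so a probe may need repeating), a binary search finds T in roughly log2(high - low) test jobs. `largest_stable_count` and the `is_stable` callable are hypothetical; `is_stable` would be something you supply, for example a wrapper that submits a short test job on n CPUs and checks its exit status.

```python
def largest_stable_count(is_stable, low, high):
    """Binary-search for the largest CPU count in [low, high] that runs cleanly.

    Assumes a single threshold T: is_stable(n) is True for n <= T and
    False for n > T. Returns None if even `low` fails.
    """
    if not is_stable(low):
        return None
    lo, hi = low, high  # invariant: lo is known to be stable
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if is_stable(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With a threshold like the 1424-CPU limit described above and a search range of 16 to 4096, this needs about a dozen probe jobs rather than hundreds.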
From xvazquezc at gmail.com Wed Jan 14 02:40:38 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:40:38 +1100
Subject: [maker-devel] doubt about selection of the best model
Message-ID:

Hi Maker developers and users,

After quite a bit of time dealing with Maker, I can run it without problems (thank you Carson). However, I have doubts about the evaluation of the best model produced by Maker.

I found the AED_cdf_generator.pl script while searching the mailing list and it is great but, when you use it, which GFF files are you comparing? I initially thought that the models to be compared were those from each *ab initio* program, e.g. SNAP vs Augustus, and within each, the subsequent bootstrap training steps; but unless you run only one predictor each time you run Maker, the XXX.all.gff file will contain data from both predictions. Should I run them individually?

Following the topic, Maker will generate different FASTA files for proteins and transcripts from each program (Maker and each *ab initio* predictor) as well as "non_overlapping" files. Which one(s) do you select to continue with the functional annotation?

Thank you in advance,

Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Wed Jan 14 02:49:34 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:49:34 +1100
Subject: [maker-devel] Augustus retraining??
Message-ID:

Hi,

I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. But, and here is my question, can/should I re-train Augustus like it is done with SNAP?
And what would I use for the re-training?

Thank you,

Xabier

From mikael.durling at slu.se Wed Jan 14 04:08:33 2015
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 14 Jan 2015 10:08:33 +0000
Subject: [maker-devel] Augustus retraining??
In-Reply-To:
References:
Message-ID: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>

Hi,

> On 14 Jan 2015, at 09:49, Xabier Vázquez Campos wrote:
>
> Hi,
>
> I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. But, and here is my question, can/should I re-train Augustus like it is done with SNAP? And what would I use for the re-training?

I've tried an approach of retraining Augustus in a manner similar to what has been suggested here earlier for retraining SNAP. This has been run with a local Augustus installation as part of an automated framework I have set up to annotate fungal genomes. Interestingly, Augustus seems to converge very quickly. It is not uncommon for autoAugustus to report that it could not improve the initial models that were derived from the CEGMA dataset. Are there other similar experiences on the list?

I also have a modified version of maker2zff, which I call maker2augustus_gff, that extracts an evidence set for Augustus retraining from the initial round of MAKER. I'm happy to share it with anyone interested.
cheers,
Mikael

From carsonhh at gmail.com Wed Jan 14 09:22:57 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:22:57 -0700
Subject: [maker-devel] Augustus retraining??
In-Reply-To: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
Message-ID: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com>

Here is some info on training SNAP via the bootstrap technique (i.e. using the models produced by the initial training to seed the next round of training). Even though the examples use SNAP, the approach applies equally using the scripts and methods Mikael described in his e-mail -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Also, Jason Stajich wrote an excellent explanation of training Augustus on the GMOD mailing list -> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

He also included his own scripts to assist with the training -> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

--Carson
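In the bootstrap loop, models from one MAKER round are filtered and fed back as the training set for the next. MAKER records an `_AED=` value in the attributes of each mRNA feature in its GFF3 output, so one simple selection step is to keep only low-AED models. The helper below is a hypothetical sketch of that step only; it is not maker2zff or Mikael's maker2augustus_gff, the 0.25 cutoff is an arbitrary illustration rather than a MAKER default, and the selected models would still need converting into each predictor's training format (ZFF for SNAP, GenBank for Augustus).

```python
def low_aed_mrna_ids(gff3_lines, max_aed=0.25):
    """Return IDs of mRNA features whose _AED attribute is <= max_aed.

    Stops at a ##FASTA directive, since everything after it is sequence.
    The default cutoff is an illustrative assumption.
    """
    selected = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        aed = attrs.get("_AED")
        if aed is not None and float(aed) <= max_aed:
            selected.append(attrs.get("ID"))
    return selected
```

Tightening `max_aed` trades training-set size for confidence; which balance works best is something to test between rounds rather than a fixed rule.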
From carsonhh at gmail.com Wed Jan 14 09:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:37:43 -0700
Subject: [maker-devel] doubt about selection of the best model
In-Reply-To:
References:
Message-ID:

The MAKER models will be the final models. FASTA files and features from the raw ab initio gene predictors, on the other hand, are there for reference purposes only and, unless you have a need for them, should be ignored.
MAKER models are the combination of ab initio gene predictions filtered for best evidence match, together with hint-based models from the predictors. Basically, MAKER took the best models from each separate predictor and created a final consensus gene set.

The CDF generator is really for comparing how evidence match changes between different releases of the genome or between different parameter options (i.e. you are comparing curves between independent MAKER runs, not within a single MAKER run). The AED CDF curve is interpreted similarly to a ROC curve, in that shifts up and to the left indicate improved gene models. This is as opposed to using sensitivity and specificity, because those measures require you to already know the correct models in order to generate a comparison. For de novo annotation that is impossible (if you already had the correct models you wouldn't be running MAKER), so since such values cannot be generated, AED, which uses evidence overlap, acts as a proxy measurement.

This paper probably gives the overall best example of how AED correlates with model quality (Figures 2 and 3) -> http://www.biomedcentral.com/1471-2105/12/491

--Carson
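The contrast drawn here between SN/SP and AED (and the earlier advice to compare at the base-pair level) comes down to the same per-base overlap. SN is the fraction of reference bases covered by the prediction, SP is the fraction of predicted bases supported by the reference, and AED, as it is usually defined, is 1 - (SN + SP)/2, with the aligned evidence playing the role of the reference. A minimal sketch, assuming both feature sets lie on the same sequence and strand and use GFF3-style 1-based inclusive coordinates; it illustrates the arithmetic and is not the code used by eval or MAKER.

```python
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive, as in GFF3


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of base positions they cover."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def bp_sn_sp_aed(predicted: List[Interval],
                 reference: List[Interval]) -> Tuple[float, float, float]:
    """Base-pair sensitivity, specificity and AED of one model vs. a reference."""
    pred = covered_bases(predicted)
    ref = covered_bases(reference)
    overlap = len(pred & ref)
    sn = overlap / len(ref) if ref else 0.0    # reference bases recovered
    sp = overlap / len(pred) if pred else 0.0  # predicted bases supported
    aed = 1.0 - (sn + sp) / 2.0                # 0 = perfect agreement, 1 = none
    return sn, sp, aed


print(bp_sn_sp_aed([(1, 100), (201, 300)], [(1, 100), (251, 350)]))  # (0.75, 0.75, 0.25)
```

Expanding intervals into base sets keeps the sketch short; for whole genomes an interval-sweep approach would be far more memory-friendly.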
From xvazquezc at gmail.com Fri Jan 16 02:09:11 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 19:09:11 +1100
Subject: [maker-devel] functional annotation
Message-ID:

Hi,

Which file from the Maker output do you use for the functional annotation? The FASTA part of the XXX.all.gff?

I'll probably be using BLAST and InterProScan. I tested B2go (basic version); good stuff, but it is annoyingly slow.

Thank you

--
Xabier Vázquez Campos

From xvazquezc at gmail.com Fri Jan 16 04:11:21 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 21:11:21 +1100
Subject: [maker-devel] repeat masking and repeat libraries
Message-ID:

Hi there,

First, a general question. Probably kind of silly, but I prefer to be sure... When you browse RepBase, for example for fungi, all the repeats are marked as Eukaryota (Ancestral) or under the species name, but no other taxonomic ranks are indicated. Does RepeatMasker recognise orders, families, etc.?
Or, in my case, should I stick with model_org=fungi?

I've been trying to create a repeat library specific to my genomes and I didn't have any luck with the programs described in the Basic and advanced tutorials (either on my computer or on the cluster); they reported errors every time, with the exception of RepeatModeler, which ran with no problems. Is the output from RepeatModeler enough to improve the masking? It is not the best option, I guess, but better than just the RepBase libraries by themselves, isn't it?

Thank you for your time,

Xabier

From michael.s.campbell1 at gmail.com Fri Jan 16 11:01:37 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 16 Jan 2015 10:01:37 -0700
Subject: [maker-devel] functional annotation
In-Reply-To:
References:
Message-ID:

Hi Xabier,

The FASTA at the end of the GFF3 file is the genome. For functional annotation you want to use the XXXout.all.maker.proteins.fasta file. It contains the protein sequences for your MAKER gene models.

Good luck,
Mike

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From carsonhh at gmail.com Fri Jan 16 11:04:09 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 Jan 2015 10:04:09 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Using both RepBase and a RepeatModeler-produced library should be sufficient, especially for fungi.

--Carson
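Before a RepeatModeler library is combined with RepBase, Mike Campbell's reply in this thread suggests BLASTing it against known fungal proteins and dropping entries with strong hits, to avoid masking real genes. A minimal Python sketch of the filtering half, assuming the BLAST search has already been run with tabular output (for example `blastx -query library.fa -db fungal_proteins -outfmt 6 -out hits.tsv`, where the file and database names are hypothetical). The 1e-10 e-value cutoff for a "strong hit" is an assumption, not a recommendation from the thread.

```python
def flagged_queries(blast_tab_lines, max_evalue=1e-10):
    """IDs of library entries with at least one hit at or below max_evalue.

    Expects BLAST tabular output (-outfmt 6): column 1 is the query ID and
    column 11 is the e-value. The cutoff is an illustrative assumption.
    """
    flagged = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and -outfmt 7 comment lines
        cols = line.rstrip("\n").split("\t")
        if float(cols[10]) <= max_evalue:
            flagged.add(cols[0])
    return flagged


def filter_fasta(fasta_lines, drop_ids):
    """Yield FASTA lines, skipping records whose ID (first word) is in drop_ids."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            keep = (words[0] if words else "") not in drop_ids
        if keep:
            yield line
```

The filtered library could then be passed to MAKER through the rmlib= option alongside model_org=fungi.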
From michael.s.campbell1 at gmail.com Fri Jan 16 11:08:43 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 16 Jan 2015 10:08:43 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Hi Xabier,

I haven't seen orders or families documented for RepeatMasker with RepBase. Fungi seems safe to me.

If you want to give yourself a little more peace of mind about the RepeatModeler library, you can BLAST it against a database of known fungal proteins and remove the entries in the library that have strong hits to a known protein, to avoid over-masking.

Mike

--
Michael Campbell

From xvazquezc at gmail.com Fri Jan 16 21:57:26 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Sat, 17 Jan 2015 14:57:26 +1100
Subject: [maker-devel] AED score script error
Message-ID:

Hi,

Just reporting the following error with the AED_cdf_generator.pl script:

> Use of uninitialized value $opt_b in division (/) at AED_cdf_generator.pl line 20.
> Illegal division by zero at AED_cdf_generator.pl line 20.

Anybody else with this problem?
I use the version attached here:
https://groups.google.com/forum/#!topic/maker-devel/LCpB3CEm63M

Thank you

--
Xabier Vázquez Campos
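The error above comes from dividing by an unset bin size. A minimal Python sketch of an AED cumulative distribution with an explicit bin size shows why that parameter cannot be left empty; it is not the AED_cdf_generator.pl code, only an illustration of the same idea, and the function name `aed_cdf` is hypothetical. The AED values could come, for example, from the `_AED=` attributes on the mRNA lines of a MAKER GFF3 file.

```python
def aed_cdf(aed_values, bin_size):
    """Cumulative fraction of gene models with AED <= each bin edge.

    The number of bins is 1 / bin_size, so an unset or zero bin size
    cannot work; this sketch rejects it explicitly instead of dividing by zero.
    """
    if not bin_size or bin_size <= 0:
        raise ValueError("bin_size must be a positive number, e.g. 0.025")
    if not aed_values:
        raise ValueError("no AED values given")
    n_bins = round(1 / bin_size)
    edges = [i * bin_size for i in range(n_bins + 1)]
    total = len(aed_values)
    eps = 1e-9  # tolerate floating-point error at bin edges (e.g. 3 * 0.1)
    return [(edge, sum(v <= edge + eps for v in aed_values) / total)
            for edge in edges]
```

Computing this for two independent MAKER runs and plotting both curves gives the comparison described earlier in this thread: the run whose curve rises faster at low AED values has better evidence support.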
From michael.s.campbell1 at gmail.com Mon Jan 19 11:27:52 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 19 Jan 2015 10:27:52 -0700
Subject: [maker-devel] AED score script error
In-Reply-To:
References:
Message-ID:

Hi Xabier,

Did you give the -b option a value on the command line (e.g. -b 0.1)?

Mike

--
Michael Campbell

From xvazquezc at gmail.com Tue Jan 20 00:14:58 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Jan 2015 17:14:58 +1100
Subject: [maker-devel] AED score script error
In-Reply-To:
References:
Message-ID:

Thanks Mike. It was that.
--
Xabier Vázquez Campos

From carsonhh at gmail.com Tue Jan 20 10:45:01 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Jan 2015 09:45:01 -0700
Subject: [maker-devel] Issue due to intensive I/O
In-Reply-To:
References:
Message-ID: <6F82AB5F-4782-41CA-A61F-C79894EFABB4@gmail.com>

Genome annotation is very data intensive as opposed to CPU intensive. In MAKER, most IO-intensive operations will occur in a temporary directory pointed to by the TMP= option in the MAKER control files. If you are setting this value to a location on a network-mounted drive, then this could be the source of your problem.
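Since TMP= (and the TMPDIR variable it defaults to) should point to a local disk, a quick pre-flight check can catch a network-mounted temporary directory before a long run starts. The sketch below is Linux-specific and hypothetical: it parses the text of /proc/mounts, picks the longest mount point containing the path, and compares that mount's filesystem type against a hand-picked, non-exhaustive set of network and parallel filesystem types.

```python
import os

# Illustrative, non-exhaustive set of network/parallel filesystem types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "lustre", "gpfs",
                    "beegfs", "fuse.sshfs"}


def mount_fstype(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, where each line reads
    "device mountpoint fstype options dump pass". The longest matching
    mount point wins; ties go to the later line, since a later mount on
    the same point hides earlier ones.
    """
    path = os.path.abspath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_type


def is_network_tmp(path, mounts_text):
    """True if `path` sits on one of the filesystem types listed above."""
    return mount_fstype(path, mounts_text) in NETWORK_FS_TYPES
```

On a real node this could be called as `is_network_tmp(os.environ.get("TMPDIR", "/tmp"), open("/proc/mounts").read())`; using `os.path.realpath` instead of `abspath` would also resolve symlinks that point onto a network mount.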
Also, TMP= defaults to the location given by the Linux TMPDIR environment variable, so make sure that TMPDIR is not set to a network-mounted location either. The temporary directory needs to be on a locally mounted drive. MAKER will still need to write a number of global files, though; however, we've previously run MAKER on over 8,000 CPUs on Lustre file systems with no issues.

If the genome being annotated has a large number of small contigs, it is possible that the metadata server is having problems rather than the object storage server. Lots of small contigs in a fragmented genome assembly result in a lot of small result files, but very little reading and writing. Such a situation can be quite stressful for Lustre file systems because they don't like having large numbers of very small files (it overwhelms the metadata server even though the object storage server will be under more moderate load). If that is the case, set min_contig= to something like 10000 to avoid generating analyses for short, un-annotatable contigs (they may number in the hundreds of thousands in lower-quality genome assemblies and contain no useful information).

You can also set clean_up=1 in the MAKER control files to delete files as MAKER advances. This removes restart capability because you won't have logged results from previous runs, but it will reduce the burden on the metadata server (which is affected by total file number rather than by file read/write operations). Setting clean_up=1 can also help you avoid administrator-defined limits on total file number per user (administrators commonly set such limits on Lustre-based file systems to avoid taxing the metadata server).

So your issue is likely caused by one of two things:

1. TMP= in the maker_opts.ctl file, or the Linux TMPDIR environment variable, is set to a network-mounted location. Fix this by setting both to a locally mounted location (usually /tmp).
2. Too many total files are being generated by a fragmented genome assembly. Fix this either by setting min_contig=10000 to skip short contigs or by setting clean_up=1 to avoid logging too many files.

This happens because it is very difficult to overwhelm Lustre's object storage servers (which perform I/O read/write operations), but it's relatively easy to overwhelm the metadata server (which is affected by total file count rather than total I/O throughput).

-Carson

> On Jan 19, 2015, at 5:55 AM, Stephen Wang wrote:
>
> Dear MAKER Team,
>
> I am a cluster administrator at the university. The issue is caused by MAKER jobs, which access massive numbers of small files and crashed our Lustre file system.
>
> Hardware: 16 cores per node
> Software: OpenMPI 1.6.5 and GCC 4.9.1
>
> Q1: Does MAKER have to generate a large number of files on the global file system?
> Q2: Can any parameters help MAKER avoid I/O-intensive access? Any experience with Lustre?
>
> MAKER is quite important software for our users. We hope for your help.
>
> BR,
> Stephen
>
> --
> Stephen Wang, GPU Computing Specialist
> Center for High Performance Computing
> Shanghai Jiao Tong University
> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> Mobi:+86-136-6151-1618 Web:http://hpc.sjtu.edu.cn

From jgallant at msu.edu Wed Jan 21 07:56:02 2015
From: jgallant at msu.edu (Jason Gallant)
Date: Wed, 21 Jan 2015 05:56:02 -0800 (PST)
Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster
Message-ID: <1421848561970.c8b481bf@Nodemailer>

Hi Everyone,

I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster. I've started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably.
I'm having a persistent issue where something fails on one of the nodes, and standard error is flooded with:

examining contents of the fasta file and run log
[67] ERROR: could not make datastore directory
[67] --> rank=67, hostname=node067
[67] ERROR: Failed while examining contents of the fasta file and run log
[67] ERROR: Chunk failed at level:0, tier_type:0
[67] FAILED CONTIG:Scaffold261

This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node:

67] #-------------------------------#
[67] deleted:-60 hits
[67] collecting blastx reports
[67] ERROR: Could not colapse BLAST reports
[67]  at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
[67]     GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1
[67]     Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1
[67]     eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1
[67]     Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1
[67]     Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1
[67]     Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1
[67]     main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
[67]     eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
[67]     threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1
[67] --> rank=67, hostname=node067
[67] ERROR: Failed while collecting blastx reports
[67] ERROR: Chunk failed at level:9, tier_type:3
[67] FAILED CONTIG:Scaffold66
[67]
[67] ERROR: Chunk failed at level:4, tier_type:0
[67] FAILED CONTIG:Scaffold66

I've attempted to ignore the error to see if things will proceed on the other 199 processors. When I returned to the "master" node the next morning, Maker was repeating the same error over and over (same scaffold):

] examining contents of the fasta file and run log
[67] ERROR: could not make datastore directory
[67] --> rank=67, hostname=node067
[67] ERROR: Failed while examining contents of the fasta file and run log
[67] ERROR: Chunk failed at level:0, tier_type:0
[67] FAILED CONTIG:Scaffold1589

I stopped the job and restarted it, and after only a few minutes of running, the same error was reported, this time on a new scaffold. Strangely, here the error is reported under the MPI tag of node001, but it originates at node137:

ERROR: Could not colapse BLAST reports
[1]  at /root/maker/bin/../lib/GI.pm line 2524.
[1]     GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760
[1]     Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415
[1]     eval {...} called at /root/maker/bin/../lib/Error.pm line 407
[1]     Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215
[1]     Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341
[1]     Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979
[1] --> rank=137, hostname=node137
[1] ERROR: Failed while collecting blastx reports
[1] ERROR: Chunk failed at level:9, tier_type:3
[1] FAILED CONTIG:Scaffold249
[1]
[1] ERROR: Chunk failed at level:4, tier_type:0
[1] FAILED CONTIG:Scaffold249
[1]
[1] examining contents of the fasta file and run log
[1] ERROR: could not make datastore directory
[1] --> rank=1, hostname=node001
[1] ERROR: Failed while examining contents of the fasta file and run log
[1] ERROR: Chunk failed at level:0, tier_type:0
[1] FAILED CONTIG:Scaffold249

I'd appreciate any guidance on how best to diagnose this error!

Many thanks,
Jason Gallant

Dr. Jason R. Gallant
Assistant Professor
Room 38 Natural Sciences
Department of Zoology
Michigan State University
East Lansing, MI 48824
jgallant at msu.edu
office: 517-884-7756

From xvazquezc at gmail.com Wed Jan 21 18:42:35 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Thu, 22 Jan 2015 11:42:35 +1100
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Thanks Mike,

I've blasted the RepeatModeler library sequences (blastx against nr), and many, if not most, of them match transposases, pol proteins, gag proteins, retrotransposons, ... all of them present in other fungi of the same order. Should I leave them to be masked? Do I still run prediction on the unmasked genome too?
Also, in many cases the match covers a couple of thousand bp at one end of a 9 kbp sequence, and in none of them can InterProScan find anything except potential TM domains or so, provided by SignalP.

What do you think? Should I leave it as it is?

Thank you again for your time

2015-01-17 4:08 GMT+11:00 Michael Campbell:

> Hi Xabier,
>
> I haven't seen orders or families documented for RepeatMasker with
> RepBase. Fungi seems safe to me.
>
> If you want to give yourself a little more peace of mind about the
> RepeatModeler library, you can blast it against a database of known fungal proteins
> and remove the entries in the library that have strong hits to a known
> protein to avoid over-masking.
>
> Mike
>
> On Fri, Jan 16, 2015 at 10:04 AM, Carson Holt wrote:
>
>> Using both RepBase and a RepeatModeler-produced library should be
>> sufficient, especially for fungi.
>>
>> -Carson
>>
>>
>> On Jan 16, 2015, at 3:11 AM, Xabier Vázquez Campos
>> wrote:
>>
>> Hi there,
>>
>> First, a general question. Probably kind of silly, but I prefer to be
>> sure...
>> When you browse RepBase, for example in fungi, all the repeats are
>> marked as Eukaryota (Ancestral) or under the name of the species, but no
>> other taxonomic ranks are indicated. Does RepeatMasker recognise orders,
>> families, etc., or in my case should I stick with model_org=fungi?
>>
>> I've been trying to create repeat libraries specific to my genomes, and
>> I didn't have any luck with the programs described in the Basic
>> and Advanced
>> tutorials (neither on my computer nor on the cluster); they reported errors
>> every time, with the exception of RepeatModeler, which ran with no problems. Is
>> the output from RepeatModeler enough to improve the masking? It is not the
>> best option, I guess, but it is better than just the RepBase libraries by
>> themselves, isn't it?
>>
>> Thank you for your time,
>>
>> Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
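Mike's suggestion quoted above was to blast the RepeatModeler library against known proteins and drop entries with strong hits to avoid over-masking. That step can be sketched as a small post-processing script. This is a hypothetical helper with several assumptions. The BLAST search was run with `-outfmt 6` and its default 12 columns, so the e-value is column 11. The `qseqid` column matches the first whitespace-delimited token of each library header. The 1e-10 cutoff is an arbitrary illustration. Note also that if the protein database contains transposon proteins, as nr does, genuine TE families like the ones Xabier describes would be discarded too. So this filter is only meaningful against a TE-free protein set.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA lines into (header, sequence) tuples."""
    records: List[Record] = []
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strong_hit_ids(blast_tab: Iterable[str], max_evalue: float = 1e-10) -> Set[str]:
    """Query IDs with any hit at or below max_evalue in BLAST -outfmt 6 output."""
    ids: Set[str] = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 11 (index 10) is the e-value in the default 12-column layout.
        if len(cols) >= 12 and float(cols[10]) <= max_evalue:
            ids.add(cols[0])
    return ids


def filter_library(records: List[Record], drop: Set[str]) -> List[Record]:
    """Keep entries whose ID (first header token) had no strong protein hit."""
    return [(h, s) for h, s in records if (h.split() or [""])[0] not in drop]


def write_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records as FASTA with fixed-width sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The filtered library would then be supplied through rmlib= in place of the raw RepeatModeler output.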

From michael.s.campbell1 at gmail.com Thu Jan 22 10:42:56 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 22 Jan 2015 09:42:56 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Hi Xabier,

From what you described, I would leave it as is.

Mike

On Wed, Jan 21, 2015 at 5:42 PM, Xabier Vázquez Campos wrote:

> Thanks Mike,
>
> What do you think? Should I leave it as it is?

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From carsonhh at gmail.com Fri Jan 23 13:17:36 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Jan 2015 12:17:36 -0700
Subject: [maker-devel] running maker on TACC Stampede.
In-Reply-To:
References:
Message-ID:

Stampede only has MVAPICH2. It does not have OpenMPI (even though it has been requested several times). The OpenFabrics libraries (used by MVAPICH2) have a known issue that prevents programs from making system calls while running under MPI. A system call here means one program launching another (e.g. MAKER launching BLAST). For this reason MAKER does not work with MVAPICH2; it only works with OpenMPI.

You can still get it to work with MVAPICH2, but only on a single node. If you request more than one node, it will fail. The solution would be for TACC to install OpenMPI as an option on Stampede (as they have on Lonestar), but until that happens you can only run MAKER on a single node.

Thanks,
Carson

> On Jan 22, 2015, at 10:51 PM, Won C Yim wrote:
>
> To whom it may concern,
>
> Hi!
>
> My name is Won Cheol Yim, from the University of Nevada, Reno.
>
> I am trying to run MAKER on TACC Stampede.
>
> It looks like everything installed properly.
>
> ==============================================================================
> STATUS MAKER v2.31.8
> ==============================================================================
> PERL Dependencies:
> VERIFIED
> External Programs:
> VERIFIED
> External C Libraries:
> VERIFIED
> MPI SUPPORT:
> ENABLED
> MWAS Web Interface:
> DISABLED
> MAKER PACKAGE:
> CONFIGURATION OK
>
> And I installed Perl 5.18.4 with the threads option.
>
> But when I try to run it with MPI, it generates an error.
>
> I assume this problem comes from ibrun on Stampede.
>
> Is there any way to run it on Stampede?
>
> Here is my log.
>
> TACC: Starting up job
> TACC: Setting up parallel environment for MVAPICH ssh-based mpirun.
> cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory
> sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory
> TACC: Setup complete. Running job script.
> TACC: starting parallel tasks...
> [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2)
> TACC: MPI job exited with code: 1
> TACC: Shutting down parallel environment.
> TACC: Shutdown complete. Exiting.
>
>
> Regards,
>
> Won
> --
> Yim, Won Cheol
> Sent with Airmail

From carsonhh at gmail.com Fri Jan 23 14:00:56 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Jan 2015 13:00:56 -0700
Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster
In-Reply-To: <1421848561970.c8b481bf@Nodemailer>
References: <1421848561970.c8b481bf@Nodemailer>
Message-ID:

MAKER needs a global storage location. You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that.

Also, use OpenMPI instead of MPICH2. It's more stable.

I look forward to seeing how your experiment with AWS, MPI, and MAKER works out.

-Carson

> On Jan 21, 2015, at 6:56 AM, Jason Gallant wrote:
>
> Hi Everyone,
>
> I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster. I've started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably.
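Jason noticed that a failure tagged `[1]` (node001) actually originated on node137. A small log parser can show at a glance which hosts are failing and with which messages. It attributes each `FAILED CONTIG` line, and the `ERROR:` lines leading up to it, to the most recent `--> rank=..., hostname=...` line instead of to the MPI tag. This is a hypothetical helper, not part of MAKER. Its regular expressions only cover the message formats visible in the logs in this thread. If most hosts report "could not make datastore directory", that would be consistent with the shared-storage explanation above.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Formats copied from the MAKER logs in this thread; other lines are ignored.
HOST_RE = re.compile(r"--> rank=(\d+), hostname=(\S+)")
FAILED_RE = re.compile(r"FAILED CONTIG:(\S+)")
ERROR_RE = re.compile(r"ERROR: (.+)")


def summarize_failures(lines: Iterable[str]) -> Tuple[Dict[str, Counter], Dict[str, Counter]]:
    """Count failed contigs and error messages per originating host."""
    host = "unknown"
    failures: Dict[str, Counter] = defaultdict(Counter)
    errors: Dict[str, Counter] = defaultdict(Counter)
    pending: List[str] = []  # ERROR messages seen since the last FAILED CONTIG
    for line in lines:
        match = HOST_RE.search(line)
        if match:
            host = match.group(2)  # hostname reported by the failing rank
            continue
        match = FAILED_RE.search(line)
        if match:
            failures[host][match.group(1)] += 1
            for message in pending:
                errors[host][message] += 1
            pending = []
            continue
        match = ERROR_RE.search(line)
        if match:
            pending.append(match.group(1).strip())
    return failures, errors
```

For example, `failures, errors = summarize_failures(open(log_path))` followed by printing `errors[host].most_common(3)` for each host gives a compact per-node view of a long standard-error log.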

From jcornel3 at asu.edu Fri Jan 23 15:28:13 2015
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 23 Jan 2015 13:28:13 -0800
Subject: [maker-devel] Maker-P vs. Maker
Message-ID:

Hi, I'm working on annotating a tetraploid animal with a genome size of 3.1 gigabases. I was wondering if MAKER-P would be appropriate for this organism, or if I should just stick with MAKER? Thanks.

--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Jan 23 15:59:01 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Jan 2015 14:59:01 -0700
Subject: [maker-devel] Maker-P vs. Maker
In-Reply-To:
References:
Message-ID: <7813BFBE-7237-4298-8AD3-B210CB96DDD2@gmail.com>

Actually, the code bases have been merged. So if you use the most recent version of MAKER, the plant extensions for RNA annotation and the extra analysis scripts from MAKER-P will be there. If you don't need them, just don't turn those options on in the control files.

-Carson

> On Jan 23, 2015, at 2:28 PM, John Cornelius wrote:
>
> Hi, I'm working on annotating a tetraploid animal with a genome size of 3.1 gigabases. I was wondering if MAKER-P would be appropriate for this organism, or if I should just stick with MAKER? Thanks.

From carsonhh at gmail.com Mon Jan 26 13:17:45 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Jan 2015 12:17:45 -0700
Subject: [maker-devel] running maker on TACC Stampede.
In-Reply-To:
References:
Message-ID:

Do you mean sequence upstream of the gene? If that is the case, you would probably have to write a script to do this. BioPerl is one option; it has several Perl modules that help with manipulating FASTA sequences and many common biology tool file formats (http://www.bioperl.org).

-Carson

> On Jan 26, 2015, at 12:10 PM, Won C Yim wrote:
>
> Dear Carson Holt,
>
> Thank you for your reply.
>
> I asked the Stampede staff about this issue, and they had no way to help me.
>
> I think we need to move to another server for MAKER.
>
> Thank you for your help.
>
> And I have one more question.
>
> Is there any way to extract upstream sequence from MAKER results?
>
> I tried to extract upstream and downstream sequences from them, but it's really hard to do.
>
> Regards,
>
> Won
>
> --
> Yim, Won Cheol
> MS330/Department of Biochemistry & Molecular Biology
> 1664 N.
Virginia Street > University of Nevada, Reno > > email: wyim at unr.edu > > > On January 23, 2015 at 11:17:41 AM, Carson Holt (carsonhh at gmail.com ) wrote: > >> Stampede only has MVAPICH2. It does not have OpenMPI (even though it has been requested several times). OpenFabrics libraries (used by MVAPICH2) have a known issue that restricts programs from making system calls while running under MPI. A system call is when one program launches another (i.e. MAKER launching BLAST). For this reason MAKER does not work with MVAPICH2. It only works with OpenMPI. >> >> You can still get it to work on MVAPICH2, but only on a single node. If you request more than one node then it will fail. The solution would be for TACC to install OpenMPI as an option on Stampede (like they have on Lonestar), but until that happens you can only run MAKER on a single node. >> >> Thanks, >> Carson >> >> >>> On Jan 22, 2015, at 10:51 PM, Won C Yim > wrote: >>> >>> Dear anyone whom may it concern, >>> >>> Hi! >>> >>> My name is Won Cheol Yim in University of Nevada, Reno. >>> >>> I try to run MAKER on TACC Stampede. >>> >>> It looks everything installed properly. >>> >>> ============================================================================== >>> STATUS MAKER v2.31.8 >>> ============================================================================== >>> PERL Dependencies:VERIFIED >>> External Programs:VERIFIED >>> External C Libraries:VERIFIED >>> MPI SUPPORT:ENABLED >>> MWAS Web Interface:DISABLED >>> MAKER PACKAGE:CONFIGURATION OK >>> >>> And I installed Perl 5.18.4 with threads option. >>> >>> But I try to run it with MPI, it generated error. >>> >>> I assumed this problem came from ibrun in Stampede. >>> >>> Is there anyway to run it on Stampede? >>> >>> Here is my log. >>> >>> TACC: Starting up job >>> TACC: Setting up parallel environment for MVAPICH ssh-based mpirun. 
>>> cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory
>>> sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory
>>> TACC: Setup complete. Running job script.
>>> TACC: starting parallel tasks...
>>> [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2)
>>> TACC: MPI job exited with code: 1
>>> TACC: Shutting down parallel environment.
>>> TACC: Shutdown complete. Exiting.
>>>
>>>
>>> Regards,
>>>
>>> Won
>>> --
>>> Yim, Won Cheol

From marc.hoeppner at imbim.uu.se Wed Jan 28 01:01:48 2015
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 28 Jan 2015 07:01:48 +0000
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com>
Message-ID:

Hi,

this is probably a long shot, but I was hoping that someone on the list may have some advice on how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue?

Maker runs fine on several assemblies that we have processed in the past. However, I recently started on a fairly fragmented (low N50) mammalian assembly, and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and BLAST matches in those small bits). As the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message is given before it):

perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530. Backtrace:
/usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012]
/lib64/libpthread.so.0[0x358ae0f710]
/usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0]
/lib64/libpthread.so.0[0x358ae0f710]
/lib64/libc.so.6(__poll+0x53)[0x358aadf343]
/sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a]
/sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961]
/sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e]
/lib64/libpthread.so.0[0x358ae079d1]
/lib64/libc.so.6(clone+0x6d)[0x358aae8b6d]
SIGTERM received

A few words about the setup:

We have 10 nodes and 160 cores, and the shared file system is exported via InfiniBand from a "standard" NFS server. The OS is Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that. The bandwidth usage is actually fairly low.

So far I have tried:

- running the MPI processes over both the Ethernet network and IPoIB (same problem)
- installing a more recent version of Perl through perlbrew, with all the required modules, and re-compiling Maker
- running some (albeit simple) network checks for retransmissions, lost packets, etc. (nothing popped up)
- running Maker on a subset of nodes to eliminate the possibility of a bad node

The error message is a bit cryptic to me. It would be very helpful to know whether Maker has a problem accessing a file or whether OpenMPI has a communication problem, but I can't tell from the information I have been able to extract so far. Any ideas?

Cheers,

Marc


Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

From dence at genetics.utah.edu Wed Jan 28 10:22:09 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 Jan 2015 16:22:09 +0000
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To:
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com>
Message-ID: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu>

Hi Marc, here are a few things to check on the MAKER side.

Did you have min_contig set to 1000 to set the lower limit on contig size?
Did maker do anything with the 1 kb contigs, or did it just skip them?
You can check that in the master_datastore_index.log or in the void directories for the small contigs.
That will tell us whether maker is functioning correctly, even though it's giving those messages.

With the newer versions of MAKER, I get messages identical to what you sent as part of normal thread termination, even when MAKER is functioning normally.

Thanks,
Daniel
From marc.hoeppner at imbim.uu.se Thu Jan 29 01:34:17 2015
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Thu, 29 Jan 2015 08:34:17 +0100
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu>
Message-ID: <54C9E279.8040907@imbim.uu.se>

Hi,

thanks for the feedback. If I resume maker enough times, it eventually runs through and completes all contigs. The question is whether there is any way to debug why it drops at random times. This happens most commonly when running on small contigs, which is probably due more to the increasing frequency of starting and finishing jobs than to their size. I guess Maker has no debug mode or other way to find out why it dies? Any idea what could make Maker drop like that? I was thinking NFS, but nfsstat looks fine, there is nothing in the log, and NFS generally functions well. So I can't identify a good place to look for the problem.

Regards,

Marc
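Daniel's suggestion to check master_datastore_index.log can be automated with a short script. This is a minimal sketch under stated assumptions, not a MAKER tool. It assumes each log line has three tab-separated fields (contig name, datastore directory, status word such as STARTED or FINISHED) and that the last status written for a contig is its current state. It lists every contig whose latest status is not FINISHED, which narrows down which void directories to inspect after a crash.

```python
from collections import Counter


def last_status_per_contig(lines):
    """Map each contig to the last status recorded for it.

    Assumed line layout: contig <TAB> datastore directory <TAB> status.
    """
    status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, state = fields[0], fields[2]
        status[contig] = state  # later entries overwrite earlier ones
    return status


def unfinished(status, done=("FINISHED",)):
    """Return the sorted names of contigs whose latest status is not in `done`."""
    return sorted(contig for contig, state in status.items() if state not in done)


def report(path="master_datastore_index.log", done=("FINISHED",)):
    """Print a histogram of latest statuses and every contig not in `done`."""
    with open(path) as handle:
        status = last_status_per_contig(handle)
    print("status counts:", dict(Counter(status.values())))
    for contig in unfinished(status, done):
        print(contig, status[contig], sep="\t")
```

Calling `report()` from the MAKER output directory prints the summary. If contigs that MAKER deliberately skipped should not be flagged, add their status word to `done`. This thread does not show the exact word MAKER writes for skipped contigs, so read it from your own log.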
From mikael.durling at slu.se Thu Jan 29 03:37:23 2015
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Thu, 29 Jan 2015 09:37:23 +0000
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To: <54C9E279.8040907@imbim.uu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se>
Message-ID:

Hi,

are you running the NFS servers in synchronous or asynchronous mode? I have seen cases where maker fails with the NFS server in async mode, but the failures are random and I can't really reproduce them. In the end, I have continued running maker on NFS in async mode, since the speed gains are significant, at the cost of occasional reruns. (And yes, nfsstat shows no signs of errors.)

Mikael
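On an asynchronously exported NFS share, a file created on one node may not be visible on another node for a few seconds. This matches the random, non-reproducible failures described above. MAKER already has its own retry mechanism. For your own wrapper or post-processing scripts, one generic mitigation is to poll for the file with a capped exponential backoff instead of failing on the first missing read. The helper below is a hypothetical sketch. Its name and timing defaults are arbitrary illustrative choices, not tuned values.

```python
import os
import time


def wait_for_file(path, timeout=30.0, first_delay=0.1, max_delay=5.0):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Returns True as soon as the file is visible, False on timeout.
    The sleep between checks doubles after each miss, capped at `max_delay`.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay)
```

A caller should still treat a False return as a real failure, for example with `if not wait_for_file(out_path): raise FileNotFoundError(out_path)`. Otherwise, genuine errors would be silently skipped.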
From carsonhh at gmail.com Thu Jan 29 09:22:57 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 29 Jan 2015 08:22:57 -0700
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To:
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se>
Message-ID:

In my experience, NFS is the most likely cause. A lot of very small contigs means that MAKER produces a lot of very small files very quickly. This stresses NFS far more than high IO read/write bandwidth does. There can then be several seconds of lag between a file being created and the file being available for reading. The asynchronous setting allows the system to report an IO operation as successful even though the operation has not yet completed and is only buffered on the NFS server. So when the process tries to read the file it supposedly just created, the file doesn't exist.

MAKER tries to offload most small-file creation operations that can result in this condition to a temporary directory (set by TMP= in the maker_opts.ctl file). It is therefore critical that this location be on a local drive and not an NFS location. Even so, running a lot of very small contigs still results in more frequent file creation on the NFS mount.

There are only three ways around this type of NFS issue:

- run on fewer nodes to reduce file-creation frequency
- turn off asynchronous mode for NFS, which results in serious IO performance degradation
- let MAKER retry until it works (brute force), which is the default and in my experience the most effective approach

NFS issues were in fact the reason we put retry and restart capabilities into MAKER in the first place.

--Carson
From myandell at genetics.utah.edu Thu Jan 29 10:54:50 2015
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Thu, 29 Jan 2015 16:54:50 +0000
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To: <54C9E279.8040907@imbim.uu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu>, <54C9E279.8040907@imbim.uu.se>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA053E371D456@mxb2.hg.genetics.utah.edu>

Hi Marc, are you sure this isn't your system? For example, bad NFS mounts, a full scratch area, etc.?

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707
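Two suspects raised in this thread can be checked before launching a run. Mark points to a bad NFS mount or a full scratch area. Carson points out that the TMP= directory in maker_opts.ctl must be on a local drive. The sketch below is Linux-specific and assumes /proc/mounts is readable. The function names and the 10 GB default threshold are illustrative assumptions, not MAKER settings.

```python
import os
import shutil


def mount_info(path, mounts_file="/proc/mounts"):
    """Return (mount_point, fstype) for the filesystem that holds `path`.

    Linux-only: chooses the longest mount point in `mounts_file` that
    contains the resolved path. Later duplicate entries win, because
    they shadow earlier mounts on the same directory.
    """
    target = os.path.realpath(path)
    best = ("/", "unknown")
    with open(mounts_file) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue
            # /proc/mounts encodes a space inside a path as \040
            mount_point = fields[1].replace("\\040", " ")
            fstype = fields[2]
            prefix = mount_point.rstrip("/") + "/"
            inside = target == mount_point or target.startswith(prefix)
            if inside and len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best


def check_scratch(path, min_free_gb=10.0, mounts_file="/proc/mounts"):
    """Return human-readable problems with `path` as a scratch/TMP directory."""
    mount_point, fstype = mount_info(path, mounts_file)
    free_gb = shutil.disk_usage(path).free / 1e9
    problems = []
    if fstype.startswith("nfs"):  # matches both nfs and nfs4
        problems.append(f"{path} is on NFS ({mount_point}); prefer a node-local disk")
    if free_gb < min_free_gb:
        problems.append(f"{path} has only {free_gb:.1f} GB free")
    return problems
```

To check a directory on each compute node before starting the job, run `check_scratch("/path/you/put/in/TMP")` there; an empty list means neither problem was detected. The node-local check matters because a directory that is local on the head node may be NFS-mounted elsewhere.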
>> >> Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly, and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and BLAST matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message is given prior to that): >> >> perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530. Backtrace: >> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] >> /lib64/libpthread.so.0[0x358ae0f710] >> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] >> /lib64/libpthread.so.0[0x358ae0f710] >> /lib64/libc.so.6(__poll+0x53)[0x358aadf343] >> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] >> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] >> /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] >> /lib64/libpthread.so.0[0x358ae079d1] >> /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] >> SIGTERM received >> >> A few words about the setup: >> >> We have 10 nodes with 160 cores, and the shared file system is exported via InfiniBand from a "standard" NFS server. As OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low. >> >> So far I have tried: >> >> - running the MPI processes both over the Ethernet network and over IPoIB: same problem.
>> - installing a more recent version of Perl through perlbrew, with all the required modules, and recompiling Maker >> - running some (albeit simple) network checks for retransmissions, lost packets, etc.: nothing popped up >> - running Maker on a subset of nodes to eliminate the possibility of a bad node >> >> The error message is a bit cryptic to me, and it would be very helpful to know whether Maker has a problem accessing a file or whether OpenMPI has a communication problem, etc., but I am not able to tell from the information I have been able to extract so far. Any ideas? >> >> Cheers, >> >> Marc >> >> >> Marc P. Hoeppner, PhD >> Team Leader >> BILS Genome Annotation Platform >> Department for Medical Biochemistry and Microbiology >> Uppsala University, Sweden >> marc.hoeppner at imbim.uu.se From ashrafi at ucdavis.edu Thu Jan 29 12:07:41 2015 From: ashrafi at ucdavis.edu (Hamid Ashrafi) Date: Thu, 29 Jan 2015 13:07:41 -0500 Subject: [maker-devel] GFF and Dereferencing problem Message-ID: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> Hi, After maker finishes its job, it generates many files, one of which is a GFF file. I see the following in some of my GFF files. It seems to be a dereferencing problem. I am just wondering whether it affects my annotation. Hamid uti_cns_0004767 est2genome match_part 856428 856485 3090 + . ID=uti_cns_0004767:hsp:8340:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome match_part 856587 856938 3090 + . ID=uti_cns_0004767:hsp:8341:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome match_part 857053 857201 3090 + .
ID=uti_cns_0004767:hsp:8342:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome match_part 859004 859041 3090 + . ID=uti_cns_0004767:hsp:8343:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome expressed_sequence_match 878327 878771 1446 + . ID=uti_cns_0004767:hit:4231:3.2.3.8;Name=Sp_Illum_Trans_W uti_cns_0004767 est2genome match_part 878327 878771 1446 + . ID=uti_cns_0004767:hsp:8344:3.2.3.8;Parent=uti_cns_0004767:hit:4231:3.2.3 uti_cns_0004767 est2genome expressed_sequence_match 884121 886610 2509 + . ID=uti_cns_0004767:hit:4232:3.2.3.8;Name=Sp_Illum_Trans_W uti_cns_0004767 est2genome match_part 884121 884195 2509 + . ID=uti_cns_0004767:hsp:8345:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 uti_cns_0004767 est2genome match_part 886180 886610 2509 + . ID=uti_cns_0004767:hsp:8346:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 ARRAY(0x1b91f110) ARRAY(0x1a686350) ARRAY(0x1b06bba0) ARRAY(0x1b931e10) ARRAY(0x1b13f3a0) ARRAY(0x1b6af650) ARRAY(0x1b929600) From carsonhh at gmail.com Thu Jan 29 12:47:11 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 Jan 2015 11:47:11 -0700 Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster In-Reply-To: <1422394249179.2a90ef9d@Nodemailer> References: <73716718-1273-46F1-BC94-AAD276DFE0E1@gmail.com> <1422394249179.2a90ef9d@Nodemailer> Message-ID: I believe this may be caused by the latency of asynchronous operations on your network shared drive (which could have a lot of lag between operations when running in the cloud). First, test with a single AWS instance, using its local drive as the working directory. Next, try two instances, where one is the NFS server and you run MAKER on the other instance, but on the network-mounted drive. Then try gradually increasing the number of instances hitting the network shared drive.
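To put a number on each step of that incremental test, a small probe of filesystem metadata latency can be run against the local disk and against the NFS-mounted working directory. The sketch below is a hypothetical helper, not part of MAKER or StarCluster: it times repeated mkdir/write/fsync/remove cycles, assuming (not confirmed here) that these small metadata operations resemble what MAKER's datastore does when it reports "could not make datastore directory". A much larger median or maximum on the shared drive would be consistent with the latency explanation, but would not prove it.

```python
import os
import statistics
import sys
import tempfile
import time


def probe_metadata_latency(base_dir: str, cycles: int = 50) -> dict:
    """Time `cycles` mkdir/write/fsync/remove/rmdir rounds under base_dir.

    Returns the min, median, and max duration of one round in milliseconds.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    timings_ms = []
    # A private scratch directory keeps the probe away from real data.
    with tempfile.TemporaryDirectory(dir=base_dir) as scratch:
        for i in range(cycles):
            subdir = os.path.join(scratch, f"probe_{i}")
            stub = os.path.join(subdir, "stub.txt")
            start = time.perf_counter()
            os.mkdir(subdir)
            with open(stub, "w") as fh:
                fh.write("x")
                fh.flush()
                os.fsync(fh.fileno())  # force the write through to storage
            os.remove(stub)
            os.rmdir(subdir)
            timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "cycles": cycles,
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Example (paths are placeholders): python probe_fs.py /tmp /mnt/data/workdir
    for directory in sys.argv[1:] or [tempfile.gettempdir()]:
        print(directory, probe_metadata_latency(directory))
```

Running the same probe from one, two, and then more instances at once against the shared drive mirrors the step-by-step scaling Carson suggests.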
-Carson > On Jan 27, 2015, at 2:30 PM, Jason Gallant wrote: > > Carson, > > Thanks for the input and the test script. I was able to run Maker successfully using OpenMPI on Starcluster. However, I am still receiving error messages fairly commonly; this is the error I described earlier in this thread. It seems to appear regardless of whether I use OpenMPI or MPICH2. > > Essentially, there seems to be an error collapsing BLAST reports. This error causes maker to stop accepting new contigs on that machine (in this case node060), and maker continues to report every contig following this error as "failed". The other nodes seem to be working normally, but this error can happen on other nodes as well, so the issue can compound. > > [1,15]:deleted:-60 hits > [1,15]:collecting blastx reports > [1,15]:ERROR: Could not colapse BLAST reports > [1,15]: at /root/maker/bin/../lib/GI.pm line 2524 thread 1. > [1,15]: GI::combine_blast_report(FastaChunk=HASH(0x1781acd8), ARRAY(0xc1e4fa8), ARRAY(0x15ab20d0), runlog=HASH(0xb87f878)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 > [1,15]: Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 > [1,15]: eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 > [1,15]: Error::subs::try(CODE(0x198e22f8), HASH(0x9c9b65c0)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4224 thread 1 > [1,15]: Process::MpiChunk::_go(Process::MpiChunk=HASH(0x1b8a7cd0), "run", HASH(0x15e3e1a0), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 > [1,15]: Process::MpiChunk::run(Process::MpiChunk=HASH(0x1b8a7cd0), 15) called at /root/maker/bin/maker line 1457 thread 1 > [1,15]: main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...)
called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [1,15]: eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [1,15]: threads::new("threads", CODE(0x36c9a98), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1 > [1,15]:--> rank=15, hostname=node015 > [1,15]:ERROR: Failed while collecting blastx reports > [1,15]:ERROR: Chunk failed at level:9, tier_type:3 > [1,15]:FAILED CONTIG:Scaffold66 > [1,15]: > [1,15]:ERROR: Chunk failed at level:4, tier_type:0 > [1,15]:FAILED CONTIG:Scaffold66 > [1,15]: > [1,15]:examining contents of the fasta file and run log > [1,15]:ERROR: could not make datastore directory > [1,15]:--> rank=15, hostname=node015 > [1,15]:ERROR: Failed while examining contents of the fasta file and run log > [1,15]:ERROR: Chunk failed at level:0, tier_type:0 > [1,15]:FAILED CONTIG:Scaffold483 > > -- > Dr. Jason R. Gallant > Assistant Professor > Room 38 Natural Sciences > Department of Zoology > Michigan State University > East Lansing, MI 48824 > jgallant at msu.edu > office: 517-884-7756 > > > On Fri, Jan 23, 2015 at 3:25 PM, Carson Holt > wrote: > > The complaining is because there is more than one MAKER process running and they are not connected via MPI. So the problem is OpenMPI. Try installing a small MPI script (like the one attached) and using that to test OpenMPI. Once it is configured correctly, the separate processes will communicate with each other (pay attention to the comm size and rank messages). > > -Carson > > > > > >> On Jan 23, 2015, at 1:15 PM, Jason Gallant > wrote: >> >> Hi Carson, >> >> Yes, I've tried that and still have the issue of maker complaining about multiple processes in the same directory. Other ideas? >> >> Best, >> Jason >> >> -- >> Dr. Jason R.
Gallant >> Assistant Professor >> Room 38 Natural Sciences >> Department of Zoology >> Michigan State University >> East Lansing, MI 48824 >> jgallant at msu.edu >> office: 517-884-7756 >> >> >> On Fri, Jan 23, 2015 at 3:14 PM, Carson Holt > wrote: >> >> If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so). >> >> >> For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings. Also, if jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER. >> >> Example: mpiexec -mca btl ^openib -n 20 maker >> >> -Carson >> >> >> >>> On Jan 23, 2015, at 1:08 PM, Jason Gallant > wrote: >>> >>> Hi Carson, >>> >>> Yes, STARCLUSTER enables a global storage space, which is an EBS drive that I've created, shared via NFS. >>> >>> I'm using the local disk space on each instance for the /tmp directory, however. >>> >>> It occurred to me on reading the forums that MPICH2 doesn't scale as well as OpenMPI; however, when I try to configure Maker for OpenMPI and run it, I get complaints from maker that multiple makers are running in the same directory. >>> >>> Thanks for your advice! >>> >>> Best, >>> Jason >>> >>> -- >>> Dr. Jason R. Gallant >>> Assistant Professor >>> Room 38 Natural Sciences >>> Department of Zoology >>> Michigan State University >>> East Lansing, MI 48824 >>> jgallant at msu.edu >>> office: 517-884-7756 >>> >>> >>> On Fri, Jan 23, 2015 at 3:01 PM, Carson Holt > wrote: >>> >>> MAKER needs a global storage location. You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that.
Also use OpenMPI instead of MPICH2. It's more stable. >>> >>> I look forward to seeing how your experiment with AWS, MPI, and MAKER works out. >>> >>> -Carson >>> >>> >>> >>> > On Jan 21, 2015, at 6:56 AM, Jason Gallant > wrote: >>> > >>> > Hi Everyone, >>> > >>> > I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster. I've started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably. >>> > >>> > I'm having a persistent issue where something fails on one of the nodes, and stderr is flooded with: >>> > >>> > examining contents of the fasta file and run log >>> > [67] ERROR: could not make datastore directory >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while examining contents of the fasta file and run log >>> > [67] ERROR: Chunk failed at level:0, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold261 >>> > >>> > This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node: >>> > >>> > 67] #-------------------------------# >>> > [67] deleted:-60 hits >>> > [67] collecting blastx reports >>> > [67] ERROR: Could not colapse BLAST reports >>> > [67] at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
>>> > [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 >>> > [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 >>> > [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 >>> > [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 >>> > [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 >>> > [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 >>> > [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 >>> > [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 >>> > [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1 >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while collecting blastx reports >>> > [67] ERROR: Chunk failed at level:9, tier_type:3 >>> > [67] FAILED CONTIG:Scaffold66 >>> > [67] >>> > [67] ERROR: Chunk failed at level:4, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold66 >>> > >>> > >>> > I've attempted to ignore the error to see if things will proceed on the other 199 processors. When I returned to the "master"
node after the evening, Maker keeps repeating the same error code over and over (same scaffold): >>> > ] examining contents of the fasta file and run log >>> > [67] ERROR: could not make datastore directory >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while examining contents of the fasta file and run log >>> > [67] ERROR: Chunk failed at level:0, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold1589 >>> > >>> > I stop the job, and restart, and after only a few minutes of running, the same error is reported, this time on a new scaffold. Strangely here, the error is reported in the MPI tag of node001, but the error originates at node137: >>> > >>> > ERROR: Could not colapse BLAST reports >>> > [1] at /root/maker/bin/../lib/GI.pm line 2524. >>> > [1] GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 >>> > [1] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 >>> > [1] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 >>> > [1] Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 >>> > [1] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 >>> > [1] Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 >>> > [1] --> rank=137, hostname=node137 >>> > [1] ERROR: Failed while collecting blastx reports >>> > [1] ERROR: Chunk failed at level:9, tier_type:3 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] ERROR: Chunk failed at level:4, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while 
examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk
failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > >>> > I'd appreciate any guidance on how best to diagnose this error! >>> > >>> > Many thanks, >>> > Jason Gallant >>> > >>> > -- >>> > Dr. Jason R. Gallant >>> > Assistant Professor >>> > Room 38 Natural Sciences >>> > Department of Zoology >>> > Michigan State University >>> > East Lansing, MI 48824 >>> > jgallant at msu.edu >>> > office: 517-884-7756 From carsonhh at gmail.com Thu Jan 29 13:40:09 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 Jan 2015 12:40:09 -0700 Subject: [maker-devel] GFF and Dereferencing problem In-Reply-To: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> References: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> Message-ID: <65C0DD7A-A3CA-4404-B15E-91B77DC6D8FE@gmail.com> Could you make sure you are using the most recent version of MAKER? There was a similar bug that was fixed some time ago. The current version is 2.31.8. Also, when rerunning with the most recent version of MAKER, make sure to set the -a flag on the command line to force a rerun of logged data. -Carson > On Jan 29, 2015, at 11:07 AM, Hamid Ashrafi wrote: > > Hi, > > After maker finishes its job, it generates many files, one of which is a GFF file. I see the following in some of my GFF files. It seems to be a dereferencing problem. I am just wondering whether it affects my annotation. > > Hamid > > uti_cns_0004767 est2genome match_part 856428 856485 3090 + . ID=uti_cns_0004767:hsp:8340:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 856587 856938 3090 + .
ID=uti_cns_0004767:hsp:8341:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 857053 857201 3090 + . ID=uti_cns_0004767:hsp:8342:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 859004 859041 3090 + . ID=uti_cns_0004767:hsp:8343:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome expressed_sequence_match 878327 878771 1446 + . ID=uti_cns_0004767:hit:4231:3.2.3.8;Name=Sp_Illum_Trans_W > uti_cns_0004767 est2genome match_part 878327 878771 1446 + . ID=uti_cns_0004767:hsp:8344:3.2.3.8;Parent=uti_cns_0004767:hit:4231:3.2.3 > uti_cns_0004767 est2genome expressed_sequence_match 884121 886610 2509 + . ID=uti_cns_0004767:hit:4232:3.2.3.8;Name=Sp_Illum_Trans_W > uti_cns_0004767 est2genome match_part 884121 884195 2509 + . ID=uti_cns_0004767:hsp:8345:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 > uti_cns_0004767 est2genome match_part 886180 886610 2509 + . ID=uti_cns_0004767:hsp:8346:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 > ARRAY(0x1b91f110) > ARRAY(0x1a686350) > ARRAY(0x1b06bba0) > ARRAY(0x1b931e10) > ARRAY(0x1b13f3a0) > ARRAY(0x1b6af650) > ARRAY(0x1b929600) From carsonhh at gmail.com Fri Jan 30 10:33:46 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Jan 2015 09:33:46 -0700 Subject: [maker-devel] How to improve the result of Maker In-Reply-To: References: Message-ID: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com> See below. > > I have joined the "Maker-devel" Google group, but I don't know why I can't reply to a topic or create a new topic. Is there some limitation? The Google site is just a searchable archive of MAKER-related e-mails.
The actual conversations occur through the MAKER mailing list: http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org E-mails sent to the list will be automatically archived on Google. > I have finished genome annotation with Maker. I used SNAP and Augustus in Maker. I have some questions; could you help me? > > When gene finders make predictions at the same location, Maker chooses the best prediction as the final output, right? But if the prediction doesn't match the evidence very well, how does Maker synthesize the prediction with the evidence? My understanding of Maker's behavior is as follows, but I'm not sure whether it is right: > > Assume there is an exon in the evidence but not in the prediction. If the exon is located at the end of the prediction, it will be output as UTR, but if the exon is located inside the prediction, it will be ignored and not output, right? No. MAKER uses the introns and exons in the evidence alignments to provide hints to the gene predictors. Hints increase the probability scores of the HMM models by increasing the likelihood of the exon or intron state wherever it overlaps the evidence alignment. This process bumps up the likelihood values for models that better match the evidence alignments, resulting in better models than SNAP and Augustus produce on their own without hints. Note that models are still governed by the same constraints on what constitutes an open reading frame and a splice site, regardless of evidence alignments. This means that no amount of evidence-based hints can overcome an assembly error. > For example: > > the exon pointed to by the red arrow. All the evidence contains this exon, but it was missed in the final output. There are two possibilities.
Given how different the SNAP and Augustus models are from one another, this would suggest they have not been trained appropriately (for example, if you are picking a related organism's parameter file rather than training these programs, several assumptions are being made that can actually make such an approach almost worse than just picking a parameter file at random). But more likely, the evidence-supported exon breaks the reading frame of the model. This usually indicates that you have an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that generates a missense-causing frameshift, so the predictors do the next most reasonable thing: they drop the exon if another model is tenable. More concerning would be the mRNA-seq alignments near the 3' end of the gene call. The structure suggests significant capture of background transcription by the mRNA-seq reads (long UTRs with weird mini-introns). I would suggest not using Cufflinks in this case. You should probably go with an assembly-based approach for the mRNA-seq reads instead. I would suggest using Trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with Trinity. I would further suggest looking at the model in question using Apollo and manually adding the exon (click and drag it into the model). You can examine the reading frame after adding the exon and see whether it is in fact a frameshift assembly error. If it's a homopolymer-derived frameshift, then you can expect a lot more of these throughout your assembly. Also, I do not see any protein alignments here? MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don't have to be that closely related, but closer is better).
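The reading-frame check Carson describes (add the exon, then see whether it causes a frameshift) can also be reproduced outside Apollo once the exon coordinates are known. This is a minimal sketch under stated assumptions: `check_reading_frame` is a hypothetical helper, not a MAKER or Apollo function; it handles plus-strand models only (minus-strand models would need reverse-complementing first); and exons are 1-based, inclusive (start, end) pairs as in GFF3. A spliced CDS whose length is not a multiple of three, or that contains an in-frame stop before its final codon, is the frameshift signature discussed above.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_reading_frame(genome: str, exons: List[Tuple[int, int]]) -> Dict[str, object]:
    """Splice plus-strand CDS exons and report basic reading-frame checks.

    `exons` holds 1-based, inclusive (start, end) coordinates, as in GFF3.
    """
    cds = "".join(genome[start - 1:end] for start, end in sorted(exons)).upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return {
        "cds_length": len(cds),
        "in_frame": len(cds) % 3 == 0,
        "starts_with_atg": cds.startswith("ATG"),
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
        # 0-based codon indices of stop codons that occur before the last codon
        "internal_stops": [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS],
    }


if __name__ == "__main__":
    toy_genome = "ATGAAATTTGGGTAA"  # made-up sequence for illustration
    print(check_reading_frame(toy_genome, [(1, 6), (10, 15)]))  # 12 bp, in frame
    print(check_reading_frame(toy_genome, [(1, 6), (8, 15)]))   # 14 bp, frameshift
```

Comparing the result with and without the evidence-supported exon shows directly whether adding it shifts the frame, which is the case in which the predictors would be expected to drop it.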
Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image, you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not. > In this example, the long UTR is another issue; is it non-coding RNA? > > I have another example: > > > The yellow was evidence from Cufflinks. The final output chose the prediction from Augustus, but the last two exons were annotated as UTR. I thought UTR should be continuous and should not contain introns. Actually, UTRs are not expected to be continuous and without introns. In fact, the majority of alternate splicing events occur in the 5' UTR (not in the CDS), and 5' UTRs commonly contain introns (just as we see here). This makes evolutionary sense. An alternatively spliced 5' UTR allows for differential and tissue-specific control of the exact same protein by swapping out the upstream regulatory sequence. Alternate splicing of the 3' UTR, on the other hand, is less common (it's involved in nonsense-mediated decay and not so much in regulation of expression), but introns in the 3' UTR are still not uncommon. The mRNA-seq alignments suggest that those exons are transcribed, so unless there is an assembly error causing a frameshift in the CDS and an early stop codon, the 3' UTR would be correct. If you had protein alignments from another species here, then you could see which exons they support as being coding exons. Thanks, Carson From xvazquezc at gmail.com Fri Jan 30 22:48:33 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Sat, 31 Jan 2015 15:48:33 +1100 Subject: [maker-devel] genome duplication?
Message-ID: Hi all, One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:
# Statistics of the completeness of the genome based on 248 CEGs #
>              #Prots  %Completeness  -  #Total  Average  %Ortho
>
> Complete       181       72.98      -    365     2.02    67.40
>   Group 1       54       81.82      -    105     1.94    66.67
>   Group 2       39       69.64      -     86     2.21    71.79
>   Group 3       45       73.77      -     86     1.91    57.78
>   Group 4       43       66.15      -     88     2.05    74.42
> Partial        230       92.74      -    528     2.30    77.83
>   Group 1       61       92.42      -    140     2.30    72.13
>   Group 2       53       94.64      -    127     2.40    84.91
>   Group 3       56       91.80      -    126     2.25    69.64
>   Group 4       60       92.31      -    135     2.25    85.00
The expected genome size is relatively small (~42 Mb by abyss-fac) compared with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies. Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328 Now to the Maker part... So, as part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP used to give quite high numbers (~19000 and ~23000, respectively) compared with xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does Maker deal with gene duplications? Or is this just because there is no support for those genes in the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments. So, in my last run I turned on keep_preds=1, and the proteins in xxx.all.maker.proteins.fasta reached to Last question regarding the protein files.
I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and a lower chance of predicting false genes. Am I right?

Thank you in advance,
Xabier

--
Xabier Vázquez Campos
PhD Candidate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From mikael.durling at slu.se Sat Jan 31 02:42:51 2015
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sat, 31 Jan 2015 08:42:51 +0000
Subject: [maker-devel] genome duplication?

Hi Xabier,

On 31 Jan 2015, at 05:48, Xabier Vázquez Campos wrote:

> Hi all,
>
> One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome
>
>     # Statistics of the completeness of the genome based on 248 CEGs #
>                  #Prots  %Completeness  -  #Total  Average  %Ortho
>     Complete        181          72.98  -     365     2.02    67.40
>     Partial         230          92.74  -     528     2.30    77.83

Judging from these figures, you seem to have a very fragmented assembly? What N50 have you reached? In my experience, assemblies with an N50 below 5-10 times the average gene length tend to cause problems in producing good gene sets. That is not to say the gene sets are unusable, but when comparing, e.g., gene complements to other species, it will be hard to draw any conclusions when a high proportion of the genes are incomplete.

> The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with Hortaea werneckii (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies.
> Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>
> Now to the Maker part... So, as part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP tended to give quite high numbers (~19000 and ~23000, respectively) compared with xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does Maker deal with gene duplications? Or is this just a consequence of the lack of support from the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments.

Unless you mistakenly filter out duplicated gene families as repeats with RepeatModeler, MAKER should not care about duplicated genes. However, without keep_preds=1, MAKER reports only genes with some kind of support (be it EST or protein homology). This is rather conservative, but if you enable keep_preds you will get more genes, as you have noted. Just for the sake of comparison, I have reannotated more than ten genomes downloaded from JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently reports fewer gene models. I have yet to do a more thorough comparison to tell which genes JGI reports that don't appear in the MAKER annotations.

> So, in my last run I set keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to
>
> Last question regarding the protein files. I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and a lower chance of predicting false genes. Am I right?

Yes, I would say so.
The FilteredModels have passed through their model selection pipeline, while all_models contains models from all predictors, as well as combinations of predictors and EST evidence.

Just my 2 cents of observations,
cheers,
Mikael

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From xvazquezc at gmail.com Sat Jan 31 02:51:36 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Sat, 31 Jan 2015 19:51:36 +1100
Subject: [maker-devel] genome duplication?

Thanks Mikael,

These are the assembly stats as reported by abyss-fac. Indeed it isn't a great N50, but it isn't that bad either:

    n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
    14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From chenwenbo1020 at gmail.com Sat Jan 31 09:54:28 2015
From: chenwenbo1020 at gmail.com (陈文博)
Date: Sat, 31 Jan 2015 10:54:28 -0500
Subject: [maker-devel] How to improve the result of Maker
In-Reply-To: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com>
References: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com>

> There are two possibilities.
> Given how different the SNAP and Augustus models are from one another, this would suggest they have not been trained appropriately (for example, if you are picking another related organism's parameter file rather than training these programs, there are several assumptions being made that can actually make such an approach almost worse than just picking a parameter file at random). But more likely the evidence-supported exon breaks the reading frame of the model. This usually indicates that you have an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that generates a missense-causing frameshift, so the predictors do the next most reasonable thing: they drop the exon if another model is tenable. More concerning would be the mRNA-seq alignments near the 3' end of the gene call. The structure suggests significant capture of background transcription with the mRNA-seq reads (long UTRs with weird mini-introns). I would suggest not using Cufflinks in this case. You should probably go with an assembly-based approach for the mRNA-seq reads instead. I would suggest using Trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with Trinity.
>
> I would further suggest looking at the model in question using Apollo and manually adding the exon (click and drag it into the model). You can examine the reading frame after adding the exon and see if it is in fact a frameshift assembly error. If it's a homopolymer-derived frameshift, then you can expect a lot more of these throughout your assembly.

I dragged the exon into the model. There is a stop codon in it, which causes the region after it to become UTR, here:

[image 1]

The exon in question is indicated by the red arrow.
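The reading-frame check Carson suggests doing in Apollo can also be sketched on a spliced CDS string. This is illustrative only: the toy exon sequences and the helper names `splice` and `internal_stops` are hypothetical, and the sketch assumes a plus-strand CDS whose first codon starts at the first base of the first exon (a minus-strand model would need to be reverse-complemented first). It reports in-frame stop codons before the final codon, which is what an exon boundary shifted by one base would produce.

```python
# Standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(exons):
    """Concatenate exon sequences (plus strand, 5' to 3') into one CDS string."""
    return "".join(exon.upper() for exon in exons)


def internal_stops(cds):
    """Return 0-based nucleotide offsets of premature in-frame stop codons.

    Codons are read from offset 0, a trailing partial codon is ignored, and a
    stop that is the final complete codon (the expected terminal stop) is skipped.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return [i * 3 for i, codon in enumerate(codons) if codon in STOP_CODONS]


if __name__ == "__main__":
    exon1 = "ATGGCC"
    exon2 = "GATAAGCTGTAA"
    # Exon boundary as supported by the EST: no premature stop.
    print(internal_stops(splice([exon1, exon2])))        # []
    # Acceptor placed 1 nt too far upstream, pulling in an intronic "T":
    # the downstream exon is now read out of frame and hits TGA/TAA early.
    print(internal_stops(splice([exon1, "T" + exon2])))  # [6, 9]
```

A non-empty result after adding an evidence-supported exon is consistent with the frameshift explanation; a CDS length that is not a multiple of 3 is another quick hint.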
But the uppermost evidence is the complete EST from NCBI, and it contains both the start and stop codons. Then I noticed that the 5' boundary of the 2nd exon in the model is not the same as in the EST, so it causes a frameshift, which produces the stop codon in the exon indicated by the red arrow. The first exon should not be CDS, as there would be a start codon in the 2nd exon if its 5' boundary were predicted correctly. Would "always_complete=1" fix it?

I will try to use Trinity.

> Also I do not see any protein alignments here? MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don't have to be that closely related, but closer is better). Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not.

I did use 3 sources of protein evidence: the proteome of a related species, the fruit fly proteome, and Swiss-Prot.

Thank you very much!

Best regards,
Wenbo

From jason.stajich at gmail.com Sat Jan 31 17:21:12 2015
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sat, 31 Jan 2015 15:21:12 -0800
Subject: [maker-devel] genome duplication?
Xabier,

FYI (though you have probably already compared), those stats are on par with the Hortaea v1 assembly. We do have an improved Hortaea assembly now, and the genome size is still in the same range, supporting the duplication hypothesis.

Hw version 1 assembly: N50 9623; max 71563

CEGMA for Hw1:

                 #Prots  %Completeness  -  #Total  Average  %Ortho
    Complete        196          79.03  -     498     2.54    81.12
    Partial         228          91.94  -     673     2.95    95.18

Mikael, yes, we should compare notes on the models JGI is calling that have little support in MAKER. I am not sure whether their pipeline runs Augustus/SNAP with informant hints, though they usually bring RNA-seq into the mix. Did your reannotation approach assemble the RNA-seq and use it as evidence? We'll be trying to assess some of this when comparing the proportion of shared genes for the first 1KFG paper, so we may be able to say with more certainty whether these extra predictions are shared more widely, and get a handle on singleton/false-positive rates.

Jason

Jason Stajich
jason.stajich at gmail.com

From jerryzhaosjtu at gmail.com Wed Jan 7 04:16:45 2015
From: jerryzhaosjtu at gmail.com (赵越)
Date: Wed, 7 Jan 2015 19:16:45 +0800
Subject: [maker-devel] using MAKER with MPI

Greetings,

Can I use mpirun instead of mpiexec? Thank you!!
--
*Yue Zhao (Jerry)*
Bachelor Candidate of Plant Biotechnology
Researcher in UCLA-CSST program
Shanghai Jiao Tong University, Shanghai
*jerryzhaosjtu at gmail.com*

From carsonhh at gmail.com Wed Jan 7 09:13:50 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Jan 2015 09:13:50 -0700
Subject: [maker-devel] using MAKER with MPI

Yes, they are interchangeable. In fact, in OpenMPI both mpiexec and mpirun are softlinks to the exact same executable -> orterun

Just remember that MAKER works with the MPICH2/3 and OpenMPI flavors of MPI, but not with MVAPICH2. Also, if using MPICH, make sure to enable shared libraries during installation (this is not the default).

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile, e.g.:

    export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so

If jobs hang or freeze when using OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER. Example:

    mpiexec -mca btl ^openib -n 20 maker

--Carson
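If MAKER is launched from a Python wrapper rather than directly from a shell, the two OpenMPI settings described above (LD_PRELOAD pointing at libmpi.so, and the `-mca btl ^openib` flag) could be assembled as below. This is a hypothetical helper, not part of MAKER; the libmpi.so path is only the example path from the message and will differ between installations.

```python
import os
import shlex

# Example path from the message above; adjust to your OpenMPI installation.
DEFAULT_LIBMPI = "/usr/local/openmpi/lib/libmpi.so"


def maker_mpi_command(n_procs, disable_openib=True, maker="maker"):
    """Return an argv list that launches MAKER under OpenMPI's mpiexec."""
    cmd = ["mpiexec"]
    if disable_openib:
        # Exclude the openib byte transfer layer (BTL), the workaround
        # suggested above for OpenMPI jobs that hang or freeze.
        cmd += ["-mca", "btl", "^openib"]
    cmd += ["-n", str(n_procs), maker]
    return cmd


def maker_mpi_env(libmpi=DEFAULT_LIBMPI, base=None):
    """Return a copy of the environment with libmpi.so added to LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD")
    # Prepend instead of overwriting anything that is already preloaded.
    env["LD_PRELOAD"] = f"{libmpi}:{existing}" if existing else libmpi
    return env


if __name__ == "__main__":
    # shlex.join (Python 3.8+) quotes '^openib' so the line is safe to paste
    # into a shell; prints: mpiexec -mca btl '^openib' -n 20 maker
    print(shlex.join(maker_mpi_command(20)))
```

To actually launch the job, something like `subprocess.run(maker_mpi_command(20), env=maker_mpi_env(), check=True)` would work; passing an argv list means no shell is involved, so `^openib` needs no quoting there. Prepending to LD_PRELOAD rather than replacing it is a design choice that avoids silently dropping other preloaded libraries.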
From carsonhh at gmail.com Thu Jan 8 08:47:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 8 Jan 2015 08:47:29 -0700
Subject: [maker-devel] MAKER mpi running wrong
Message-ID: <13241A86-804F-4674-A8FD-CA90026CF4AF@gmail.com>

When running large jobs under MPI, semi-random issues can arise, as well as tuning issues where hardware configuration, I/O performance, buffer sizes, etc. all play a role. Using one of the NSF flagship clusters from XSEDE, for example, I can run on over 2000 CPUs without issue. But the IT specialists with XSEDE have also spent a lot of time tuning MPI by enabling and disabling certain options for their hardware and network configuration. (The IT specialists for the XSEDE project are actually the developers of many of the available MPI flavors, so they wrote MPI to work really well on this specific cluster.) On other clusters I can't go over 200 CPUs on a single job. On another XSEDE cluster I can run on exactly 1424 CPUs; if I increase by a single CPU, the job always fails.

For these kinds of issues you would have to delve into some of the more obscure parameters of OpenMPI via trial and error (http://www.open-mpi.org/doc/). What happens under the hood in OpenMPI is that different buffer sizes and network communication strategies are triggered as the number of nodes increases, so you can often identify a specific CPU count that is stable, where going one over that number causes a failure. You then look in the documentation for a parameter that matches that trigger value and alter it higher or lower. Or, if you can identify the stable CPU count, just submit multiple jobs at exactly that CPU count.

--Carson

> On Jan 8, 2015, at 8:27 AM, 赵越 wrote:
>
> Hi Carson,
>
> After using the flag in your example, the warning after running MAKER was gone, yet after running with MPI in 512 threads for 2 hours, MAKER 'Exited with exit code 1'. The stdout info is as follows:
>
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> SIGTERM received
> Perl exited with active threads:
> 1 running and unjoined
> 0 finished and unjoined
> 0 running and detached
>
> Also, my job submission is like:
>
> #BSUB -J maker_mpi
> #BSUB -n 512
> #BSUB -R "span[ptile=16]"
> module purge && module load gcc/4.9.1 openmpi/gcc/1.6.5
> mpiexec -mca btl ^openib -n 512 perl /lustre/home/clswcc/yzhao/MAKER/maker/bin/maker -fix_nucleotides
>
> Could you help me find out what is going wrong? The stdout at first is normal, as follows:
>
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_datastore
>
> To access files for individual sequences use the datastore index:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_master_datastore_index.log
>
> STATUS: Now running MAKER...
>
> Regards,
> yue
>
> --
> Yue Zhao (Jerry)
> Bachelor Candidate of Plant Biotechnology
> Researcher in UCLA-CSST program
> Shanghai Jiao Tong University, Shanghai
> jerryzhaosjtu at gmail.com
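The trial-and-error search for a stable CPU count described above (stable at some count, failing one CPU above it) can be organised as a bisection so that fewer test jobs are needed. A minimal sketch, assuming stability is monotonic in the CPU count, as in the 1424-CPU example; since MPI failures can also be semi-random, a passing probe may need repeating in practice. `is_stable` is a hypothetical callback that would submit a short test job and report whether it finished cleanly.

```python
from typing import Callable


def largest_stable_count(is_stable: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest CPU count n in [lo, hi] for which is_stable(n) is True.

    Assumes monotonic behaviour: if a job is stable at n CPUs it is also stable
    at every smaller count. The lower bound itself must be stable.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    if not is_stable(lo):
        raise ValueError("the lower bound is not stable; pick a smaller lo")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so lo always moves when mid is stable
        if is_stable(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    probes = []

    def fake_probe(n):
        # Stand-in for a real test job: pretend anything above 1424 CPUs fails,
        # mirroring the XSEDE example in the message.
        probes.append(n)
        return n <= 1424

    best = largest_stable_count(fake_probe, 16, 2048)
    print(best, "CPUs after", len(probes), "test runs")
```

Each probe costs a real cluster job, which is why halving the search interval (a few runs instead of hundreds) matters; once the threshold is known, Carson's two options apply: look for an OpenMPI parameter that matches it, or submit several jobs at exactly that CPU count.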
From xvazquezc at gmail.com Wed Jan 14 01:40:38 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:40:38 +1100
Subject: [maker-devel] doubt about selection of the best model

Hi Maker developers and users,

After quite a bit of time dealing with Maker, I can run it without problems (thank you, Carson). However, I have doubts about the evaluation of the best model produced by Maker.

I found the AED_cdf_generator.pl script while searching the mailing list, and it is great, but when you use it, which GFF files are you comparing? I initially thought that the models to be compared were those from each *ab initio* program, e.g. SNAP vs Augustus, and within each, the successive bootstrap training steps. But unless you run only one predictor each time you run Maker, the XXX.all.gff file will contain data from both predictions. Should I run them individually?

Following the topic, Maker will generate different FASTA files for proteins and transcripts from each program (Maker and each *ab initio* predictor), as well as "non_overlapping" files. Which one(s) do you select to continue with the functional annotation?

Thank you in advance,
Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Wed Jan 14 01:49:34 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:49:34 +1100
Subject: [maker-devel] Augustus retraining??

Hi,

I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. But, and here is my question, can/should I re-train Augustus like it is done with SNAP?
And what would I use for the re-training?

Thank you,
Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From mikael.durling at slu.se Wed Jan 14 03:08:33 2015
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 14 Jan 2015 10:08:33 +0000
Subject: [maker-devel] Augustus retraining??
Message-ID: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>

Hi,

On 14 Jan 2015, at 09:49, Xabier Vázquez Campos wrote:

> Hi,
>
> I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. But, and here is my question, can/should I re-train Augustus like it is done with SNAP? And what would I use for the re-training?

I've tried an approach of retraining Augustus in a manner similar to what has been suggested here earlier for retraining SNAP. This has been run with a local Augustus installation as part of an automated framework I have set up to annotate fungal genomes. Interestingly, Augustus seems to converge very quickly: it is not uncommon that autoAugustus reports that it could not improve the initial models that were derived from the CEGMA dataset. Are there other similar experiences on the list?

I also have a modified version of maker2zff, which I call maker2augustus_gff, that extracts an evidence set for Augustus retraining from the initial round of MAKER. I'm happy to share it with anyone interested.
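Mikael's maker2augustus_gff script is not shown here, so the following is only a sketch of the general idea of pulling a retraining set out of a first MAKER round: keep MAKER mRNA models whose annotation edit distance (AED) is low. It assumes the MAKER GFF3 carries an `_AED=` attribute on mRNA features with `maker` in the source column; the 0.25 cutoff and the function name `low_aed_mrnas` are illustrative choices, not values taken from maker2zff or maker2augustus_gff.

```python
def low_aed_mrnas(gff3_lines, max_aed=0.25, source="maker"):
    """Yield (mRNA ID, AED) for MAKER mRNA features with AED <= max_aed.

    Assumes 9-column, tab-separated GFF3 with MAKER-style attributes such as
    "ID=...;Parent=...;_AED=0.12" on mRNA lines. Anything after an embedded
    ##FASTA line is sequence data and is not parsed.
    """
    for raw in gff3_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[1] != source or cols[2] != "mRNA":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "_AED" not in attrs:
            continue
        aed = float(attrs["_AED"])
        if aed <= max_aed:
            yield attrs.get("ID", ""), aed


if __name__ == "__main__":
    sample = [
        "##gff-version 3",
        "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.10",
        "ctg1\tmaker\tmRNA\t1000\t1900\t.\t+\t.\tID=g2-RA;Parent=g2;_AED=0.60",
        "ctg1\tsnap_masked\tmatch\t1\t900\t.\t+\t.\tID=s1",
    ]
    print(list(low_aed_mrnas(sample)))  # [('g1-RA', 0.1)]
```

The selected IDs would then still need converting into whatever training format the predictor expects (ZFF for SNAP, GenBank-style for Augustus); that conversion step is outside this sketch.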
cheers,
Mikael

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com  Wed Jan 14 08:22:57 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:22:57 -0700
Subject: [maker-devel] Augustus retraining??
In-Reply-To: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
Message-ID: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com>

Here is some information on training SNAP via the bootstrap technique (i.e. using the models produced by the initial training to seed the next round of training). Even though the examples use SNAP, the same approach applies with the scripts and methods Mikael described in his e-mail:
-> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Jason Stajich also wrote an excellent explanation of training Augustus on the GMOD mailing list:
-> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

He also included his own scripts to assist with the training:
-> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

-Carson
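The bootstrap approach starts by picking well-supported MAKER models to seed the next training round. As a minimal Python sketch (this is not maker2zff or Mikael's maker2augustus_gff), assuming a MAKER GFF3 in which final models have `maker` in the source column and each mRNA carries an `_AED=` attribute, the selection step could look like this. The 0.25 cutoff and the function names are hypothetical choices for illustration.

```python
import io
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 column-9 string into a dict (keys/values percent-decoded)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[unquote(key)] = unquote(value)
    return attrs


def training_candidates(gff_handle, max_aed=0.25, source="maker"):
    """Return IDs of mRNA features from `source` whose _AED is <= max_aed.

    Assumption: MAKER annotates each mRNA with an _AED attribute;
    the 0.25 cutoff is illustrative, not a MAKER default.
    """
    keep = []
    for line in gff_handle:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != source or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" in attrs and float(attrs["_AED"]) <= max_aed:
            keep.append(attrs.get("ID", ""))
    return keep


# Hypothetical three-feature GFF3, inlined for illustration.
example = (
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.10\n"
    "chr1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-RA;Parent=g2;_AED=0.60\n"
)
print(training_candidates(io.StringIO(example)))  # ['g1-RA']
```

Converting the selected models into ZFF or GenBank training files is what the linked scripts handle; this sketch only covers choosing which models to train on.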
From carsonhh at gmail.com  Wed Jan 14 08:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:37:43 -0700
Subject: [maker-devel] doubt about selection of the best model
In-Reply-To:
References:
Message-ID:

The MAKER models are the final models. The FASTA files and features from the raw ab initio gene predictors, on the other hand, are there for reference purposes only and should be ignored unless you have a specific need for them.
MAKER models combine ab initio gene predictions, filtered for best evidence match, with hint-based models from the same predictors. Basically, MAKER takes the best models from each separate predictor and builds a final consensus gene set.

The CDF generator is really meant for comparing how evidence match changes between different releases of the genome or between different parameter settings (i.e. you compare curves between independent MAKER runs, not within a single MAKER run). The AED CDF curve is interpreted similarly to a ROC curve: shifts up and to the left indicate improved gene models.

AED is used instead of sensitivity and specificity because those measures require you to already know the correct models in order to make the comparison. For de novo annotation that is impossible (if you already had the correct models you wouldn't be running MAKER), so AED, which is based on overlap with the evidence, acts as a proxy measurement. This paper probably gives the best overall example of how AED correlates with model quality (Figures 2 and 3):
-> http://www.biomedcentral.com/1471-2105/12/491

-Carson
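The curve described here is the empirical cumulative distribution of AED values: at each threshold x it reports the fraction of gene models with AED <= x. The Python sketch below is not AED_cdf_generator.pl. It assumes you have already collected one AED value per model (for example from the `_AED=` attribute MAKER writes on mRNA features), and its `bin_size` argument is only assumed to be analogous to the script's -b option. The two runs are hypothetical data.

```python
import math


def aed_cdf(aed_values, bin_size=0.025):
    """Cumulative fraction of gene models with AED <= each threshold.

    Thresholds run from 0 to 1 in steps of `bin_size` (the last one is
    clamped to 1.0). Returns a list of (threshold, fraction) pairs.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be a positive number")
    values = sorted(aed_values)
    n = len(values)
    steps = math.ceil(1.0 / bin_size - 1e-9)
    cdf = []
    idx = 0
    for i in range(steps + 1):
        threshold = min(round(i * bin_size, 10), 1.0)
        # Advance past every value at or below this threshold.
        while idx < n and values[idx] <= threshold + 1e-12:
            idx += 1
        cdf.append((threshold, idx / n if n else 0.0))
    return cdf


# Hypothetical AED values from two independent MAKER runs.
run_a = [0.05, 0.10, 0.20, 0.45, 0.90]
run_b = [0.30, 0.40, 0.55, 0.70, 0.95]
for (x, fa), (_, fb) in zip(aed_cdf(run_a, 0.25), aed_cdf(run_b, 0.25)):
    print(f"AED<={x:.2f}  run_a={fa:.2f}  run_b={fb:.2f}")
```

Here run_a's curve is at or above run_b's at every threshold (0.60 vs 0.00 at AED<=0.25, 0.80 vs 0.40 at AED<=0.50), which is the "up and to the left" shift that indicates better evidence support.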
From xvazquezc at gmail.com  Fri Jan 16 01:09:11 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 19:09:11 +1100
Subject: [maker-devel] functional annotation
Message-ID:

Hi,

Which file from the Maker output do you use for functional annotation? The FASTA part of XXX.all.gff?

I'll probably be using BLAST and InterProScan. I tested B2go (Blast2GO, basic version); it's good, but annoyingly slow.

Thank you,
Xabier

From xvazquezc at gmail.com  Fri Jan 16 03:11:21 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 21:11:21 +1100
Subject: [maker-devel] repeat masking and repeat libraries
Message-ID:

Hi there,

First, a general question. It is probably kind of silly, but I prefer to be sure. When you browse RepBase (for example, for fungi), all the repeats are labelled either as Eukaryota (Ancestral) or with the species name; no other taxonomic ranks are indicated. Does RepeatMasker recognise orders, families, etc.?
Or, in my case, should I stick with model_org=fungi?

I've been trying to create repeat libraries specific to my genomes, but I didn't have any luck with the programs described in the Basic and Advanced tutorials (neither on my computer nor on the cluster): they reported errors every time, with the exception of RepeatModeler, which ran without problems. Is the output from RepeatModeler enough to improve the masking? I guess it is not the best option, but it is better than the RepBase libraries alone, isn't it?

Thank you for your time,

Xabier

From michael.s.campbell1 at gmail.com  Fri Jan 16 10:01:37 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 16 Jan 2015 10:01:37 -0700
Subject: [maker-devel] functional annotation
In-Reply-To:
References:
Message-ID:

Hi Xabier,

The FASTA at the end of the GFF3 file is the genome sequence. For functional annotation you want to use the XXXout.all.maker.proteins.fasta file, which contains the protein sequences for your MAKER gene models.

Good luck,
Mike

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From carsonhh at gmail.com  Fri Jan 16 10:04:09 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 Jan 2015 10:04:09 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Using both RepBase and a RepeatModeler-produced library should be sufficient, especially for fungi.

-Carson
From michael.s.campbell1 at gmail.com  Fri Jan 16 10:08:43 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 16 Jan 2015 10:08:43 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Hi Xabier,

I haven't seen orders or families documented for RepeatMasker with RepBase. Fungi seems safe to me.

If you want a little more peace of mind about the RepeatModeler library, you can BLAST it against a database of known fungal proteins and remove the library entries that have strong hits to a known protein, to avoid over-masking.

Mike
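The filtering step can be scripted once the BLAST search has been run with tabular output (`-outfmt 6`, where column 1 is the query ID and column 11 the e-value). A minimal Python sketch, assuming the library is a plain FASTA file whose first header word matches the BLAST query ID, and assuming the protein database excludes transposable-element proteins (otherwise genuine repeats would be dropped too). The 1e-10 cutoff, the function names, and the inline records are hypothetical.

```python
import io


def strong_hit_ids(blast_tab, max_evalue=1e-10):
    """Return query IDs with at least one hit at or below max_evalue.

    Expects BLAST tabular output (-outfmt 6): column 1 is the query ID,
    column 11 the e-value.
    """
    ids = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if float(cols[10]) <= max_evalue:
            ids.add(cols[0])
    return ids


def filter_fasta(fasta, drop_ids):
    """Yield FASTA lines, skipping every record whose ID is in drop_ids."""
    keep = True
    for line in fasta:
        if line.startswith(">"):
            # The record ID is the first word of the header line.
            keep = line[1:].split()[0] not in drop_ids
        if keep:
            yield line


# Hypothetical library and BLAST result, inlined for illustration.
library = io.StringIO(">fam1#LINE\nACGT\n>fam2#Unknown\nGGCC\nTTAA\n")
blast = io.StringIO(
    "fam2#Unknown\tXP_000001\t80.0\t300\t10\t0\t1\t900\t1\t300\t1e-50\t400\n"
    "fam1#LINE\tXP_000002\t30.0\t50\t30\t2\t1\t150\t1\t50\t0.5\t20\n"
)
print("".join(filter_fasta(library, strong_hit_ids(blast))), end="")
```

Only fam2#Unknown has a strong hit (e-value 1e-50), so only fam1#LINE is written out. In practice you would open the real library and BLAST result files instead of the inline strings.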
From xvazquezc at gmail.com  Fri Jan 16 20:57:26 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Sat, 17 Jan 2015 14:57:26 +1100
Subject: [maker-devel] AED score script error
Message-ID:

Hi,

Just reporting the following error with the AED_cdf_generator.pl script:

> Use of uninitialized value $opt_b in division (/) at AED_cdf_generator.pl line 20.
> Illegal division by zero at AED_cdf_generator.pl line 20.

Is anybody else having this problem? I am using the version attached here:
https://groups.google.com/forum/#!topic/maker-devel/LCpB3CEm63M

Thank you,
Xabier
From michael.s.campbell1 at gmail.com  Mon Jan 19 10:27:52 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 19 Jan 2015 10:27:52 -0700
Subject: [maker-devel] AED score script error
In-Reply-To:
References:
Message-ID:

Hi Xabier,

Did you give the -b option a value on the command line (e.g. -b 0.1)?

Mike

From xvazquezc at gmail.com  Mon Jan 19 23:14:58 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Jan 2015 17:14:58 +1100
Subject: [maker-devel] AED score script error
In-Reply-To:
References:
Message-ID:

Thanks Mike. That was it.
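The Perl messages above are what happens when an optional numeric flag is left unset: the undefined `$opt_b` is treated as 0 in the division on line 20, so the script dies. A small Python sketch of the defensive alternative, using `argparse` to require a positive bin size before any work starts. The option name `-b` mirrors the Perl script; the `positive_float` helper, the positional `gff` argument, and the help texts are hypothetical.

```python
import argparse


def positive_float(text):
    """argparse type: accept only numbers greater than zero."""
    value = float(text)  # a non-numeric value raises ValueError -> usage error
    if value <= 0:
        raise argparse.ArgumentTypeError(f"bin size must be > 0, got {text}")
    return value


def build_parser():
    parser = argparse.ArgumentParser(description="AED CDF (illustrative sketch)")
    # required=True turns a missing -b into a clear usage error
    # instead of a division by zero later on.
    parser.add_argument("-b", type=positive_float, required=True,
                        help="bin size for the AED thresholds, e.g. 0.1")
    parser.add_argument("gff", nargs="+", help="MAKER GFF3 file(s)")
    return parser


args = build_parser().parse_args(["-b", "0.1", "run1.gff", "run2.gff"])
print(args.b, args.gff)  # 0.1 ['run1.gff', 'run2.gff']
```

Omitting `-b` or passing `-b 0` makes argparse print a usage message and exit, rather than failing partway through the calculation.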
From carsonhh at gmail.com  Tue Jan 20 09:45:01 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Jan 2015 09:45:01 -0700
Subject: [maker-devel] Issue due to intensive I/O
In-Reply-To:
References:
Message-ID: <6F82AB5F-4782-41CA-A61F-C79894EFABB4@gmail.com>

Genome annotation is data intensive rather than CPU intensive. In MAKER, most I/O-intensive operations occur in the temporary directory set by the TMP= option in the MAKER control files. If you are setting this value to a location on a network-mounted drive, that could be the source of your problem.
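On Linux, one quick way to check whether the directory you plan to use for TMP= sits on a network filesystem is to look up its mount point in /proc/mounts. A minimal Python sketch; the set of filesystem types treated as "network" is an assumption you should adapt to your cluster, and /proc/mounts exists only on Linux.

```python
import os

# Assumption: filesystem types commonly backed by network/cluster storage.
NETWORK_FS = {"nfs", "nfs4", "lustre", "cifs", "smb3", "gpfs",
              "beegfs", "fuse.sshfs", "panfs"}


def fs_type(path, mounts_text):
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` uses /proc/mounts format: device mountpoint fstype options ...
    The longest matching mount point wins; among equal ones, the last listed
    (most recently mounted, hence visible) entry wins.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, ftype = fields[1], fields[2]
        prefix = mount.rstrip("/") + "/"
        if (path == mount or path.startswith(prefix)) and len(mount) >= len(best_mount):
            best_mount, best_type = mount, ftype
    return best_type


def is_network_path(path, mounts_text):
    return fs_type(path, mounts_text) in NETWORK_FS


if os.path.exists("/proc/mounts"):
    tmp = os.path.realpath(os.environ.get("TMPDIR", "/tmp"))
    with open("/proc/mounts") as fh:
        mounts = fh.read()
    kind = "network" if is_network_path(tmp, mounts) else "local"
    print(tmp, fs_type(tmp, mounts), kind)
```

Run it on a compute node rather than the login node, since the two can mount storage differently.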
Also, TMP= defaults to the location given by the Linux TMPDIR environment variable, so make sure that TMPDIR is not set to a network-mounted location either. The temporary directory needs to be on a locally mounted filesystem.

A number of global files will still be needed; however, we've previously run MAKER on over 8,000 CPUs on Lustre file systems with no issues. If the genome being annotated has a large number of small contigs, it is possible that the metadata server, rather than the object storage server, is having problems. Lots of small contigs in a fragmented genome assembly result in many small result files but very little reading and writing. That situation can be quite stressful for Lustre file systems, because they don't handle large numbers of very small files well (the metadata server gets overwhelmed even though the object storage server is under only moderate load). If that is the case, set min_contig= to something like 10000 to avoid running analyses on short, un-annotatable contigs (on lower-quality assemblies they may number in the hundreds of thousands and contain no useful information).

You can also set clean_up=1 in the MAKER control files to delete files as MAKER advances. This removes restart capability, because results from previous runs will not be logged, but it reduces the burden on the metadata server (which is affected by the total number of files rather than by read/write operations). Setting clean_up=1 can also help you stay under any administrator-defined limits on the total number of files per user (administrators commonly set such limits on Lustre-based file systems to avoid taxing the metadata server).

So your issue is likely caused by one of two things:

1. TMP= in the maker_opts.ctl file, or the Linux TMPDIR environment variable, is set to a network-mounted location. Fix this by pointing both to a locally mounted location (usually /tmp).
2. A fragmented genome assembly is generating too many files in total. Fix this either by setting min_contig=10000 to skip short contigs, or by setting clean_up=1 to avoid logging too many files.

This happens because it is very difficult to overwhelm Lustre's object storage servers (which handle I/O read/write operations), but relatively easy to overwhelm the metadata server (which is affected by total file count rather than total I/O throughput).

-Carson

> On Jan 19, 2015, at 5:55 AM, Stephen Wang wrote:
>
> Dear MAKER Team,
>
> I am a cluster administrator at the university. The issue is caused by MAKER jobs, which access massive numbers of small files and crashed our Lustre file system.
>
> Hardware: 16 cores per node
> Software: OpenMPI 1.6.5 and GCC 4.9.1
>
> Q1: Does MAKER have to generate a large number of files on the global file system?
> Q2: Can any parameters help MAKER avoid I/O-intensive access? Any experience with Lustre?
>
> MAKER is quite an important piece of software for our users. We hope you can help.
>
> BR,
> Stephen
>
> --
> Stephen Wang, GPU Computing Specialist
> Center for High Performance Computing
> Shanghai Jiao Tong University
> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> Mobi:+86-136-6151-1618 Web:http://hpc.sjtu.edu.cn

From jgallant at msu.edu  Wed Jan 21 06:56:02 2015
From: jgallant at msu.edu (Jason Gallant)
Date: Wed, 21 Jan 2015 05:56:02 -0800 (PST)
Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster
Message-ID: <1421848561970.c8b481bf@Nodemailer>

Hi Everyone,

I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster. I've started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably.
I?m having a persistent issue where something fails on one of the nodes, and std error is flooded with: examining contents of the fasta file and run log [67] ERROR: could not make datastore directory [67] --> rank=67, hostname=node067 [67] ERROR: Failed while examining contents of the fasta file and run log [67] ERROR: Chunk failed at level:0, tier_type:0 [67] FAILED CONTIG:Scaffold261 This error repeats for each ?next? scaffold for some time. ?When I go back to find the ?source? of the error in the log, the following is the first error message on that node: 67] #-------------------------------# [67] deleted:-60 hits [67] collecting blastx reports [67] ERROR: Could not colapse BLAST reports [67]? at /root/maker/bin/../lib/GI.pm line 2524 thread 1. [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) 
called at /root/maker/bin/maker line 917 thread 1 [67] --> rank=67, hostname=node067 [67] ERROR: Failed while collecting blastx reports [67] ERROR: Chunk failed at level:9, tier_type:3 [67] FAILED CONTIG:Scaffold66 [67]? [67] ERROR: Chunk failed at level:4, tier_type:0 [67] FAILED CONTIG:Scaffold66 I?ve attempted to ignore the error to see if things will proceed on the other 199 processors. ?When I returned to the ?master? node after the evening, Maker keeps repeating the same error code over and over (same scaffold): ] examining contents of the fasta file and run log [67] ERROR: could not make datastore directory [67] --> rank=67, hostname=node067 [67] ERROR: Failed while examining contents of the fasta file and run log [67] ERROR: Chunk failed at level:0, tier_type:0 [67] FAILED CONTIG:Scaffold1589 I stop the job, and restart, and after only a few minutes of running, the same error is reported, this time on a new scaffold. ?Strangely here, the error is reported in the MPI tag of node001, but the error originates at node137: ERROR: Could not colapse BLAST reports [1]? at /root/maker/bin/../lib/GI.pm line 2524. [1] ? ? GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 [1] ? ? Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 [1] ? ? eval {...} called at /root/maker/bin/../lib/Error.pm line 407 [1] ? ? Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 [1] ? ? Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 [1] ? ? 
Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 [1] --> rank=137, hostname=node137 [1] ERROR: Failed while collecting blastx reports [1] ERROR: Chunk failed at level:9, tier_type:3 [1] FAILED CONTIG:Scaffold249 [1] [1] ERROR: Chunk failed at level:4, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] 
ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [... the same block repeats many more times ...] I'd appreciate any guidance as to how best to diagnose this error! Many thanks, Jason Gallant -- Dr. Jason R. Gallant Assistant Professor Room 38 Natural Sciences Department of Zoology Michigan State University East Lansing, MI 48824 jgallant at msu.edu office: 517-884-7756 From xvazquezc at gmail.com Wed Jan 21 17:42:35 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Thu, 22 Jan 2015 11:42:35 +1100 Subject: [maker-devel] repeat masking and repeat libraries In-Reply-To: References: Message-ID: Thanks Mike, I've blasted the library (blastx against nr), and many, if not most, of the RepeatModeler library sequences match transposases, pol proteins, gag proteins, retrotransposons, ... all of them present in other fungi of the same order. Should I leave them to be masked? Should I still run prediction on the unmasked genome too? Also, in many cases the matches cover a couple of thousand bp at one end of a 9 kbp sequence, and in none of them is InterProScan able to find anything except potential TM domains or similar, provided by SignalP. What do you think? Should I leave it as it is? Thank you again for your time 2015-01-17 4:08 GMT+11:00 Michael Campbell : > Hi Xabier, > > I haven't seen orders or families documented for RepeatMasker with > RepBase. Fungi seems safe to me. > > If you want to give yourself a little more peace of mind about the > RepeatModeler library, you can blast it against a database of known fungal proteins > and remove the entries in the library that have strong hits to a known > protein to avoid over-masking. > > Mike > > On Fri, Jan 16, 2015 at 10:04 AM, Carson Holt wrote: > >> Using both RepBase and a RepeatModeler-produced library should be >> sufficient, especially for fungi. >> >> -Carson >> >> >> On Jan 16, 2015, at 3:11 AM, Xabier Vázquez Campos >> wrote: >> >> Hi there, >> >> First, a general question. Probably kind of silly but I prefer to be >> sure...
When you browse RepBase, for example in fungi, all the repeats are >> marked as Eukaryota (Ancestral) or under the name of the species, but no >> other taxonomic ranks are indicated. Does RepeatMasker recognise orders, >> families, etc.? Or in my case should I stick with model_org=fungi? >> >> I've been trying to create repeat libraries specific to my genomes and >> I didn't have any luck with the programs described in the Basic >> >> and advanced >> >> tutorials (neither on my computer nor on the cluster), which reported errors >> every time, with the exception of RepeatModeler, which ran with no problems. Is >> the output from RepeatModeler enough to improve the masking? It is not the >> best option, I guess, but it is better than just the RepBase libraries by >> themselves, isn't it? >> >> Thank you for your time, >> >> Xabier >> >> -- >> Xabier Vázquez Campos >> *PhD Candidate* >> Water Research Centre >> School of Civil and Environmental Engineering >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Michael Campbell MS, RD. > Doctoral Candidate > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:585-3543 > > -- Xabier Vázquez Campos *PhD Candidate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA
From michael.s.campbell1 at gmail.com Thu Jan 22 09:42:56 2015 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Thu, 22 Jan 2015 09:42:56 -0700 Subject: [maker-devel] repeat masking and repeat libraries In-Reply-To: References: Message-ID: Hi Xabier, From what you described I would leave it as is. Mike On Wed, Jan 21, 2015 at 5:42 PM, Xabier Vázquez Campos wrote: > Thanks Mike, > > I've blasted the library (blastx against nr), and many, if not most, of the > RepeatModeler library sequences match transposases, pol proteins, gag > proteins, retrotransposons, ... all of them present in other fungi of the > same order. Should I leave them to be masked? Should I still run prediction on > the unmasked genome too? > Also, in many cases the matches cover a couple of thousand bp at one end of a > 9 kbp sequence, and in none of them is InterProScan able to find > anything except potential TM domains or similar, provided by SignalP. > > What do you think? Should I leave it as it is? > > Thank you again for your time > > 2015-01-17 4:08 GMT+11:00 Michael Campbell > : > >> Hi Xabier, >> >> I haven't seen orders or families documented for RepeatMasker with >> RepBase. Fungi seems safe to me. >> >> If you want to give yourself a little more peace of mind about the >> RepeatModeler library, you can blast it against a database of known fungal proteins >> and remove the entries in the library that have strong hits to a known >> protein to avoid over-masking. >> >> Mike >> >> On Fri, Jan 16, 2015 at 10:04 AM, Carson Holt wrote: >> >>> Using both RepBase and a RepeatModeler-produced library should be >>> sufficient, especially for fungi. >>> >>> -Carson >>> >>> >>> On Jan 16, 2015, at 3:11 AM, Xabier Vázquez Campos >>> wrote: >>> >>> Hi there, >>> >>> First, a general question. Probably kind of silly but I prefer to be >>> sure...
When you browse RepBase, for example in fungi, all the repeats are >>> marked as Eukaryota (Ancestral) or under the name of the species, but no >>> other taxonomic ranks are indicated. Does RepeatMasker recognise orders, >>> families, etc.? Or in my case should I stick with model_org=fungi? >>> >>> I've been trying to create repeat libraries specific to my genomes >>> and I didn't have any luck with the programs described in the Basic >>> >>> and advanced >>> >>> tutorials (neither on my computer nor on the cluster), which reported errors >>> every time, with the exception of RepeatModeler, which ran with no problems. Is >>> the output from RepeatModeler enough to improve the masking? It is not the >>> best option, I guess, but it is better than just the RepBase libraries by >>> themselves, isn't it? >>> >>> Thank you for your time, >>> >>> Xabier >>> >>> -- >>> Xabier Vázquez Campos >>> *PhD Candidate* >>> Water Research Centre >>> School of Civil and Environmental Engineering >>> The University of New South Wales >>> Sydney NSW 2052 AUSTRALIA >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> >> >> -- >> Michael Campbell MS, RD. >> Doctoral Candidate >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:585-3543 >> >> > > > -- > Xabier Vázquez Campos > *PhD Candidate* > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > -- Michael Campbell MS, RD.
Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 From carsonhh at gmail.com Fri Jan 23 12:17:36 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Jan 2015 12:17:36 -0700 Subject: [maker-devel] running maker on TACC Stampede. In-Reply-To: References: Message-ID: Stampede only has MVAPICH2. It does not have OpenMPI (even though it has been requested several times). OpenFabrics libraries (used by MVAPICH2) have a known issue that restricts programs from making system calls while running under MPI. A system call is when one program launches another (e.g. MAKER launching BLAST). For this reason MAKER does not work with MVAPICH2. It only works with OpenMPI. You can still get it to work on MVAPICH2, but only on a single node. If you request more than one node, it will fail. The solution would be for TACC to install OpenMPI as an option on Stampede (as they have on Lonestar), but until that happens you can only run MAKER on a single node. Thanks, Carson > On Jan 22, 2015, at 10:51 PM, Won C Yim wrote: > > To whom it may concern, > > Hi! > > My name is Won Cheol Yim, at the University of Nevada, Reno. > > I am trying to run MAKER on TACC Stampede. > > It looks like everything installed properly. > > ============================================================================== > STATUS MAKER v2.31.8 > ============================================================================== > PERL Dependencies: > VERIFIED > External Programs: > VERIFIED > External C Libraries: > VERIFIED > MPI SUPPORT: > ENABLED > MWAS Web Interface: > DISABLED > MAKER PACKAGE: > CONFIGURATION OK > > And I installed Perl 5.18.4 with the threads option. > > But when I try to run it with MPI, it generates an error. > > I suspect this problem comes from ibrun on Stampede. > > Is there any way to run it on Stampede? > > Here is my log.
> > TACC: Starting up job > TACC: Setting up parallel environment for MVAPICH ssh-based mpirun. > cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory > sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory > TACC: Setup complete. Running job script. > TACC: starting parallel tasks... > [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2) > TACC: MPI job exited with code: 1 > TACC: Shutting down parallel environment. > TACC: Shutdown complete. Exiting. > > > Regards, > > Won > -- > Yim, Won Cheol > Sent with Airmail From carsonhh at gmail.com Fri Jan 23 13:00:56 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Jan 2015 13:00:56 -0700 Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster In-Reply-To: <1421848561970.c8b481bf@Nodemailer> References: <1421848561970.c8b481bf@Nodemailer> Message-ID: MAKER needs a global storage location (a shared file system that every node can see). You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that. Also, use OpenMPI instead of MPICH2. It's more stable. I look forward to seeing how your experiment with AWS, MPI, and MAKER works out. -Carson > On Jan 21, 2015, at 6:56 AM, Jason Gallant wrote: > > Hi Everyone, > > I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster... I've started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably.
> > I'm having a persistent issue where something fails on one of the nodes, and stderr is flooded with: > > examining contents of the fasta file and run log > [67] ERROR: could not make datastore directory > [67] --> rank=67, hostname=node067 > [67] ERROR: Failed while examining contents of the fasta file and run log > [67] ERROR: Chunk failed at level:0, tier_type:0 > [67] FAILED CONTIG:Scaffold261 > > This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node: > > [67] #-------------------------------# > [67] deleted:-60 hits > [67] collecting blastx reports > [67] ERROR: Could not colapse BLAST reports > [67] at /root/maker/bin/../lib/GI.pm line 2524 thread 1. > [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 > [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 > [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 > [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 > [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 > [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 > [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...)
called at /root/maker/bin/maker line 917 thread 1 > [67] --> rank=67, hostname=node067 > [67] ERROR: Failed while collecting blastx reports > [67] ERROR: Chunk failed at level:9, tier_type:3 > [67] FAILED CONTIG:Scaffold66 > [67] > [67] ERROR: Chunk failed at level:4, tier_type:0 > [67] FAILED CONTIG:Scaffold66 > > > I tried ignoring the error to see if things would proceed on the other 199 processors. When I returned to the "master" node after leaving it running overnight, Maker was repeating the same error over and over (same scaffold): > [67] examining contents of the fasta file and run log > [67] ERROR: could not make datastore directory > [67] --> rank=67, hostname=node067 > [67] ERROR: Failed while examining contents of the fasta file and run log > [67] ERROR: Chunk failed at level:0, tier_type:0 > [67] FAILED CONTIG:Scaffold1589 > > I stopped the job and restarted it, and after only a few minutes of running the same error was reported, this time on a new scaffold. Strangely, the error is reported under the MPI tag of node001, but it originates at node137: > > ERROR: Could not colapse BLAST reports > [1] at /root/maker/bin/../lib/GI.pm line 2524.
> [1] GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 > [1] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 > [1] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 > [1] Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 > [1] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 > [1] Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 > [1] --> rank=137, hostname=node137 > [1] ERROR: Failed while collecting blastx reports > [1] ERROR: Chunk failed at level:9, tier_type:3 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] ERROR: Chunk failed at level:4, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [... the same block repeats many more times ...] > > I'd appreciate any guidance as to how best to diagnose this error! > > Many thanks, > Jason Gallant > > > > > -- > Dr. Jason R. Gallant > Assistant Professor > Room 38 Natural Sciences > Department of Zoology > Michigan State University > East Lansing, MI 48824 > jgallant at msu.edu > office: 517-884-7756 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From jcornel3 at asu.edu Fri Jan 23 14:28:13 2015 From: jcornel3 at asu.edu (John Cornelius) Date: Fri, 23 Jan 2015 13:28:13 -0800 Subject: [maker-devel] Maker-P vs. Maker Message-ID: Hi, I'm working on annotating a tetraploid animal with a genome size of 3.1 gigabases. I was wondering if MAKER-P would be appropriate for this organism or if I should just stick with MAKER? Thanks. -- John Cornelius MCB PhD Candidate Arizona State University From carsonhh at gmail.com Fri Jan 23 14:59:01 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Jan 2015 14:59:01 -0700 Subject: [maker-devel] Maker-P vs.
Maker In-Reply-To: References: Message-ID: <7813BFBE-7237-4298-8AD3-B210CB96DDD2@gmail.com> Actually, the code bases have been merged. So if you use the most recent version of MAKER, the plant extensions for RNA annotation and the extra analysis scripts from MAKER-P will be there. If you don't need them, just don't turn those options on in the control files. -Carson > On Jan 23, 2015, at 2:28 PM, John Cornelius wrote: > > Hi, I'm working on annotating a tetraploid animal with a genome size of 3.1 gigabases. I was wondering if MAKER-P would be appropriate for this organism or if I should just stick with MAKER? Thanks. > > -- > John Cornelius > MCB PhD Candidate > Arizona State University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Jan 26 12:17:45 2015 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 26 Jan 2015 12:17:45 -0700 Subject: [maker-devel] running maker on TACC Stampede. In-Reply-To: References: Message-ID: Do you mean sequence upstream of the gene? If that is the case, you would probably have to write a script to do this. BioPerl is one option; it has several Perl modules that help with manipulating FASTA sequences and many common biology tool file formats -> http://www.bioperl.org -Carson > On Jan 26, 2015, at 12:10 PM, Won C Yim wrote: > > Dear Carson Holt, > > Thank you for your reply. > > I asked STAMPEDE about this issue and there's no way for them to help me. > > I think we need to move to another server for MAKER. > > Thank you for your help. > > And I have one more question. > > Is there any way to extract upstream sequences from MAKER results? > > I tried to extract upstream and downstream sequences from them, but it's really hard to do. > > Regards, > > Won > > -- > Yim, Won Cheol > MS330/Department of Biochemistry & Molecular Biology > 1664 N.
Virginia Street > University of Nevada, Reno > > email: wyim at unr.edu > > > On January 23, 2015 at 11:17:41 AM, Carson Holt (carsonhh at gmail.com) wrote: > >> Stampede only has MVAPICH2. It does not have OpenMPI (even though it has been requested several times). OpenFabrics libraries (used by MVAPICH2) have a known issue that restricts programs from making system calls while running under MPI. A system call is when one program launches another (e.g. MAKER launching BLAST). For this reason MAKER does not work with MVAPICH2. It only works with OpenMPI. >> >> You can still get it to work on MVAPICH2, but only on a single node. If you request more than one node, it will fail. The solution would be for TACC to install OpenMPI as an option on Stampede (as they have on Lonestar), but until that happens you can only run MAKER on a single node. >> >> Thanks, >> Carson >> >> >>> On Jan 22, 2015, at 10:51 PM, Won C Yim > wrote: >>> >>> To whom it may concern, >>> >>> Hi! >>> >>> My name is Won Cheol Yim, at the University of Nevada, Reno. >>> >>> I am trying to run MAKER on TACC Stampede. >>> >>> It looks like everything installed properly. >>> >>> ============================================================================== >>> STATUS MAKER v2.31.8 >>> ============================================================================== >>> PERL Dependencies: VERIFIED >>> External Programs: VERIFIED >>> External C Libraries: VERIFIED >>> MPI SUPPORT: ENABLED >>> MWAS Web Interface: DISABLED >>> MAKER PACKAGE: CONFIGURATION OK >>> >>> And I installed Perl 5.18.4 with the threads option. >>> >>> But when I try to run it with MPI, it generates an error. >>> >>> I suspect this problem comes from ibrun on Stampede. >>> >>> Is there any way to run it on Stampede? >>> >>> Here is my log. >>> >>> TACC: Starting up job >>> TACC: Setting up parallel environment for MVAPICH ssh-based mpirun.
>>> cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory >>> sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory >>> TACC: Setup complete. Running job script. >>> TACC: starting parallel tasks... >>> [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2) >>> TACC: MPI job exited with code: 1 >>> TACC: Shutting down parallel environment. >>> TACC: Shutdown complete. Exiting. >>> >>> >>> Regards, >>> >>> Won >>> -- >>> Yim, Won Cheol >>> Sent with Airmail From marc.hoeppner at imbim.uu.se Wed Jan 28 00:01:48 2015 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Wed, 28 Jan 2015 07:01:48 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> Message-ID: Hi, this is probably a long shot, but I was hoping that someone on the list might have some advice on how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue? Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly, and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and BLAST matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530.
Backtrace: /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] /lib64/libpthread.so.0[0x358ae0f710] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] /lib64/libpthread.so.0[0x358ae0f710] /lib64/libc.so.6(__poll+0x53)[0x358aadf343] /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] /lib64/libpthread.so.0[0x358ae079d1] /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] SIGTERM received A few words about the setup: We have 10 nodes, 160 cores, and the shared file system is exported via InfiniBand from a "standard" NFS server. As OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low. So far I have tried: - running the MPI processes over both the Ethernet network and IPoIB: same problem. - installing a more recent version of Perl through perlbrew, with all the required modules, and re-compiling Maker - running some (albeit simple) network checks for retransmissions, lost packets, etc.: nothing popped up - running Maker on a subset of nodes to eliminate the possibility of a bad node The error message is a bit cryptic to me, and it would be very helpful to know whether Maker has a problem accessing a file or whether OpenMPI has a communication problem, etc., but I am not able to tell from the information I have been able to extract so far. Any ideas? Cheers, Marc Marc P.
Hoeppner, PhD Team Leader BILS Genome Annotation Platform Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at imbim.uu.se From dence at genetics.utah.edu Wed Jan 28 09:22:09 2015 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 Jan 2015 16:22:09 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> Message-ID: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> Hi Marc, here are a few things to check on the maker side. Did you set min_contig to 1000 to put a lower limit on contig size? Did maker do anything with the 1kb contigs? Or did it just skip them? You can check that in the master_datastore_index.log or in the void directories for the small contigs. That will tell us whether maker is functioning correctly, even though it's giving those messages. With newer versions of maker, I get messages identical to the ones you sent as part of normal thread termination, even when maker is functioning normally. Thanks, Daniel > On Jan 28, 2015, at 12:01 AM, Marc Höppner wrote: > > Hi, > > this is probably a long shot, but I was hoping that someone on the list might have some advice on how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue? > > Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly, and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and BLAST matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): > > perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530.
Backtrace: > /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] > /lib64/libpthread.so.0[0x358ae0f710] > /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] > /lib64/libpthread.so.0[0x358ae0f710] > /lib64/libc.so.6(__poll+0x53)[0x358aadf343] > /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] > /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] > /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] > /lib64/libpthread.so.0[0x358ae079d1] > /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] > SIGTERM received > > A few words about the setup: > > We have 10 nodes, 160 cores, and the shared file system is exported via InfiniBand from a "standard" NFS server. As OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low. > > So far I have tried: > > - running the MPI processes over both the Ethernet network and IPoIB: same problem. > - installing a more recent version of Perl through perlbrew, with all the required modules, and re-compiling Maker > - running some (albeit simple) network checks for retransmissions, lost packets, etc.: nothing popped up > - running Maker on a subset of nodes to eliminate the possibility of a bad node > > The error message is a bit cryptic to me, and it would be very helpful to know whether Maker has a problem accessing a file or whether OpenMPI has a communication problem, etc., but I am not able to tell from the information I have been able to extract so far. Any ideas? > > Cheers, > > Marc > > > Marc P.
Hoeppner, PhD > Team Leader > BILS Genome Annotation Platform > Department for Medical Biochemistry and Microbiology > Uppsala University, Sweden > marc.hoeppner at imbim.uu.se > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From marc.hoeppner at imbim.uu.se Thu Jan 29 00:34:17 2015 From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner) Date: Thu, 29 Jan 2015 08:34:17 +0100 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> Message-ID: <54C9E279.8040907@imbim.uu.se> Hi, thanks for the feedback. If I resume maker enough times, it will eventually run through an complete all contigs. The question is whether there is any way to debug why it drops at random times , most commonly when running on small contigs (which is probably more due to the increasing frequency of starting/finishing jobs rather than their size). I guess Maker has no debug mode or any other way to find out why it dies? Any idea what could make Maker drop like that? I was thinking NFS, but the nfsstat looks fine, nothing in the log and NFS function is generally good - so I can't identify a good point to look for the problem. Regards, Marc On 2015-01-28 17:22, Daniel Ence wrote: > Hi Marc, so a few things on the maker side to check out. > > Did you have the min_contig set to 1000, to set the lower limit on contig size? > Did maker do anything with the 1kb contigs? Or did it just skip them? > You can check that in the master_datastore_index.log or in the void directories for the small contigs. > That will tell us whether maker is functioning correctly, even though it?s giving those messages. 
> > With the newer versions of makers, I get messages identical to what you sent as part of the normal thread termination, even when maker is functioning normally. > > Thanks, > Daniel > > > >> On Jan 28, 2015, at 12:01 AM, Marc H?ppner wrote: >> >> Hi, >> >> this is probably a long shot, but I was hoping that someone on the list may have some advice as to how to debug an error that has been popping up when running Maker on our 10 node cluster. So, what is the issue? >> >> Maker runs fine on several assemblies that w have processed in the past, but I recently started on a fairly fragment (low N50) mammalian assembly and the collaborator was keen to have all contigs annotated, down to 1kb (I guess it is more about the repeats and blast matches in those small bits). Anyway, As the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): >> >> perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530. Backtrace: >> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] >> /lib64/libpthread.so.0[0x358ae0f710] >> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] >> /lib64/libpthread.so.0[0x358ae0f710] >> /lib64/libc.so.6(__poll+0x53)[0x358aadf343] >> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] >> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] >> /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] >> /lib64/libpthread.so.0[0x358ae079d1] >> /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] >> SIGTERM received >> >> A few words about the setup: >> >> We have 10 nodes, 160 cores and the shared file system is exported via Infiniband from a ?standard? NFS server. As OS we run Scientific Linux 6.5. Tests so far don?t point to congestion issues or anything like that, the bandwidth usage is actually fairly low. 
I >> >> So far I tried: >> >> - running the MPI processes through both the ethernet network as well as over IPoIB, same problem. >> - installing a more recent version of perl through perlbrew, with all the required modules, and re-compiled Maker >> - ran some (albeit simple) network checks to for retransmissions, lost packages etc - nothing popped up >> - running Maker in a subset of nodes to eliminate the possibility of a bad node >> >> The error message is a bit cryptic to me and it would be very helpful to know if Maker has a problem with accessing a file, or whether OpenMPI has a communication problem etc - but I am not able to tell from the information I have been able to extract so far. Any ideas? >> >> So >> >> Cheers, >> >> Marc >> >> >> Marc P. Hoeppner, PhD >> Team Leader >> BILS Genome Annotation Platform >> Department for Medical Biochemistry and Microbiology >> Uppsala University, Sweden >> marc.hoeppner at imbim.uu.se >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mikael.durling at slu.se Thu Jan 29 02:37:23 2015 From: mikael.durling at slu.se (=?utf-8?B?TWlrYWVsIEJyYW5kc3Ryw7ZtIER1cmxpbmc=?=) Date: Thu, 29 Jan 2015 09:37:23 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <54C9E279.8040907@imbim.uu.se> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se> Message-ID: Hi, are you running the NFS servers in synchronous or asynchronous mode? I have seen cases when maker fails with the nfs server in async mode, but the failures are random and I can?t really reproduce them. In the end, I have continued running maker on NFS in async mode, since the speed gains are significant, at the cost of occasional reruns. 
(And yes, nfsstat shows no signs of errors.)

Mikael

From carsonhh at gmail.com Thu Jan 29 08:22:57 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 29 Jan 2015 08:22:57 -0700
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To:
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se>
Message-ID:

In my experience NFS is the most likely cause. A lot of very small contigs means that MAKER would produce a lot of very small files very quickly, which puts far more stress on NFS than high read/write bandwidth does.
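Carson's point can be checked on the shared mount itself. A high rate of small-file creation, rather than raw bandwidth, is what strains NFS. Below is a minimal Python sketch; it is not part of MAKER. It times writing many small fsync'ed files against writing the same number of bytes to a single file. The function name `time_small_vs_large`, the default file count and the file size are illustrative assumptions.

```python
import os
import time


def time_small_vs_large(directory, n_files=500, size=1024):
    """Time writing n_files small files versus one file of the same total size.

    Every file is fsync'ed, so the timing includes the round trip to the
    storage server rather than just the client's page cache.
    Returns (small_files_seconds, single_file_seconds).
    """
    payload = b"x" * size

    # Many small files: one create + write + fsync per file (metadata heavy).
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(directory, f"small_{i:05d}.tmp")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
    small_seconds = time.perf_counter() - start

    # One file holding the same number of bytes (bandwidth heavy).
    start = time.perf_counter()
    with open(os.path.join(directory, "large.tmp"), "wb") as fh:
        fh.write(payload * n_files)
        fh.flush()
        os.fsync(fh.fileno())
    single_file_seconds = time.perf_counter() - start

    return small_seconds, single_file_seconds
```

To use it, point the function at an empty scratch directory on the NFS mount, then again at one on a local disk, and compare the two ratios. The test files are left behind, so delete the directory afterwards. A much larger gap on NFS than locally would be consistent with a metadata-bound workload. On its own, though, it does not prove that NFS caused the crashes.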
There can then be several seconds of lag between a file being created and the file being available for reading. The asynchronous setting allows the system to report success for an IO operation even though the operation has not yet completed and is only buffered on the NFS server. So when the process tries to read the file it supposedly just created, the file doesn't exist. MAKER tries to offload most of the small-file creation operations that can cause this condition to a temporary directory (set with TMP= in the maker_opts.ctl file), so it is critical that this location be on a local drive and not an NFS location. But running a lot of very small contigs would still result in more frequent file creation on the NFS mount.

There are only three ways around this type of NFS issue:
- run on fewer nodes to reduce the frequency of file creation;
- turn off asynchronous mode for NFS, which seriously degrades IO performance;
- just let MAKER retry until it works (brute force), which is the default and in my experience the most effective approach.

NFS issues were in fact the reason we put retry and restart capabilities into MAKER in the first place.

--Carson

From myandell at genetics.utah.edu Thu Jan 29 09:54:50 2015
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Thu, 29 Jan 2015 16:54:50 +0000
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To: <54C9E279.8040907@imbim.uu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA053E371D456@mxb2.hg.genetics.utah.edu>

Hi Marc,

are you sure this isn't your system? E.g. bad NFS mounts, a full scratch disk, etc.?

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From ashrafi at ucdavis.edu Thu Jan 29 11:07:41 2015
From: ashrafi at ucdavis.edu (Hamid Ashrafi)
Date: Thu, 29 Jan 2015 13:07:41 -0500
Subject: [maker-devel] GFF and Dereferencing problem
Message-ID: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu>

Hi,

After maker finishes its job it generates many files; one of them is a GFF file. I see the following in some of my GFF files. It seems to be a dereferencing problem. I am just wondering whether it affects my annotation.

Hamid

uti_cns_0004767 est2genome match_part 856428 856485 3090 + . ID=uti_cns_0004767:hsp:8340:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome match_part 856587 856938 3090 + . ID=uti_cns_0004767:hsp:8341:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome match_part 857053 857201 3090 + . ID=uti_cns_0004767:hsp:8342:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome match_part 859004 859041 3090 + . ID=uti_cns_0004767:hsp:8343:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome expressed_sequence_match 878327 878771 1446 + . ID=uti_cns_0004767:hit:4231:3.2.3.8;Name=Sp_Illum_Trans_W
uti_cns_0004767 est2genome match_part 878327 878771 1446 + . ID=uti_cns_0004767:hsp:8344:3.2.3.8;Parent=uti_cns_0004767:hit:4231:3.2.3
uti_cns_0004767 est2genome expressed_sequence_match 884121 886610 2509 + . ID=uti_cns_0004767:hit:4232:3.2.3.8;Name=Sp_Illum_Trans_W
uti_cns_0004767 est2genome match_part 884121 884195 2509 + . ID=uti_cns_0004767:hsp:8345:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3
uti_cns_0004767 est2genome match_part 886180 886610 2509 + . ID=uti_cns_0004767:hsp:8346:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3
ARRAY(0x1b91f110)
ARRAY(0x1a686350)
ARRAY(0x1b06bba0)
ARRAY(0x1b931e10)
ARRAY(0x1b13f3a0)
ARRAY(0x1b6af650)
ARRAY(0x1b929600)

From carsonhh at gmail.com Thu Jan 29 11:47:11 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 29 Jan 2015 11:47:11 -0700
Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster
In-Reply-To: <1422394249179.2a90ef9d@Nodemailer>
References: <73716718-1273-46F1-BC94-AAD276DFE0E1@gmail.com> <1422394249179.2a90ef9d@Nodemailer>
Message-ID:

I believe this may be caused by the latency of asynchronous operations on your network-shared drive (there can be a lot of lag between operations when running in the cloud). Try these steps in order:
1. Test with a single AWS instance, using its local drive as the working directory.
2. Use two instances: one acts as the NFS server, and you run MAKER on the other instance but on the network-mounted drive.
3. Gradually increase the number of instances hitting the network-shared drive.
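Carson's explanation implies that a file written through a shared NFS mount may not yet be visible to another process or node. The usual remedy is to check again after a pause. Below is a minimal Python sketch of that idea. It is not MAKER's own retry logic, which lives in its Perl code. The hypothetical helper `wait_for_file` polls for a path with exponential backoff until a timeout. The parameter names and default values are assumptions chosen for illustration.

```python
import os
import time


def wait_for_file(path, timeout=30.0, initial_delay=0.1, max_delay=5.0):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Returns True if the file became visible, False on timeout. The pause
    between checks doubles after every miss, capped at `max_delay`.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

For example, `wait_for_file("/shared/run/out.gff", timeout=60)` would wait up to a minute; the path is hypothetical. How quickly an existence check notices a new file also depends on the NFS client's attribute and lookup caching. On Linux this is controlled by mount options such as `actimeo` and `lookupcache`, which the sketch does not touch. Repeated polling only helps once those caches expire.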
--Carson

> On Jan 27, 2015, at 2:30 PM, Jason Gallant wrote:
>
> Carson,
>
> Thanks for the input and the test script... I was able to run Maker successfully using OpenMPI on Starcluster. However, I am still receiving error messages fairly commonly... this is the error I described earlier in this thread. It seems to appear regardless of whether I use OpenMPI or MPICH2.
>
> Essentially, there seems to be an error collapsing BLAST reports. This error causes maker to stop accepting new contigs on that machine (in this case node060), and maker continues to report every contig following this error as "failed". Otherwise, the other nodes seem to be working normally, but this error can happen on other nodes as well, so the issue can compound.
>
> [1,15]:deleted:-60 hits
> [1,15]:collecting blastx reports
> [1,15]:ERROR: Could not colapse BLAST reports
> [1,15]: at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
> [1,15]: GI::combine_blast_report(FastaChunk=HASH(0x1781acd8), ARRAY(0xc1e4fa8), ARRAY(0x15ab20d0), runlog=HASH(0xb87f878)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1
> [1,15]: Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1
> [1,15]: eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1
> [1,15]: Error::subs::try(CODE(0x198e22f8), HASH(0x9c9b65c0)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4224 thread 1
> [1,15]: Process::MpiChunk::_go(Process::MpiChunk=HASH(0x1b8a7cd0), "run", HASH(0x15e3e1a0), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1
> [1,15]: Process::MpiChunk::run(Process::MpiChunk=HASH(0x1b8a7cd0), 15) called at /root/maker/bin/maker line 1457 thread 1
> [1,15]: main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
> [1,15]: eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
> [1,15]: threads::new("threads", CODE(0x36c9a98), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1
> [1,15]:--> rank=15, hostname=node015
> [1,15]:ERROR: Failed while collecting blastx reports
> [1,15]:ERROR: Chunk failed at level:9, tier_type:3
> [1,15]:FAILED CONTIG:Scaffold66
> [1,15]:
> [1,15]:ERROR: Chunk failed at level:4, tier_type:0
> [1,15]:FAILED CONTIG:Scaffold66
> [1,15]:
> [1,15]:examining contents of the fasta file and run log
> [1,15]:ERROR: could not make datastore directory
> [1,15]:--> rank=15, hostname=node015
> [1,15]:ERROR: Failed while examining contents of the fasta file and run log
> [1,15]:ERROR: Chunk failed at level:0, tier_type:0
> [1,15]:FAILED CONTIG:Scaffold483
>
> --
> Dr. Jason R. Gallant
> Assistant Professor
> Room 38 Natural Sciences
> Department of Zoology
> Michigan State University
> East Lansing, MI 48824
> jgallant at msu.edu
> office: 517-884-7756
>
> On Fri, Jan 23, 2015 at 3:25 PM, Carson Holt wrote:
>
> The complaining is because there is more than one MAKER process running and they are not connected via MPI. So the problem is OpenMPI. Try installing a small MPI script (like the one attached) and using it to test OpenMPI. Once OpenMPI is configured correctly, the separate processes will communicate with each other (pay attention to the comm size and rank messages).
>
> --Carson
>
>> On Jan 23, 2015, at 1:15 PM, Jason Gallant wrote:
>>
>> Hi Carson,
>>
>> Yes, I've tried that and still have the issue of maker complaining about multiple processes in the same directory. Other ideas?
>>
>> Best,
>> Jason
>>
>> On Fri, Jan 23, 2015 at 3:14 PM, Carson Holt wrote:
>>
>> If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).
>>
>> For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings. Also, if jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.
>>
>> Example: mpiexec -mca btl ^openib -n 20 maker
>>
>> --Carson
>>
>>> On Jan 23, 2015, at 1:08 PM, Jason Gallant wrote:
>>>
>>> Hi Carson,
>>>
>>> Yes, STARCLUSTER enables a global storage space, which is via NFS to an EBS drive that I've created.
>>>
>>> I'm using the local disk space on each instance for the /tmp directory, however.
>>>
>>> It occurred to me on reading the forums that MPICH2 doesn't scale as well as OPENMPI. However, when I try to configure Maker for openmpi and run it, I get complaints from maker that multiple makers are running in the same directory...
>>>
>>> Thanks for your advice!
>>>
>>> Best,
>>> Jason
>>>
>>> On Fri, Jan 23, 2015 at 3:01 PM, Carson Holt wrote:
>>>
>>> MAKER needs a global storage location. You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that.
>>> Also use OpenMPI instead of MPICH2. It's more stable.
>>>
>>> I look forward to seeing how your experiment with AWS, MPI, and MAKER works out.
>>>
>>> --Carson
>>>
>>> > On Jan 21, 2015, at 6:56 AM, Jason Gallant wrote:
>>> >
>>> > Hi Everyone,
>>> >
>>> > I'm attempting to run Maker on Amazon EC2 using MIT's starcluster... I've started a 200-node cluster and enabled MPICH2 (Starcluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably.
>>> >
>>> > I'm having a persistent issue where something fails on one of the nodes, and stderr is flooded with:
>>> >
>>> > examining contents of the fasta file and run log
>>> > [67] ERROR: could not make datastore directory
>>> > [67] --> rank=67, hostname=node067
>>> > [67] ERROR: Failed while examining contents of the fasta file and run log
>>> > [67] ERROR: Chunk failed at level:0, tier_type:0
>>> > [67] FAILED CONTIG:Scaffold261
>>> >
>>> > This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node:
>>> >
>>> > 67] #-------------------------------#
>>> > [67] deleted:-60 hits
>>> > [67] collecting blastx reports
>>> > [67] ERROR: Could not colapse BLAST reports
>>> > [67] at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
>>> > [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 >>> > [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 >>> > [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 >>> > [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 >>> > [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 >>> > [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 >>> > [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 >>> > [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 >>> > [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1 >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while collecting blastx reports >>> > [67] ERROR: Chunk failed at level:9, tier_type:3 >>> > [67] FAILED CONTIG:Scaffold66 >>> > [67] >>> > [67] ERROR: Chunk failed at level:4, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold66 >>> > >>> > >>> > I've attempted to ignore the error to see if things will proceed on the other 199 processors. When I returned to the 'master'
node after the evening, Maker keeps repeating the same error code over and over (same scaffold): >>> > ] examining contents of the fasta file and run log >>> > [67] ERROR: could not make datastore directory >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while examining contents of the fasta file and run log >>> > [67] ERROR: Chunk failed at level:0, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold1589 >>> > >>> > I stop the job, and restart, and after only a few minutes of running, the same error is reported, this time on a new scaffold. Strangely here, the error is reported in the MPI tag of node001, but the error originates at node137: >>> > >>> > ERROR: Could not colapse BLAST reports >>> > [1] at /root/maker/bin/../lib/GI.pm line 2524. >>> > [1] GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 >>> > [1] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 >>> > [1] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 >>> > [1] Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 >>> > [1] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 >>> > [1] Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 >>> > [1] --> rank=137, hostname=node137 >>> > [1] ERROR: Failed while collecting blastx reports >>> > [1] ERROR: Chunk failed at level:9, tier_type:3 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] ERROR: Chunk failed at level:4, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while 
examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the 
fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log 
>>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk 
failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > >>> > I'd appreciate any guidance as to how best to diagnose this error! >>> > >>> > Many thanks, >>> > Jason Gallant >>> > >>> > >>> > >>> > >>> > >>> > Dr. Jason R. Gallant >>> > Assistant Professor >>> > Room 38 Natural Sciences >>> > Department of Zoology >>> > Michigan State University >>> > East Lansing, MI 48824 >>> > jgallant at msu.edu >>> > office: 517-884-7756 >>> > _______________________________________________ >>> > maker-devel mailing list >>> > maker-devel at box290.bluehost.com >>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Jan 29 12:40:09 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 Jan 2015 12:40:09 -0700 Subject: [maker-devel] GFF and Dereferencing problem In-Reply-To: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> References: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> Message-ID: <65C0DD7A-A3CA-4404-B15E-91B77DC6D8FE@gmail.com> Could you make sure you are using the most recent version of MAKER? There was a bug similar to this that was fixed some time ago. The current version is 2.31.8. Also, when rerunning with the most recent version of MAKER, make sure to set the -a flag on the command line to force a rerun of logged data. Carson > On Jan 29, 2015, at 11:07 AM, Hamid Ashrafi wrote: > > Hi, > > After maker finishes its job, it generates many files; one of them is a gff file. I see the following in some of my gff files. It seems to be a dereferencing problem. I am just wondering if it affects my annotation. > > Hamid > > uti_cns_0004767 est2genome match_part 856428 856485 3090 + . ID=uti_cns_0004767:hsp:8340:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 856587 856938 3090 + .
ID=uti_cns_0004767:hsp:8341:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 857053 857201 3090 + . ID=uti_cns_0004767:hsp:8342:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 859004 859041 3090 + . ID=uti_cns_0004767:hsp:8343:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome expressed_sequence_match 878327 878771 1446 + . ID=uti_cns_0004767:hit:4231:3.2.3.8;Name=Sp_Illum_Trans_W > uti_cns_0004767 est2genome match_part 878327 878771 1446 + . ID=uti_cns_0004767:hsp:8344:3.2.3.8;Parent=uti_cns_0004767:hit:4231:3.2.3 > uti_cns_0004767 est2genome expressed_sequence_match 884121 886610 2509 + . ID=uti_cns_0004767:hit:4232:3.2.3.8;Name=Sp_Illum_Trans_W > uti_cns_0004767 est2genome match_part 884121 884195 2509 + . ID=uti_cns_0004767:hsp:8345:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 > uti_cns_0004767 est2genome match_part 886180 886610 2509 + . ID=uti_cns_0004767:hsp:8346:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 > ARRAY(0x1b91f110) > ARRAY(0x1a686350) > ARRAY(0x1b06bba0) > ARRAY(0x1b931e10) > ARRAY(0x1b13f3a0) > ARRAY(0x1b6af650) > ARRAY(0x1b929600) From carsonhh at gmail.com Fri Jan 30 09:33:46 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Jan 2015 09:33:46 -0700 Subject: [maker-devel] How to improve the result of Maker In-Reply-To: References: Message-ID: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com> See below. > > I have joined the "Maker-devel" Google group, but I don't know why I can't reply to a topic or create a new topic. Is there some limitation? The Google site is just a searchable archive of MAKER-related e-mails.
The actual conversations occur through the MAKER mailing list: http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org E-mails sent to the list will be automatically archived on Google. > I have finished genome annotation with MAKER. I used SNAP and Augustus in MAKER. I have some questions; could you help me? > > When gene finders make predictions at the same location, MAKER would choose the best prediction as the final output, right? But if the prediction doesn't match the evidence very well, how will MAKER synthesize the prediction with the evidence? My understanding of MAKER's behavior is as follows, though I'm not sure whether it is right: > > Assume that there is an exon present in the evidence but not in the prediction. If the exon is located at the end of the prediction, it will be output as UTR, but if the exon is located inside the prediction, it will be ignored and not output, right? No. MAKER uses the introns and exons in the evidence alignments to provide hints to the gene predictors. Hints increase the probability scores of the HMM models by increasing the likelihood of the exon or intron state wherever it overlaps the evidence alignment. This process bumps up the likelihood values for models that better match the evidence alignments, resulting in better models than SNAP and Augustus produce on their own without hints. Note that models are still governed by the same constraints of what constitutes an open reading frame and a splice site, regardless of evidence alignments. This means that no amount of evidence-based hints can overcome an assembly error. > For example: > > the exon pointed to by the red arrow. All of the evidence contains this exon, but it was missed in the final output. There are two possibilities.
Given how different the snap and augustus models are from one another, this would suggest they have not been trained appropriately (for example, if you are picking another related organism's parameter file rather than training these programs, there are several assumptions being made that can actually make such an approach almost worse than just picking a parameter file at random). But more likely, the evidence-supported exon breaks the reading frame of the model. This usually indicates that you have an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that generates a missense-causing frameshift, so the predictors do the next most reasonable thing: they drop the exon if another model is tenable. More concerning would be the mRNA-seq alignments near the 3' end of the gene call. The structure suggests significant capture of background transcription with the mRNA-seq reads (long UTRs with weird mini-introns). I would suggest not using cufflinks in this case. You should probably go with an assembly-based approach for mRNA-seq reads instead. I would suggest using trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with trinity. I would further suggest looking at the model in question using apollo, and manually adding the exon (click and drag it into the model). You can examine the reading frame after adding the exon and see if it is in fact a frameshift assembly error. If it's a homopolymer-derived frameshift, then you can expect a lot more of these throughout your assembly. Also I do not see any protein alignments here? MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don't have to be that closely related, but closer is better).
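The reading-frame check Carson describes (add the evidence-supported exon, then see whether the frame still holds) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a feature of MAKER or Apollo: the function names are hypothetical, exons are assumed to be 1-based, inclusive, plus-strand (start, end) pairs, and the toy locus is invented purely to show how shifting an exon boundary by 1 bp turns a clean CDS into one with an in-frame TGA.

```python
# Minimal sketch (not MAKER/Apollo code): splice CDS exons out of a genomic
# sequence and report whether the reading frame survives.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_cds(genome: str, exons: list[tuple[int, int]]) -> str:
    """Join exons given as 1-based, inclusive (start, end) pairs on the + strand."""
    return "".join(genome[start - 1:end] for start, end in sorted(exons))


def internal_stops(cds: str) -> list[int]:
    """1-based indices of in-frame stop codons, ignoring the last complete codon
    (which is expected to be the terminal stop)."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    return [n + 1 for n, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def check_frame(genome: str, exons: list[tuple[int, int]]) -> dict:
    """Summarize CDS length, whether it is a multiple of 3, and early stops."""
    cds = splice_cds(genome, exons)
    return {
        "length": len(cds),
        "frame_ok": len(cds) % 3 == 0,
        "internal_stops": internal_stops(cds),
    }


# Hypothetical toy locus: exon 1 = 1-6, a short "intron" = 7-12, exon 2 = 13-21.
GENOME = "ATGAAAGTAAGTGATCCCTAA"

if __name__ == "__main__":
    # Exon boundaries as supported by the transcript evidence:
    print(check_frame(GENOME, [(1, 6), (13, 21)]))
    # -> {'length': 15, 'frame_ok': True, 'internal_stops': []}
    # Exon 2 starting 1 bp too early: frame breaks and codon 3 becomes TGA.
    print(check_frame(GENOME, [(1, 6), (12, 21)]))
    # -> {'length': 16, 'frame_ok': False, 'internal_stops': [3]}
```

On real data the coordinates would come from the model's GFF3 CDS features, and minus-strand models would need to be reverse-complemented first. A length that is not a multiple of 3, or an early stop, is consistent with the frameshift/assembly-error explanation but does not by itself prove it.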
Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image, you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not. > In this example, the long UTR is another issue; is it non-coding RNA? > > I have another example: > > > The yellow was evidence from cufflinks. The final output chose the prediction from Augustus, but the last two exons were annotated as UTR. I thought UTR should be continuous and should not contain introns. Actually, UTR is not expected to be continuous and without introns. In fact the majority of alternative splicing events occur in the 5' UTR (not in the CDS), and 5' UTRs commonly contain introns (just as we see here). This makes evolutionary sense. Alternatively spliced 5' UTRs allow for differential and tissue-specific control of the exact same protein by swapping out the upstream regulatory sequence. Alternative splicing of the 3' UTR, on the other hand, is less common (it's involved in nonsense-mediated decay and not so much in regulation of expression), but introns in the 3' UTR are still not uncommon. The mRNA-seq alignments suggest that those exons are transcribed, so unless there is an assembly error causing a frameshift in the CDS and an early stop codon, the 3' UTR would be correct. If you had protein alignments from another species here, then you could see which exons they support as being coding exons. Thanks, Carson From xvazquezc at gmail.com Fri Jan 30 21:48:33 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Sat, 31 Jan 2015 15:48:33 +1100 Subject: [maker-devel] genome duplication?
Message-ID: Hi all, One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:

# Statistics of the completeness of the genome based on 248 CEGs #

             #Prots  %Completeness  -  #Total  Average  %Ortho
Complete        181          72.98  -     365     2.02   67.40
  Group 1        54          81.82  -     105     1.94   66.67
  Group 2        39          69.64  -      86     2.21   71.79
  Group 3        45          73.77  -      86     1.91   57.78
  Group 4        43          66.15  -      88     2.05   74.42
Partial         230          92.74  -     528     2.30   77.83
  Group 1        61          92.42  -     140     2.30   72.13
  Group 2        53          94.64  -     127     2.40   84.91
  Group 3        56          91.80  -     126     2.25   69.64
  Group 4        60          92.31  -     135     2.25   85.00

The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies. Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328 Now to the Maker part... So, as part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000, respectively) in comparison with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a phenomenon given that there is no support from the protein files provided initially to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments. So, in my last run I turned on keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to Last question regarding the protein files.
I downloaded the annotated genomes from JGI and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and a lower chance of predicting false genes. Am I right? Thank you in advance, Xabier -- Xabier Vázquez Campos PhD Candidate Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA From mikael.durling at slu.se Sat Jan 31 01:42:51 2015 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Sat, 31 Jan 2015 08:42:51 +0000 Subject: [maker-devel] genome duplication? In-Reply-To: References: Message-ID: Hi Xabier, On 31 Jan 2015 at 05:48, Xabier Vázquez Campos wrote: Hi all, One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:

# Statistics of the completeness of the genome based on 248 CEGs #

             #Prots  %Completeness  -  #Total  Average  %Ortho
Complete        181          72.98  -     365     2.02   67.40
Partial         230          92.74  -     528     2.30   77.83

Judging from these figures, you seem to have a very fragmented assembly? What N50 have you reached? In my experience, assemblies with an N50 below 5-10 times the average gene length tend to give problems in producing good gene sets. Not to say that the gene sets are unusable, but for comparing e.g. gene complements to other species, it will be hard to draw any conclusions when a high proportion of the genes are incomplete. The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with Hortaea werneckii (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies.
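Since the duplication question rests on the CEGMA summary quoted above, it may help to see how its columns relate. The numbers in the table are consistent with %Completeness being #Prots out of the 248 CEGs and Average being #Total divided by #Prots, so the copy-number signal can be recomputed directly. A small Python sketch using only the Complete and Partial counts from this thread (the function name is illustrative, not part of CEGMA):

```python
# Sketch: recompute CEGMA's %Completeness and Average columns from the raw
# counts quoted in this thread (248 CEGs in the reference set).
TOTAL_CEGS = 248

ROWS = {
    # row name: (#Prots = CEGs detected, #Total = all hits including extra copies)
    "Complete": (181, 365),
    "Partial": (230, 528),
}


def cegma_columns(prots: int, total: int, n_cegs: int = TOTAL_CEGS) -> tuple[float, float]:
    """Return (%Completeness, Average hits per detected CEG), rounded to 2 decimals."""
    completeness = 100 * prots / n_cegs
    average = total / prots
    return round(completeness, 2), round(average, 2)


if __name__ == "__main__":
    for name, (prots, total) in ROWS.items():
        pct, avg = cegma_columns(prots, total)
        print(f"{name:<9} {pct:6.2f}% complete, {avg:.2f} hits per CEG")
```

An Average near 2 means each detected CEG was found about twice. That fits the duplication hypothesis, but it does not prove it on its own: redundant or haplotype-split contigs in a fragmented assembly can inflate the same number.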
Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328 Now to the Maker part... So, as part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000, respectively) in comparison with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a phenomenon given that there is no support from the protein files provided initially to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments. Unless you mistakenly filter out duplicated gene families as repeats with RepeatModeler, MAKER should not care about duplicated genes. However, without keep_preds=1, MAKER reports only genes with some kind of support (be it EST or protein homology). This is rather conservative, but if you enable keep_preds, you will get more genes, as you have noted. Just for the sake of comparison, I have reannotated more than ten genomes downloaded from JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently reports fewer gene models. I have yet to do a more thorough comparison to tell which genes JGI is reporting that don't appear in the MAKER annotations. So, in my last run I turned on keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to Last question regarding the protein files. I downloaded the annotated genomes from JGI and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and a lower chance of predicting false genes. Am I right? Yes, I would say so.
The FilteredModels have passed through their model selection pipeline, while all_models contains models from all predictors, as well as combinations of predictors and EST evidence. Just my 2 cents of observations, cheers, Mikael Thank you in advance, Xabier -- Xabier Vázquez Campos PhD Candidate Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA From xvazquezc at gmail.com Sat Jan 31 01:51:36 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Sat, 31 Jan 2015 19:51:36 +1100 Subject: [maker-devel] genome duplication? In-Reply-To: References: Message-ID: Thanks Mikael, These are the assembly stats as taken from abyss-fac; indeed it isn't a great N50, but it isn't that bad either:

n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6

2015-01-31 19:42 GMT+11:00 Mikael Brandström Durling: > Hi Xabier, > > On 31 Jan 2015 at 05:48, Xabier Vázquez Campos wrote: > > Hi all, > > One of the fungal genomes I'm annotating is relatively shattered (?), with > many contigs/scaffolds, and the CEGMA analysis alone may indicate a > potential widespread duplication of the genome > > # Statistics of the completeness of the genome based on 248 CEGs >> # >> #Prots %Completeness - #Total Average %Ortho >> >> Complete 181 72.98 - 365 2.02 67.40 >> Partial 230 92.74 - 528 2.30 77.83 >> > > > Judging from these figures, you seem to have a very fragmented assembly? > What N50 have you reached? In my experience, assemblies with an > N50 below 5-10 times the average gene length tend to give problems in > producing good gene sets.
Not to say that the gene sets are unusable, but > for comparing e.g. gene complements to other species, it will be hard to > draw any conclusions when a high proportion of the genes are incomplete. > > The expected genome size is relatively low (~42 Mb by abyss-fac) in > comparison with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related > fungus with nearly 90% of its genes present in at least two copies. > Paper: > http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328 > > Now to the Maker part... So, as part of the Maker annotation, I trained > SNAP and Augustus, and I generated a specific RepeatModeler library. I > recorded the predicted outputs from each Maker run (AED, number of > predicted proteins and transcripts...). Both Augustus and SNAP gave > quite high numbers (~19000 and ~23000, respectively) in comparison with the > xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how > does maker deal with gene duplications? Or is this just a phenomenon given > that there is no support from the protein files provided initially to > Maker? I've used 4 different protein files for the annotation; could it be > that they weren't the best choices? I picked them from the closest > relatives and similar environments. > > > Unless you mistakenly filter out duplicated gene families as repeats > with RepeatModeler, MAKER should not care about duplicated genes. However, > without keep_preds=1, MAKER reports only genes with some kind of support > (be it EST or protein homology). This is rather conservative, but if you > enable keep_preds, you will get more genes, as you have noted. Just for the > sake of comparison, I have reannotated more than ten genomes downloaded from > JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently > reports fewer gene models. I have yet to do a more thorough comparison > to tell which genes JGI is reporting that don't appear in the MAKER > annotations.
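Mikael's rule of thumb (an N50 below roughly 5-10 times the average gene length tends to cause problems) can be turned around using Xabier's abyss-fac N50 of 10,771 bp: dividing the N50 by each multiple gives the largest mean gene length for which the assembly would still clear that bar. The sketch below is a generic N50 calculation plus that arithmetic in Python; it is not abyss-fac's code. The table's n:500 column and min of 500 suggest only contigs of at least 500 bp were counted there, so an optional min_length filter is included to make comparisons like for like.

```python
# Sketch: generic N50 plus the "N50 vs. mean gene length" rule of thumb.


def n50(lengths: list[int], min_length: int = 0) -> int:
    """Length L such that contigs of length >= L hold at least half of the
    total assembled bases (after dropping contigs shorter than min_length)."""
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above min_length")
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:
            return length
    return kept[-1]  # not reached: running ends at sum(kept) >= half


def max_mean_gene_length(assembly_n50: int, multiple: float) -> float:
    """Largest mean gene length for which N50 >= multiple * mean gene length."""
    return assembly_n50 / multiple


if __name__ == "__main__":
    print(n50([100, 400, 600, 900]))                  # 600
    print(n50([100, 400, 600, 900], min_length=500))  # 900
    for multiple in (5, 10):
        # N50 of 10771 bp taken from the abyss-fac table in this thread
        print(multiple, max_mean_gene_length(10771, multiple))
```

With N50 = 10,771 bp, the 10x end of the rule holds only if the mean gene length is about 1.08 kb or less, and the 5x end only up to about 2.15 kb. The actual mean gene length of this genome is not given in the thread, so which side it falls on is left open.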
> > > So, in my last run I turned on keep_preds=1 and the proteins in the > xxx.all.maker.proteins.fasta reached to > > Last question regarding the protein files. I downloaded the annotated > genomes from JGI and most of them have two annotation folders, > "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been > using the protein files found in the latter, as I expected them to contain real > evidence and a lower chance of predicting false genes. Am I right? > > > Yes, I would say so. The FilteredModels have passed through their model > selection pipeline, while all_models contains models from all predictors, > as well as combinations of predictors and EST evidence. > > Just my 2 cents of observations, > cheers, > Mikael > > > Thank you in advance, > > Xabier > > > -- > Xabier Vázquez Campos > PhD Candidate > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > > > -- Xabier Vázquez Campos *PhD Candidate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA From chenwenbo1020 at gmail.com Sat Jan 31 08:54:28 2015 From: chenwenbo1020 at gmail.com (陈文博) Date: Sat, 31 Jan 2015 10:54:28 -0500 Subject: [maker-devel] How to improve the result of Maker In-Reply-To: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com> References: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com> Message-ID: > > > There are two possibilities.
Given how different the snap and augustus > models are from one another, this would suggest they have not been trained > appropriately (for example, if you are picking another related organism's > parameter file rather than training these programs, there are several > assumptions being made that can actually make such an approach > almost worse than just picking a parameter file at random). But more likely, > the evidence-supported exon breaks the reading frame of the model. This > usually indicates that you have an assembly error (possibly issues with > homopolymers). No amount of evidence support will allow you to call an > exon that generates a missense-causing frameshift, so the predictors do > the next most reasonable thing: they drop the exon if another model is > tenable. More concerning would be the mRNA-seq alignments near the 3' end > of the gene call. The structure suggests significant capture of background > transcription with the mRNA-seq reads (long UTRs with weird mini-introns). > I would suggest not using cufflinks in this case. You should probably go > with an assembly-based approach for mRNA-seq reads instead. I would suggest > using trinity. It will reduce sensitivity but greatly increase evidence > specificity, which is where you need the most improvement based on these > images. I would also suggest using the jaccard_clip option with trinity. > > I would further suggest looking at the model in question using apollo, and > manually adding the exon (click and drag it into the model). You can > examine the reading frame after adding the exon and see if it is in fact a > frameshift assembly error. If it's a homopolymer-derived frameshift, then > you can expect a lot more of these throughout your assembly. > I dragged the exon into the model; there is a stop codon in it, which causes the region after it to become UTR, here: [image 1] The exon in question is pointed to by the red arrow.
But the uppermost evidence is the complete EST from NCBI, and it contains both the start and stop codons. Then I noticed that the 5' boundary of the second exon in the model is not the same as in the EST, so it causes a frameshift, which produces the stop codon in the exon marked by the red arrow. The first exon should not be CDS, as there would be a start codon in the second exon if its 5' boundary were predicted correctly. Would "always_complete=1" fix it? I will try using Trinity.

> Also I do not see any protein alignments here? MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don't have to be that closely related, but closer is better). Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not.

I did use 3 sources of protein evidence: a proteome from a related species, the fruit fly proteome, and Swiss-Prot.

Thank you very much!

Best regards,
Wenbo

From jason.stajich at gmail.com Sat Jan 31 16:21:12 2015
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sat, 31 Jan 2015 15:21:12 -0800
Subject: [maker-devel] genome duplication?
In-Reply-To:
References:
Message-ID:

Xabier - FYI - though you probably already compared, those stats are on par with the Hortaea v1 assembly (we do have an improved Hortaea assembly now, and the genome size is still in the same range, supporting the duplication hypothesis).

Hw version 1 assembly - N50 9623; Max 71563

CEGMA for Hw1
           #Prots  %Completeness  -  #Total  Average  %Ortho
Complete   196     79.03          -  498     2.54     81.12
Partial    228     91.94          -  673     2.95     95.18

Mikael - yes - we should compare notes on the models JGI is calling which have little support in MAKER. I am not sure if their pipeline runs Augustus/SNAP using informant hints, though usually they are bringing RNA-seq into the mix. I don't know if your approach for reannotation assembled the RNA-seq and used it as evidence? We'll be trying to assess some of this when comparing the proportion of shared genes in the first 1KFG paper, so we may be able to say with more certainty whether these extra predictions are shared more widely and get a handle on singleton/false-positive rates.

Jason

Jason Stajich
jason.stajich at gmail.com

On Sat, Jan 31, 2015 at 12:51 AM, Xabier Vázquez Campos wrote:

> Thanks Mikael,
>
> These are the assembly stats as taken from abyss-fac. Indeed it isn't a great N50, but it isn't that bad either:
>
> n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
> 14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6
>
> 2015-01-31 19:42 GMT+11:00 Mikael Brandström Durling <mikael.durling at slu.se>:
>
>> Hi Xabier,
>>
>> On 31 Jan 2015, at 05:48, Xabier Vázquez Campos wrote:
>>
>> Hi all,
>>
>> One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and based on CEGMA analysis alone it may indicate a potential widespread duplication of the genome:
>>
>>> # Statistics of the completeness of the genome based on 248 CEGs
>>> #
>>>            #Prots  %Completeness  -  #Total  Average  %Ortho
>>> Complete   181     72.98          -  365     2.02     67.40
>>> Partial    230     92.74          -  528     2.30     77.83
>>
>> Judging from these figures, you seem to have a very fragmented assembly? What N50 have you reached? In my experience, assemblies with an N50 below 5-10 times the average gene length tend to give problems in producing good gene sets. Not to say that the gene sets are unusable, but when comparing e.g. gene complements to other species, it will be hard to draw any conclusions when a high proportion of the genes are incomplete.
>>
>> The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with *Hortaea werneckii* (51.6 Mb, 23,333 genes), a related fungus with nearly 90% of its genes present in at least two copies. Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>>
>> Now to the Maker part... As part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP tended to give quite high numbers (~19000 and ~23000 respectively) in comparison with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does Maker deal with gene duplications? Or is this just a phenomenon given that there is no support from the protein files provided initially to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices?
>> I picked them from the closest relatives and similar environments.
>>
>> Unless you mistakenly filter out duplicated gene families as repeats with RepeatModeler, MAKER should not care about duplicated genes. However, MAKER, without keep_preds=1, reports only genes with some kind of support (be it EST or protein homology). This is rather conservative, but if you enable keep_preds, you will get more genes, as you have noted. Just for the sake of comparison, I have reannotated more than ten genomes downloaded from JGI, providing MAKER with similar evidence to JGI, and MAKER consistently reports fewer gene models. I have yet to do a more thorough comparison to tell which genes JGI reports that don't appear in the MAKER annotations.
>>
>> So, in my last run I turned on keep_preds=1, and the proteins in the xxx.all.maker.proteins.fasta reached to
>>
>> Last question regarding the protein files. I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and to carry a lower chance of predicting false genes. Am I right?
>>
>> Yes, I would say so. The FilteredModels have passed through their model selection pipeline, while all_models contains models from all predictors, as well as combinations of predictors and EST evidence.
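Two numbers carry this thread: the assembly N50 (Mikael's rule of thumb compares it with 5-10 times the average gene length) and CEGMA's "Average" column, which drives the duplication hypothesis. The sketch below, plain Python with hypothetical helper names, computes N50 from a list of contig lengths and recomputes the CEGMA columns. Reading "Average" as #Total / #Prots is an inference, but it reproduces every row quoted here (365/181 = 2.02, 498/196 = 2.54), and 248 is the CEG count named in the report header.

```python
def n50(lengths):
    """N50: the contig length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def cegma_row(prots, total_hits, n_cegs=248):
    """Recompute CEGMA's %Completeness and Average from #Prots and #Total."""
    completeness = round(100.0 * prots / n_cegs, 2)
    average_copies = round(total_hits / prots, 2)  # hits per recovered CEG
    return completeness, average_copies


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # toy contig lengths
    print(cegma_row(181, 365))   # Xabier's "Complete" row
    print(cegma_row(196, 498))   # Hortaea v1 "Complete" row
```

This prints `5`, `(72.98, 2.02)` and `(79.03, 2.54)`. Applying Mikael's rule to the abyss-fac N50 of 10,771 bp: N50 stays at or above 5x the mean gene length for genes averaging up to about 2.15 kb, and at or above 10x for genes up to about 1.08 kb.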
>>
>> Just some 2 cents of observations of mine,
>> cheers,
>> Mikael
>>
>> Thank you in advance,
>>
>> Xabier
>>
>> --
>> Xabier Vázquez Campos
>> PhD Candidate
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>
> --
> Xabier Vázquez Campos
> *PhD Candidate*
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From jerryzhaosjtu at gmail.com Wed Jan 7 04:16:45 2015
From: jerryzhaosjtu at gmail.com (Yue Zhao)
Date: Wed, 7 Jan 2015 19:16:45 +0800
Subject: [maker-devel] using MAKER with MPI
Message-ID:

Greetings,

Can I use mpirun instead of mpiexec? Thank you!!
--
*Yue Zhao (Jerry)*
Bachelor Candidate of Plant Biotechnology
Researcher in UCLA-CSST program
Shanghai Jiao Tong University, Shanghai
*jerryzhaosjtu at gmail.com*

From carsonhh at gmail.com Wed Jan 7 09:13:50 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Jan 2015 09:13:50 -0700
Subject: [maker-devel] using MAKER with MPI
In-Reply-To:
References:
Message-ID:

Yes, they are interchangeable. In fact, in OpenMPI both mpiexec and mpirun are symbolic links to the exact same executable -> orterun

Just remember that MAKER works with the MPICH2/3 and OpenMPI flavors of MPI, but not with MVAPICH2.

Also, if using MPICH, make sure to enable shared libraries during installation (this is not the default).

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best to just add it to your ~/.bash_profile, e.g.:

    export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so

If jobs hang or freeze when using OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER. Example:

    mpiexec -mca btl ^openib -n 20 maker

-Carson

> On Jan 7, 2015, at 4:16 AM, Yue Zhao wrote:
>
> Greetings,
>
> Can I use mpirun instead of mpiexec? Thank you!!
>
> --
> Yue Zhao (Jerry)
> Bachelor Candidate of Plant Biotechnology
> Researcher in UCLA-CSST program
> Shanghai Jiao Tong University, Shanghai
> jerryzhaosjtu at gmail.com
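Carson's OpenMPI checklist (LD_PRELOAD set at run time, the openib exclusion, mpiexec and mpirun being equivalent) can be captured in a small launcher wrapper so it is not forgotten between runs. This is a hypothetical Python helper, not part of MAKER; the libmpi.so path is the example path from the message and will differ per installation. The function only builds the command and environment; it does not launch anything.

```python
import os
import shlex


def maker_mpi_command(n_procs, libmpi_path=None, disable_openib=True, maker="maker"):
    """Return (argv, env) for an MPI MAKER run; nothing is executed here."""
    argv = ["mpiexec"]  # under OpenMPI, "mpirun" would name the same launcher
    if disable_openib:
        # Carson's workaround for hangs/freezes: exclude the openib BTL.
        argv += ["-mca", "btl", "^openib"]
    argv += ["-n", str(n_procs), maker]
    env = dict(os.environ)  # copy so the caller's environment is untouched
    if libmpi_path:
        # OpenMPI builds: libmpi.so must be preloaded when MAKER runs.
        env["LD_PRELOAD"] = libmpi_path
    return argv, env


if __name__ == "__main__":
    argv, env = maker_mpi_command(20, "/usr/local/openmpi/lib/libmpi.so")
    print(shlex.join(argv))
    # To launch for real: subprocess.run(argv, env=env, check=True)
```

On Python 3.8+ this prints `mpiexec -mca btl '^openib' -n 20 maker`; `shlex.join` quotes the caret defensively, which the shell treats the same as the unquoted form in Carson's example.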
From carsonhh at gmail.com Thu Jan 8 08:47:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 8 Jan 2015 08:47:29 -0700
Subject: [maker-devel] MAKER mpi running wrong
In-Reply-To:
References:
Message-ID: <13241A86-804F-4674-A8FD-CA90026CF4AF@gmail.com>

When running large jobs in MPI, semi-random issues can arise, as well as tuning issues where hardware configuration, IO performance, buffer sizes, etc. all play a role. Using one of the NIH flagship clusters from XSEDE, for example, I can run on over 2000 CPUs without issue. But the IT specialists with XSEDE have also spent a lot of time tuning MPI by enabling and disabling certain options for their hardware and network configuration (the IT specialists for the XSEDE project are actually the developers of many of the available MPI flavors, so they wrote MPI to work really well on this specific cluster). On other clusters I can't go over 200 CPUs on a single job. Or, on another XSEDE cluster, I can run on exactly 1424 CPUs; if I increase by a single CPU, the job always fails.

For these kinds of issues you would have to delve into some of the more obscure parameters of OpenMPI via trial and error (http://www.open-mpi.org/doc/). What happens under the hood in OpenMPI is that different buffer sizes and network communication strategies are triggered as the number of nodes increases, so you can often identify a specific CPU count that is stable, where going one over that number causes a failure. You then look in the documentation for a parameter that matches that trigger value and alter it higher or lower. Or, if you can identify the stable CPU count, just submit multiple jobs at exactly that CPU count.

-Carson

> On Jan 8, 2015, at 8:27 AM, Yue Zhao wrote:
>
> Hi Carson,
>
> After using the flag in your example, the warning after running MAKER was gone, yet after running with MPI on 512 threads for 2 hours, MAKER 'Exited with exit code 1'. The stdout info is as follows:
>
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> SIGTERM received
> Perl exited with active threads:
>         1 running and unjoined
>         0 finished and unjoined
>         0 running and detached
>
> Also, my job submission looks like this:
>
> #BSUB -J maker_mpi
> #BSUB -n 512
> #BSUB -R "span[ptile=16]"
> module purge && module load gcc/4.9.1 openmpi/gcc/1.6.5
> mpiexec -mca btl ^openib -n 512 perl /lustre/home/clswcc/yzhao/MAKER/maker/bin/maker -fix_nucleotides
>
> Could you help me find out what is going wrong? The stdout at first is normal, as follows:
>
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_datastore
>
> To access files for individual sequences use the datastore index:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_master_datastore_index.log
>
> STATUS: Now running MAKER...
>
> Regards,
> yue
>
> --
> Yue Zhao (Jerry)
> Bachelor Candidate of Plant Biotechnology
> Researcher in UCLA-CSST program
> Shanghai Jiao Tong University, Shanghai
> jerryzhaosjtu at gmail.com
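Carson's practical advice is to find the largest CPU count that runs reliably and then submit several jobs at exactly that size. If one assumes stability is monotonic (every count up to some threshold works and every count above it fails, as in his 1424/1425 example), locating the threshold is a binary search. This is a minimal Python sketch under that assumption: `runs_ok` is a hypothetical probe you would have to implement yourself (for instance, a short test job), and because each probe costs a real cluster submission and failures can be intermittent, the result is a starting point rather than a guarantee.

```python
def largest_stable_cpu_count(runs_ok, low, high):
    """Binary search for the largest n in [low, high] with runs_ok(n) True.

    Assumes monotonic stability: runs_ok(n) is True up to a threshold
    and False above it. runs_ok is a hypothetical user-supplied probe.
    """
    if not runs_ok(low):
        raise ValueError("even the lower bound fails")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if runs_ok(mid):
            low = mid  # mid works: the answer is mid or larger
        else:
            high = mid - 1  # mid fails: the answer is below mid
    return low


def split_into_jobs(total_cpus, stable):
    """Split a CPU budget into job sizes that never exceed the stable count."""
    full, rest = divmod(total_cpus, stable)
    return [stable] * full + ([rest] if rest else [])


if __name__ == "__main__":
    # Simulated cluster where anything above 1424 CPUs fails.
    print(largest_stable_cpu_count(lambda n: n <= 1424, 16, 2048))
    print(split_into_jobs(512, 200))
```

With the simulated probe this finds 1424 in about a dozen probes instead of hundreds, and a 512-CPU budget with a stable size of 200 splits into jobs of `[200, 200, 112]`.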
From xvazquezc at gmail.com Wed Jan 14 01:40:38 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:40:38 +1100
Subject: [maker-devel] doubt about selection of the best model
Message-ID:

Hi Maker developers and users,

After quite a bit of time dealing with Maker, I can run it without problems (thank you Carson). However, I have doubts about the evaluation of the best model produced by Maker.

I found the AED_cdf_generator.pl script while searching the mailing list, and it is great, but when you use it, which GFF files are you comparing? I initially thought that the models to be compared were those from each *ab initio* program, e.g. SNAP vs Augustus, and within each, the subsequent bootstrap training steps; but unless you run only one predictor each time you run Maker, the XXX.all.gff file will contain data from both predictors. Should I run them individually?

Following the topic, Maker will generate different FASTA files for proteins and transcripts from each program (Maker and each *ab initio* predictor) as well as "non_overlapping" files. Which one(s) do you select to continue with the functional annotation?

Thank you in advance,

Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Wed Jan 14 01:49:34 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:49:34 +1100
Subject: [maker-devel] Augustus retraining??
Message-ID:

Hi,

I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. But, and here is my question, can/should I re-train Augustus like it is done with SNAP?
And what would I use for the re-training?

Thank you,

Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From mikael.durling at slu.se Wed Jan 14 03:08:33 2015
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 14 Jan 2015 10:08:33 +0000
Subject: [maker-devel] Augustus retraining??
In-Reply-To:
References:
Message-ID: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>

Hi,

> On 14 Jan 2015, at 09:49, Xabier Vázquez Campos wrote:
>
> Hi,
>
> I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. But, and here is my question, can/should I re-train Augustus like it is done with SNAP? And what would I use for the re-training?

I've tried an approach of retraining Augustus in a manner similar to what has been suggested here earlier for retraining SNAP. This has been run with a local Augustus installation as part of an automated framework I have set up to annotate fungal genomes. Interestingly, Augustus seems to converge very quickly. It is not uncommon that autoAugustus reports that it could not improve the initial models that were derived from the CEGMA dataset. Are there other similar experiences on the list?

I also have a modified version of maker2zff, which I call maker2augustus_gff, that extracts an evidence set for Augustus retraining from the initial round of MAKER. I'm happy to share it with anyone interested.
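Both Mikael's maker2augustus_gff and the maker2zff step of the SNAP bootstrap pick a training set out of a first-round MAKER GFF3. Their exact filters are not described in this thread, so the sketch below only illustrates the general idea: keep MAKER mRNAs whose `_AED` attribute (which MAKER writes on its mRNA features) is at or below a cutoff. The 0.5 default, the toy GFF3 lines, and the helper names are assumptions, not the behaviour of either script.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def training_candidates(gff_lines, max_aed=0.5, source="maker"):
    """Return IDs of MAKER mRNAs whose _AED is at or below max_aed."""
    keep = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section: no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != source or cols[2] != "mRNA":
            continue  # only MAKER gene models, not evidence or raw predictions
        attrs = parse_attributes(cols[8])
        aed = attrs.get("_AED")
        if aed is not None and float(aed) <= max_aed:
            keep.append(attrs.get("ID"))
    return keep


if __name__ == "__main__":
    toy = [
        "##gff-version 3",
        "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.12",
        "ctg1\tmaker\tmRNA\t1000\t1900\t.\t+\t.\tID=g2-mRNA-1;Parent=g2;_AED=0.81",
    ]
    print(training_candidates(toy))
```

The toy run keeps only `g1-mRNA-1`. A stricter cutoff trades training-set size for cleaner models, which may matter given Mikael's observation that Augustus converges quickly.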
cheers,
Mikael

> Thank you,
>
> Xabier
> --
> Xabier Vázquez Campos
> PhD Candidate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Wed Jan 14 08:22:57 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:22:57 -0700
Subject: [maker-devel] Augustus retraining??
In-Reply-To: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
Message-ID: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com>

Here is some info on training SNAP via the bootstrap technique (i.e. using the models produced by the initial training to seed the next round of training). Even though the examples use SNAP, it would be applicable using the scripts and methods Mikael described in his e-mail -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Also, Jason Stajich wrote an excellent explanation of training Augustus on the GMOD mailing list -> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

He also included his own scripts to assist with the training -> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

-Carson

> On Jan 14, 2015, at 3:08 AM, Mikael Brandström Durling wrote:
>
> Hi,
>
>> On 14 Jan 2015, at 09:49, Xabier Vázquez Campos wrote:
>>
>> Hi,
>>
>> I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. But, and here is my question, can/should I re-train Augustus like it is done with SNAP?
>> And what would I use for the re-training?
>
> I've tried an approach of retraining Augustus in a manner similar to what has been suggested here earlier for retraining SNAP. This has been run with a local Augustus installation as part of an automated framework I have set up to annotate fungal genomes. Interestingly, Augustus seems to converge very quickly. It is not uncommon that autoAugustus reports that it could not improve the initial models that were derived from the CEGMA dataset. Are there other similar experiences on the list?
>
> I also have a modified version of maker2zff, which I call maker2augustus_gff, that extracts an evidence set for Augustus retraining from the initial round of MAKER. I'm happy to share it with anyone interested.
>
> cheers,
> Mikael
>
>> Thank you,
>>
>> Xabier
>> --
>> Xabier Vázquez Campos
>> PhD Candidate
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Wed Jan 14 08:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:37:43 -0700
Subject: [maker-devel] doubt about selection of the best model
In-Reply-To:
References:
Message-ID:

The MAKER models will be the final models. FASTA files and features from the raw ab initio gene predictors, on the other hand, are there for reference purposes only and, unless you have a need for them, should be ignored.
MAKER models are the combination of ab initio gene predictions filtered for best evidence match, together with hint-based models from the predictors. Basically, MAKER took the best models from each separate predictor and created a final consensus gene set.

The CDF generator really is for comparing how evidence match changes between different releases of the genome or between different parameter options (i.e. you are comparing curves between independent MAKER runs, not within a single MAKER run). The AED CDF curve is interpreted similarly to a ROC curve, in that shifts up and to the left indicate improved gene models. This is as opposed to using sensitivity and specificity, because those measures require you to already know the correct models in order to generate a comparison. For de novo annotation that is impossible (if you already had the correct models, you wouldn't be running MAKER), so since such values cannot be generated, AED, which uses evidence overlap, acts as a proxy measurement.

This paper probably gives the overall best example of how AED correlates with model quality (Figures 2 and 3) -> http://www.biomedcentral.com/1471-2105/12/491

-Carson

> On Jan 14, 2015, at 1:40 AM, Xabier Vázquez Campos wrote:
>
> Hi Maker developers and users,
>
> After quite a bit of time dealing with Maker, I can run it without problems (thank you Carson). However, I have doubts about the evaluation of the best model produced by Maker.
>
> I found the AED_cdf_generator.pl script while searching the mailing list, and it is great, but when you use it, which GFF files are you comparing? I initially thought that the models to be compared were those from each ab initio program, e.g. SNAP vs Augustus, and within each, the subsequent bootstrap training steps; but unless you run only one predictor each time you run Maker, the XXX.all.gff file will contain data from both predictors. Should I run them individually?
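The proxy Carson describes is defined in the MAKER2 paper linked above as AED = 1 - C, with C = (SN + SP) / 2, where sensitivity and specificity are measured at the nucleotide level against the aligned evidence instead of against a known gene. The toy Python version below works on 1-based closed intervals and is for intuition only: MAKER's own calculation pools the evidence for each locus and may differ in detail, and the helper names are hypothetical.

```python
def covered_bases(intervals):
    """Set of 1-based positions covered by closed intervals (start, end)."""
    bases = set()
    for start, end in intervals:
        lo, hi = min(start, end), max(start, end)
        bases.update(range(lo, hi + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Annotation Edit Distance: 0 = model matches evidence exactly, 1 = no overlap.

    AED = 1 - (SN + SP) / 2, with SN = overlap / evidence bases
    and SP = overlap / model bases, at nucleotide level.
    """
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # nothing to overlap
    overlap = len(model & evidence)
    sensitivity = overlap / len(evidence)
    specificity = overlap / len(model)
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    print(aed([(1, 100)], [(1, 100)]))    # identical spans
    print(aed([(1, 100)], [(51, 150)]))   # half-shifted evidence
    print(aed([(1, 100)], [(201, 300)]))  # disjoint
```

The three calls print `0.0`, `0.5` and `1.0`. This is why a cumulative AED curve that rises earlier (more models near 0) reads as "better" without needing a gold standard.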
>
> Following the topic, Maker will generate different FASTA files for proteins and transcripts from each program (Maker and each ab initio predictor) as well as "non_overlapping" files. Which one(s) do you select to continue with the functional annotation?
>
> Thank you in advance,
>
> Xabier
>
> --
> Xabier Vázquez Campos
> PhD Candidate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Fri Jan 16 01:09:11 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 19:09:11 +1100
Subject: [maker-devel] functional annotation
Message-ID:

Hi,

Which file from the Maker output do you use for the functional annotation? The FASTA part of the XXX.all.gff?

I'll probably be using BLAST and InterProScan. I tested B2go (basic version); good stuff, but it is annoyingly slow.

Thank you

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Fri Jan 16 03:11:21 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 21:11:21 +1100
Subject: [maker-devel] repeat masking and repeat libraries
Message-ID:

Hi there,

First, a general question. Probably kind of silly, but I prefer to be sure... When you browse RepBase, for example for fungi, all the repeats are marked as Eukaryota (Ancestral) or under the name of the species, but no other taxonomic ranks are indicated. Does RepeatMasker recognise orders, families etc.?
Or, in my case, should I stick with model_org=fungi?

I've been trying to create repeat libraries specific to my genomes, and I didn't have any luck with the programs described in the Basic and Advanced tutorials (neither on my computer nor on the cluster); they reported errors every time, with the exception of RepeatModeler, which ran with no problems. Is the output from RepeatModeler enough to improve the masking? It is not the best option, I guess, but better than just the RepBase libraries by themselves, isn't it?

Thank you for your time,

Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From michael.s.campbell1 at gmail.com Fri Jan 16 10:01:37 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 16 Jan 2015 10:01:37 -0700
Subject: [maker-devel] functional annotation
In-Reply-To:
References:
Message-ID:

Hi Xabier,

The FASTA at the end of the GFF3 file is the genome. For functional annotation you want to use the XXXout.all.maker.proteins.fasta file. It contains the protein sequences for your MAKER gene models.

Good luck,
Mike

On Fri, Jan 16, 2015 at 1:09 AM, Xabier Vázquez Campos wrote:

> Hi,
>
> Which file from the Maker output do you use for the functional annotation? The FASTA part of the XXX.all.gff?
>
> I'll probably be using BLAST and InterProScan. I tested B2go (basic version); good stuff, but it is annoyingly slow.
>
> Thank you
>
> --
> Xabier Vázquez Campos
> *PhD Candidate*
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From carsonhh at gmail.com Fri Jan 16 10:04:09 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 Jan 2015 10:04:09 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Using both RepBase and a RepeatModeler-produced library should be sufficient, especially for fungi.

-Carson

> On Jan 16, 2015, at 3:11 AM, Xabier Vázquez Campos wrote:
>
> Hi there,
>
> First, a general question. Probably kind of silly, but I prefer to be sure... When you browse RepBase, for example for fungi, all the repeats are marked as Eukaryota (Ancestral) or under the name of the species, but no other taxonomic ranks are indicated. Does RepeatMasker recognise orders, families etc.? Or, in my case, should I stick with model_org=fungi?
>
> I've been trying to create repeat libraries specific to my genomes, and I didn't have any luck with the programs described in the Basic and Advanced tutorials (neither on my computer nor on the cluster); they reported errors every time, with the exception of RepeatModeler, which ran with no problems. Is the output from RepeatModeler enough to improve the masking? It is not the best option, I guess, but better than just the RepBase libraries by themselves, isn't it?
>
> Thank you for your time,
>
> Xabier
>
> --
> Xabier Vázquez Campos
> PhD Candidate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
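Mike's answer to the functional-annotation question rests on how a GFF3 file is laid out: under the GFF3 specification, everything after a `##FASTA` directive is sequence, and in MAKER's GFF3 those sequences are the genomic contigs, not proteins. The small Python sketch below (hypothetical helper names, toy input) separates the two sections and lists FASTA record IDs, a quick way to confirm which file holds which sequences before sending anything to BLAST or InterProScan.

```python
def split_gff3(lines):
    """Split GFF3 lines into (feature_lines, fasta_lines) at the ##FASTA directive."""
    features, fasta = [], []
    target = features
    for line in lines:
        if line.strip() == "##FASTA":
            target = fasta  # everything after this directive is sequence
            continue
        target.append(line)
    return features, fasta


def fasta_ids(lines):
    """Record IDs (first word after '>') from FASTA lines."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


if __name__ == "__main__":
    gff = [
        "##gff-version 3",
        "ctg1\tmaker\tgene\t1\t90\t.\t+\t.\tID=g1",
        "##FASTA",
        ">ctg1",
        "ACGTACGT",
    ]
    features, fasta = split_gff3(gff)
    print(fasta_ids(fasta))  # contig names, i.e. genome sequence
```

On the toy input this prints `['ctg1']`, a contig name; running `fasta_ids` over the proteins FASTA instead would list transcript-level IDs, which are the records to annotate.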
URL: From michael.s.campbell1 at gmail.com Fri Jan 16 10:08:43 2015 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Fri, 16 Jan 2015 10:08:43 -0700 Subject: [maker-devel] repeat masking and repeat libraries In-Reply-To: References: Message-ID: Hi Xabier, I haven't seen orders or families documented for repeatmasker with repbase. Fungi seems safe to me. If you want to give yourself a little more peace of mind about the repeatmodeler library you can blast it to a database of known fungal proteins and remove the entries in the library that have strong hits to a known protein to avoid over-masking. Mike On Fri, Jan 16, 2015 at 10:04 AM, Carson Holt wrote: > Using both RepBase and a RepeatModeler produced library should be > sufficient, especially for fungi. > > -Carson > > > On Jan 16, 2015, at 3:11 AM, Xabier Vázquez Campos > wrote: > > Hi there, > > First, a general question. Probably kind of silly but I prefer to be > sure... When you browse RepBase, for example in fungi, all the repeats are > marked as Eukaryota (Ancestral) or under the name of the species but no > other taxa ranks are indicated. Does RepeatMasker recognise orders, > families etc? or in my case should I stick with model_org=fungi? > > I've been trying to create repeat libraries specific for my genomes and > I didn't have any luck with the programs described in the Basic > > and advanced > > tutorials (neither in my computer or in the cluster), reporting errors at > all times, with exception of RepeatModeler, which ran with no problems. Is > the output from RepeatModeler enough to improve the masking? It is not the > best option I guess, but better than just the RepBase libraries by > themselves, isn't it?
> > Thank you for your time, > > Xabier > > -- > Xabier Vázquez Campos > *PhD Candidate* > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Michael Campbell MS, RD. Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Fri Jan 16 20:57:26 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Sat, 17 Jan 2015 14:57:26 +1100 Subject: [maker-devel] AED score script error Message-ID: Hi, Just reporting the following error with the AED_cdf_generator.pl script: Use of uninitialized value $opt_b in division (/) at AED_cdf_generator.pl line 20. Illegal division by zero at AED_cdf_generator.pl line 20. Anybody else with this problem? I use the version attached here: https://groups.google.com/forum/#!topic/maker-devel/LCpB3CEm63M Thank you -- Xabier Vázquez Campos *PhD Candidate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed...
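The warning above shows AED_cdf_generator.pl dividing by $opt_b, the bin width normally passed with -b. When -b is omitted the value is undefined, Perl treats it as 0 in the division, and the script dies. To illustrate what that bin width controls, here is a minimal Python sketch. It is not the script's actual code. It collects _AED= values from the mRNA lines of a MAKER GFF3 and reports the cumulative fraction of models at or below each bin edge. The function names, the assumption that AED is stored in an _AED= attribute on mRNA features, and the choice to reject a non-positive bin width are illustrative assumptions.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout: MAKER stores the AED score as an "_AED=" attribute on mRNA features.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def read_aed(gff_lines: Iterable[str]) -> List[float]:
    """Collect AED values from mRNA lines of a MAKER GFF3."""
    values: List[float] = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded genome sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            match = AED_RE.search(cols[8])
            if match:
                values.append(float(match.group(1)))
    return values


def aed_cdf(values: List[float], bin_width: float) -> List[Tuple[float, float]]:
    """Cumulative fraction of models with AED <= each bin edge from 0 to 1."""
    if bin_width <= 0:
        # A missing or zero -b is what makes the Perl script divide by zero.
        raise ValueError("bin width must be positive, e.g. 0.025")
    if not values:
        return []
    edges: List[float] = []
    i = 0
    while i * bin_width <= 1.0 + 1e-9:
        edges.append(round(i * bin_width, 6))
        i += 1
    total = len(values)
    return [(edge, sum(v <= edge for v in values) / total) for edge in edges]
```

For two models with AED 0.1 and 0.5, a bin width of 0.25 gives cumulative fractions 0, 0.5, 1, 1 and 1 at edges 0, 0.25, 0.5, 0.75 and 1. A smaller width such as 0.025 produces a smoother curve.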
URL: From michael.s.campbell1 at gmail.com Mon Jan 19 10:27:52 2015 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 19 Jan 2015 10:27:52 -0700 Subject: [maker-devel] AED score script error In-Reply-To: References: Message-ID: Hi Xabier, Did you give the -b option a value on the command line (e.g. -b 0.1)? Mike On Fri, Jan 16, 2015 at 8:57 PM, Xabier Vázquez Campos wrote: > Hi, > > Just reporting the following error with the AED_cdf_generator.pl script: > > Use of uninitialized value $opt_b in division (/) at AED_cdf_generator.pl >> line 20. >> Illegal division by zero at AED_cdf_generator.pl line 20. >> > > Anybody else with this problem? > I use the version attached here: > https://groups.google.com/forum/#!topic/maker-devel/LCpB3CEm63M > > Thank you > > > -- > Xabier Vázquez Campos > *PhD Candidate* > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Michael Campbell MS, RD. Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Mon Jan 19 23:14:58 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Tue, 20 Jan 2015 17:14:58 +1100 Subject: [maker-devel] AED score script error In-Reply-To: References: Message-ID: Thanks Mike. It was that. 2015-01-20 4:27 GMT+11:00 Michael Campbell : > Hi Xabier, > > Did you give the -b option a value on the command line (e.g. -b 0.1)?
> > Mike > > On Fri, Jan 16, 2015 at 8:57 PM, Xabier Vázquez Campos < > xvazquezc at gmail.com> wrote: > >> Hi, >> >> Just reporting the following error with the AED_cdf_generator.pl script: >> >> Use of uninitialized value $opt_b in division (/) at AED_cdf_generator.pl >>> line 20. >>> Illegal division by zero at AED_cdf_generator.pl line 20. >>> >> >> Anybody else with this problem? >> I use the version attached here: >> https://groups.google.com/forum/#!topic/maker-devel/LCpB3CEm63M >> >> Thank you >> >> >> -- >> Xabier Vázquez Campos >> *PhD Candidate* >> Water Research Centre >> School of Civil and Environmental Engineering >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Michael Campbell MS, RD. > Doctoral Candidate > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:585-3543 > > -- Xabier Vázquez Campos *PhD Candidate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Jan 20 09:45:01 2015 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Jan 2015 09:45:01 -0700 Subject: [maker-devel] Issue due to intensive I/O In-Reply-To: References: Message-ID: <6F82AB5F-4782-41CA-A61F-C79894EFABB4@gmail.com> Genome annotation is very data intensive as opposed to CPU intensive. In MAKER, most I/O-intensive operations will occur in a temporary directory pointed to by the TMP= option in the MAKER control files. If you are setting this value to a location on a network-mounted drive, then this could be the source of your problem.
Also TMP= defaults to the location of the TMPDIR Linux environment variable, so make sure that TMPDIR is not set to a network-mounted location either. The temporary directory needs to be a locally mounted location. There will still need to be a number of global files; however, we've previously run MAKER on over 8,000 CPUs on Lustre file systems with no issues. It is possible that it is the metadata server that is having problems as opposed to the object storage server if the genome being annotated has a large number of small contigs. Lots of small contigs in a fragmented genome assembly result in a lot of small result files, but very little reading and writing. Such a situation can be quite stressful for Lustre file systems because they don't like having large numbers of very small files (it overwhelms the metadata server even though the object storage server will be under more moderate load). Make sure you are setting min_contig= to something like 10000 if that is the case to avoid generating analyses for short un-annotatable contigs (they may number in the hundreds of thousands on lower-quality genome assemblies and contain no useful information). You can also set clean_up=1 in the MAKER control files to delete files as MAKER advances. This removes restart capability because you won't have logged results from previous runs, but it will reduce the burden on the metadata server (which is affected by total file number as opposed to file read/write operations). Setting clean_up=1 can also help you avoid any administrator-defined limits on total file number per user (administrators commonly set this limit on Lustre-based file systems to avoid taxing the metadata server). So your issue is likely caused by one of two things: 1. Improperly setting TMP= in the maker_opts.ctl file or the Linux TMPDIR environment variable to a network-mounted location. Fixed by setting these to a locally mounted location (usually /tmp). 2.
Too many total files being generated by a fragmented genome assembly. Fixed by either setting min_contig=10000 in order to skip short contigs or by setting clean_up=1 to avoid logging too many files. This happens because it is very difficult to overwhelm Lustre's object storage servers (which perform I/O read/write operations), but it's relatively easy to overwhelm the metadata server (affected by total file count rather than total I/O throughput). -Carson > On Jan 19, 2015, at 5:55 AM, Stephen Wang wrote: > > Dear MAKER Team, > > I am a cluster administrator in the university. The issue is caused by MAKER jobs, which access massive small files and crashed Lustre file system. > > Hardware: 16 cores per node > Software: OpenMPI 1.6.5 and GCC 4.9.1 > > Q1: Does MAKER have to generate a large number of files on the global file system? > Q2: Can any parameters help MAKER avoid I/O intensive access? Any experience on Lustre? > > MAKER is a quite important software for our user. Hope for your help. > > BR, > Stephen > > -- > Stephen Wang, GPU Computing Specialist > Center for High Performance Computing > Shanghai Jiao Tong University > Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China > Mobi:+86-136-6151-1618 Web:http://hpc.sjtu.edu.cn -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgallant at msu.edu Wed Jan 21 06:56:02 2015 From: jgallant at msu.edu (Jason Gallant) Date: Wed, 21 Jan 2015 05:56:02 -0800 (PST) Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster Message-ID: <1421848561970.c8b481bf@Nodemailer> Hi Everyone, I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster. I've started a 200-node cluster, and enabled MPICH2 (StarCluster by default uses OpenMPI). I plan on documenting this setup once I've figured out how to run things reliably.
I'm having a persistent issue where something fails on one of the nodes, and std error is flooded with: examining contents of the fasta file and run log [67] ERROR: could not make datastore directory [67] --> rank=67, hostname=node067 [67] ERROR: Failed while examining contents of the fasta file and run log [67] ERROR: Chunk failed at level:0, tier_type:0 [67] FAILED CONTIG:Scaffold261 This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node: [67] #-------------------------------# [67] deleted:-60 hits [67] collecting blastx reports [67] ERROR: Could not colapse BLAST reports [67] at /root/maker/bin/../lib/GI.pm line 2524 thread 1. [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...)
called at /root/maker/bin/maker line 917 thread 1 [67] --> rank=67, hostname=node067 [67] ERROR: Failed while collecting blastx reports [67] ERROR: Chunk failed at level:9, tier_type:3 [67] FAILED CONTIG:Scaffold66 [67] [67] ERROR: Chunk failed at level:4, tier_type:0 [67] FAILED CONTIG:Scaffold66 I've attempted to ignore the error to see if things will proceed on the other 199 processors. When I returned to the "master" node after the evening, Maker keeps repeating the same error code over and over (same scaffold): [67] examining contents of the fasta file and run log [67] ERROR: could not make datastore directory [67] --> rank=67, hostname=node067 [67] ERROR: Failed while examining contents of the fasta file and run log [67] ERROR: Chunk failed at level:0, tier_type:0 [67] FAILED CONTIG:Scaffold1589 I stop the job, and restart, and after only a few minutes of running, the same error is reported, this time on a new scaffold. Strangely here, the error is reported in the MPI tag of node001, but the error originates at node137: ERROR: Could not colapse BLAST reports [1] at /root/maker/bin/../lib/GI.pm line 2524. [1] GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 [1] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 [1] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 [1] Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 [1] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 [1]
Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 [1] --> rank=137, hostname=node137 [1] ERROR: Failed while collecting blastx reports [1] ERROR: Chunk failed at level:9, tier_type:3 [1] FAILED CONTIG:Scaffold249 [1] [1] ERROR: Chunk failed at level:4, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] 
ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and 
run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 I?d appreciate any guidance as how best to diagnose this error! Many thanks, Jason Gallant ? Dr. Jason R. 
Gallant, Assistant Professor Room 38 Natural Sciences Department of Zoology Michigan State University East Lansing, MI 48824 jgallant at msu.edu office: 517-884-7756 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Wed Jan 21 17:42:35 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Thu, 22 Jan 2015 11:42:35 +1100 Subject: [maker-devel] repeat masking and repeat libraries In-Reply-To: References: Message-ID: Thanks Mike, I've blasted (blastx against nr) and many, if not most, of the repeatmodeler library sequences match with transposases, pol proteins, gag proteins, retrotransposons,... all of them present in other fungi of the same order. Should I leave it to be masked? I still do run prediction on the unmasked genome too? Also, in many cases, they match a couple of thousand bp on the extreme of a 9 kbp sequence and in none of them InterProScan is capable of finding anything except potential TM domains or so, provided by SignalP. What do you think? Should I leave it as it is? Thank you again for your time 2015-01-17 4:08 GMT+11:00 Michael Campbell : > Hi Xabier, > > I haven't seen orders or families documented for repeatmasker with > repbase. Fungi seems safe to me. > > If you want to give yourself a little more peace of mind about the > repeatmodeler library you can blast it to a database of known fungal proteins > and remove the entries in the library that have strong hits to a known > protein to avoid over-masking. > > Mike > > On Fri, Jan 16, 2015 at 10:04 AM, Carson Holt wrote: > >> Using both RepBase and a RepeatModeler produced library should be >> sufficient, especially for fungi. >> >> -Carson >> >> >> On Jan 16, 2015, at 3:11 AM, Xabier Vázquez Campos >> wrote: >> >> Hi there, >> >> First, a general question. Probably kind of silly but I prefer to be >> sure...
When you browse RepBase, for example in fungi, all the repeats are >> marked as Eukaryota (Ancestral) or under the name of the species but no >> other taxa ranks are indicated. Does RepeatMasker recognise orders, >> families etc? or in my case should I stick with model_org=fungi? >> >> I've been trying to create repeat libraries specific for my genomes and >> I didn't have any luck with the programs described in the Basic >> >> and advanced >> >> tutorials (neither in my computer or in the cluster), reporting errors at >> all times, with exception of RepeatModeler, which ran with no problems. Is >> the output from RepeatModeler enough to improve the masking? It is not the >> best option I guess, but better than just the RepBase libraries by >> themselves, isn't it? >> >> Thank you for your time, >> >> Xabier >> >> -- >> Xabier Vázquez Campos >> *PhD Candidate* >> Water Research Centre >> School of Civil and Environmental Engineering >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Michael Campbell MS, RD. > Doctoral Candidate > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:585-3543 > > -- Xabier Vázquez Campos *PhD Candidate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed...
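Mike's suggestion earlier in this thread was to BLAST the RepeatModeler library against known proteins and drop entries with strong hits. That step can be scripted. The minimal Python sketch below assumes BLAST+ tabular output produced with -outfmt "6 qseqid sseqid evalue bitscore stitle". It treats hits whose subject title looks like a transposable-element protein as expected and does not remove those entries. This matches the situation described above, where most hits are transposases and gag or pol proteins. The keyword list, the 1e-10 e-value cutoff and the function names are assumptions to adjust for your own data.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# Lower-case substrings treated as transposable-element protein names.
# This list is an assumption for illustration; adjust it for your data.
TE_KEYWORDS = ("transpos", "reverse transcriptase", "integrase", "gag", "pol protein")


def entries_to_drop(blast_rows: Iterable[str], max_evalue: float = 1e-10) -> Set[str]:
    """Return library IDs with a strong hit to a protein that does not look like a TE.

    Expects rows from BLAST+ run with -outfmt "6 qseqid sseqid evalue bitscore stitle".
    """
    drop: Set[str] = set()
    for row in blast_rows:
        if not row.strip() or row.startswith("#"):
            continue
        qseqid, _sseqid, evalue, _bitscore, stitle = row.rstrip("\n").split("\t", 4)
        if float(evalue) > max_evalue:
            continue  # weak hit: ignored
        if not any(keyword in stitle.lower() for keyword in TE_KEYWORDS):
            drop.add(qseqid)  # strong hit to a non-TE protein: risk of over-masking
    return drop


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header line, sequence lines) pairs from FASTA text."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def filter_library(fasta_lines: Iterable[str], drop: Set[str]) -> List[str]:
    """Return FASTA lines for entries whose first header word is not in drop."""
    kept: List[str] = []
    for header, seq in read_fasta(fasta_lines):
        if header[1:].split()[0] not in drop:
            kept.append(header)
            kept.extend(seq)
    return kept
```

You could write the surviving lines to a new FASTA file and give that file to MAKER through rmlib=. Matching the BLAST qseqid against the first word of each FASTA header relies on BLAST+ reporting that first word as the query ID.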
URL: From michael.s.campbell1 at gmail.com Thu Jan 22 09:42:56 2015 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Thu, 22 Jan 2015 09:42:56 -0700 Subject: [maker-devel] repeat masking and repeat libraries In-Reply-To: References: Message-ID: Hi Xabier, From what you described I would leave it as is. Mike On Wed, Jan 21, 2015 at 5:42 PM, Xabier Vázquez Campos wrote: > Thanks Mike, > > I've blasted (blastx against nr) and many, if not most, of the > repeatmodeler library sequences match with transposases, pol proteins, gag > proteins, retrotransposons,... all of them present in other fungi of the > same order. Should I leave it to be masked? I still do run prediction on > the unmasked genome too? > Also, in many cases, they match a couple of thousand bp on the extreme of a > 9 kbp sequence and in none of them InterProScan is capable of finding > anything except potential TM domains or so, provided by SignalP. > > What do you think? Should I leave it as it is? > > Thank you again for your time > > 2015-01-17 4:08 GMT+11:00 Michael Campbell > : > >> Hi Xabier, >> >> I haven't seen orders or families documented for repeatmasker with >> repbase. Fungi seems safe to me. >> >> If you want to give yourself a little more peace of mind about the >> repeatmodeler library you can blast it to a database of known fungal proteins >> and remove the entries in the library that have strong hits to a known >> protein to avoid over-masking. >> >> Mike >> >> On Fri, Jan 16, 2015 at 10:04 AM, Carson Holt wrote: >> >>> Using both RepBase and a RepeatModeler produced library should be >>> sufficient, especially for fungi. >>> >>> -Carson >>> >>> >>> On Jan 16, 2015, at 3:11 AM, Xabier Vázquez Campos >>> wrote: >>> >>> Hi there, >>> >>> First, a general question. Probably kind of silly but I prefer to be >>> sure...
When you browse RepBase, for example in fungi, all the repeats are >>> marked as Eukaryota (Ancestral) or under the name of the species but no >>> other taxa ranks are indicated. Does RepeatMasker recognise orders, >>> families etc? or in my case should I stick with model_org=fungi? >>> >>> I've been trying to create repeat libraries specific for my genomes >>> and I didn't have any luck with the programs described in the Basic >>> >>> and advanced >>> >>> tutorials (neither in my computer or in the cluster), reporting errors at >>> all times, with exception of RepeatModeler, which ran with no problems. Is >>> the output from RepeatModeler enough to improve the masking? It is not the >>> best option I guess, but better than just the RepBase libraries by >>> themselves, isn't it? >>> >>> Thank you for your time, >>> >>> Xabier >>> >>> -- >>> Xabier Vázquez Campos >>> *PhD Candidate* >>> Water Research Centre >>> School of Civil and Environmental Engineering >>> The University of New South Wales >>> Sydney NSW 2052 AUSTRALIA >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> >> >> -- >> Michael Campbell MS, RD. >> Doctoral Candidate >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:585-3543 >> >> > > > -- > Xabier Vázquez Campos > *PhD Candidate* > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > -- Michael Campbell MS, RD.
Doctoral Candidate Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Jan 23 12:17:36 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Jan 2015 12:17:36 -0700 Subject: [maker-devel] running maker on TACC Stampede. In-Reply-To: References: Message-ID: Stampede only has MVAPICH2. It does not have OpenMPI (even though it has been requested several times). OpenFabrics libraries (used by MVAPICH2) have a known issue that restricts programs from making system calls while running under MPI. A system call is when one program launches another (i.e. MAKER launching BLAST). For this reason MAKER does not work with MVAPICH2. It only works with OpenMPI. You can still get it to work on MVAPICH2, but only on a single node. If you request more than one node then it will fail. The solution would be for TACC to install OpenMPI as an option on Stampede (like they have on Lonestar), but until that happens you can only run MAKER on a single node. Thanks, Carson > On Jan 22, 2015, at 10:51 PM, Won C Yim wrote: > > Dear anyone whom may it concern, > > Hi! > > My name is Won Cheol Yim in University of Nevada, Reno. > > I try to run MAKER on TACC Stampede. > > It looks everything installed properly. > > ============================================================================== > STATUS MAKER v2.31.8 > ============================================================================== > PERL Dependencies: > VERIFIED > External Programs: > VERIFIED > External C Libraries: > VERIFIED > MPI SUPPORT: > ENABLED > MWAS Web Interface: > DISABLED > MAKER PACKAGE: > CONFIGURATION OK > > And I installed Perl 5.18.4 with threads option. > > But I try to run it with MPI, it generated error. > > I assumed this problem came from ibrun in Stampede. > > Is there anyway to run it on Stampede? > > Here is my log. 
> > TACC: Starting up job > TACC: Setting up parallel environment for MVAPICH ssh-based mpirun. > cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory > sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory > TACC: Setup complete. Running job script. > TACC: starting parallel tasks... > [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2) > TACC: MPI job exited with code: 1 > TACC: Shutting down parallel environment. > TACC: Shutdown complete. Exiting. > > > Regards, > > Won > -- > Yim, Won Cheol > Sent with Airmail -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Jan 23 13:00:56 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Jan 2015 13:00:56 -0700 Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster In-Reply-To: <1421848561970.c8b481bf@Nodemailer> References: <1421848561970.c8b481bf@Nodemailer> Message-ID: MAKER needs a global storage location. You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that. Also use OpenMPI instead of MPICH2. It's more stable. I look forward to seeing how your experiment with AWS, MPI, and MAKER works out. -Carson > On Jan 21, 2015, at 6:56 AM, Jason Gallant wrote: > > Hi Everyone, > > I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster. I've started a 200-node cluster, and enabled MPICH2 (StarCluster by default uses OpenMPI). I plan on documenting this setup once I've figured out how to run things reliably.
> > I'm having a persistent issue where something fails on one of the nodes, and std error is flooded with: > > examining contents of the fasta file and run log > [67] ERROR: could not make datastore directory > [67] --> rank=67, hostname=node067 > [67] ERROR: Failed while examining contents of the fasta file and run log > [67] ERROR: Chunk failed at level:0, tier_type:0 > [67] FAILED CONTIG:Scaffold261 > > This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node: > > [67] #-------------------------------# > [67] deleted:-60 hits > [67] collecting blastx reports > [67] ERROR: Could not colapse BLAST reports > [67] at /root/maker/bin/../lib/GI.pm line 2524 thread 1. > [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 > [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 > [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 > [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 > [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 > [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 > [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...)
called at /root/maker/bin/maker line 917 thread 1 > [67] --> rank=67, hostname=node067 > [67] ERROR: Failed while collecting blastx reports > [67] ERROR: Chunk failed at level:9, tier_type:3 > [67] FAILED CONTIG:Scaffold66 > [67] > [67] ERROR: Chunk failed at level:4, tier_type:0 > [67] FAILED CONTIG:Scaffold66 > > > I've attempted to ignore the error to see if things will proceed on the other 199 processors. When I returned to the 'master' node after the evening, Maker was repeating the same error code over and over (same scaffold): > [67] examining contents of the fasta file and run log > [67] ERROR: could not make datastore directory > [67] --> rank=67, hostname=node067 > [67] ERROR: Failed while examining contents of the fasta file and run log > [67] ERROR: Chunk failed at level:0, tier_type:0 > [67] FAILED CONTIG:Scaffold1589 > > I stopped the job and restarted it, and after only a few minutes of running the same error was reported, this time on a new scaffold. Strangely, the error is reported under the MPI tag of node001, but it originates at node137: > > ERROR: Could not colapse BLAST reports > [1] at /root/maker/bin/../lib/GI.pm line 2524.
> [1] GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 > [1] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 > [1] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 > [1] Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 > [1] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 > [1] Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 > [1] --> rank=137, hostname=node137 > [1] ERROR: Failed while collecting blastx reports > [1] ERROR: Chunk failed at level:9, tier_type:3 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] ERROR: Chunk failed at level:4, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > 
[1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > 
[1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the 
fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > [1] > [1] examining contents of the fasta file and run log > [1] ERROR: could not make datastore directory > [1] --> rank=1, hostname=node001 > [1] ERROR: Failed while examining contents of the fasta file and run log > [1] ERROR: Chunk failed at level:0, tier_type:0 > [1] FAILED CONTIG:Scaffold249 > > I'd appreciate any guidance on how best to diagnose this error! > > Many thanks, > Jason Gallant > > > > > -- > Dr. Jason R. Gallant > Assistant Professor > Room 38 Natural Sciences > Department of Zoology > Michigan State University > East Lansing, MI 48824 > jgallant at msu.edu > office: 517-884-7756 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From jcornel3 at asu.edu Fri Jan 23 14:28:13 2015 From: jcornel3 at asu.edu (John Cornelius) Date: Fri, 23 Jan 2015 13:28:13 -0800 Subject: [maker-devel] Maker-P vs. Maker Message-ID: Hi, I'm working on annotating a tetraploid animal with a genome size of 3.1 gigabases. I was wondering if maker-P would be appropriate for this organism or if I should just stick with maker? Thanks. -- John Cornelius MCB PhD Candidate Arizona State University From carsonhh at gmail.com Fri Jan 23 14:59:01 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Jan 2015 14:59:01 -0700 Subject: [maker-devel] Maker-P vs.
Maker In-Reply-To: References: Message-ID: <7813BFBE-7237-4298-8AD3-B210CB96DDD2@gmail.com> Actually, the code bases have been merged. So if you use the most recent version of MAKER, the plant extensions for RNA annotation and the extra analysis scripts from MAKER-P will be there. If you don't need them, then just don't turn the options on in the control files. --Carson > On Jan 23, 2015, at 2:28 PM, John Cornelius wrote: > > Hi, I'm working on annotating a tetraploid animal with a genome size of 3.1 gigabases. I was wondering if maker-P would be appropriate for this organism or if I should just stick with maker? Thanks. > > -- > John Cornelius > MCB PhD Candidate > Arizona State University From carsonhh at gmail.com Mon Jan 26 12:17:45 2015 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 26 Jan 2015 12:17:45 -0700 Subject: [maker-devel] running maker on TACC Stampede. In-Reply-To: References: Message-ID: Do you mean sequence upstream of the gene? If that is the case, you would probably have to write a script to do this. BioPerl is one option; it has several Perl modules that help with manipulating FASTA sequences and many common biology tool file formats -> http://www.bioperl.org --Carson > On Jan 26, 2015, at 12:10 PM, Won C Yim wrote: > > Dear Carson Holt, > > Thank you for your reply. > > I raised this issue with STAMPEDE, and there's no way for them to help me. > > I think we need to move to another server for MAKER. > > Thank you for your help. > > And I have one more question. > > Is there any way to extract upstream sequences from MAKER results? > > I tried to extract upstream and downstream sequences from them, but it's really hard to do. > > Regards, > > Won > > -- > Yim, Won Cheol > MS330/Department of Biochemistry & Molecular Biology > 1664 N.
Virginia Street > University of Nevada, Reno > > email: wyim at unr.edu > > > On January 23, 2015 at 11:17:41 AM, Carson Holt (carsonhh at gmail.com ) wrote: > >> Stampede only has MVAPICH2. It does not have OpenMPI (even though it has been requested several times). OpenFabrics libraries (used by MVAPICH2) have a known issue that restricts programs from making system calls while running under MPI. A system call is when one program launches another (e.g. MAKER launching BLAST). For this reason MAKER does not work with MVAPICH2. It only works with OpenMPI. >> >> You can still get it to work on MVAPICH2, but only on a single node. If you request more than one node, then it will fail. The solution would be for TACC to install OpenMPI as an option on Stampede (like they have on Lonestar), but until that happens you can only run MAKER on a single node. >> >> Thanks, >> Carson >> >> >>> On Jan 22, 2015, at 10:51 PM, Won C Yim > wrote: >>> >>> To whom it may concern, >>> >>> Hi! >>> >>> My name is Won Cheol Yim, at the University of Nevada, Reno. >>> >>> I am trying to run MAKER on TACC Stampede. >>> >>> It looks like everything installed properly. >>> >>> ============================================================================== >>> STATUS MAKER v2.31.8 >>> ============================================================================== >>> PERL Dependencies:VERIFIED >>> External Programs:VERIFIED >>> External C Libraries:VERIFIED >>> MPI SUPPORT:ENABLED >>> MWAS Web Interface:DISABLED >>> MAKER PACKAGE:CONFIGURATION OK >>> >>> And I installed Perl 5.18.4 with the threads option. >>> >>> But when I try to run it with MPI, it generates an error. >>> >>> I assume this problem comes from ibrun on Stampede. >>> >>> Is there any way to run it on Stampede? >>> >>> Here is my log. >>> >>> TACC: Starting up job >>> TACC: Setting up parallel environment for MVAPICH ssh-based mpirun.
>>> cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory >>> sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory >>> TACC: Setup complete. Running job script. >>> TACC: starting parallel tasks... >>> [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2) >>> TACC: MPI job exited with code: 1 >>> TACC: Shutting down parallel environment. >>> TACC: Shutdown complete. Exiting. >>> >>> >>> Regards, >>> >>> Won >>> -- >>> Yim, Won Cheol >>> Sent with Airmail From marc.hoeppner at imbim.uu.se Wed Jan 28 00:01:48 2015 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Wed, 28 Jan 2015 07:01:48 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> Message-ID: Hi, this is probably a long shot, but I was hoping that someone on the list may have some advice as to how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue? Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and blast matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530.
Backtrace: /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] /lib64/libpthread.so.0[0x358ae0f710] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] /lib64/libpthread.so.0[0x358ae0f710] /lib64/libc.so.6(__poll+0x53)[0x358aadf343] /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] /lib64/libpthread.so.0[0x358ae079d1] /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] SIGTERM received A few words about the setup: We have 10 nodes, 160 cores and the shared file system is exported via Infiniband from a "standard" NFS server. As the OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low. So far I tried: - running the MPI processes through both the ethernet network as well as over IPoIB, same problem. - installing a more recent version of perl through perlbrew, with all the required modules, and re-compiling Maker - running some (albeit simple) network checks for retransmissions, lost packets, etc. (nothing popped up) - running Maker on a subset of nodes to eliminate the possibility of a bad node The error message is a bit cryptic to me and it would be very helpful to know if Maker has a problem with accessing a file, or whether OpenMPI has a communication problem, etc., but I am not able to tell from the information I have been able to extract so far. Any ideas? Cheers, Marc Marc P.
Hoeppner, PhD Team Leader BILS Genome Annotation Platform Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at imbim.uu.se From dence at genetics.utah.edu Wed Jan 28 09:22:09 2015 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 Jan 2015 16:22:09 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> Message-ID: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> Hi Marc, here are a few things on the maker side to check out. Did you set min_contig to 1000 to set the lower limit on contig size? Did maker do anything with the 1 kb contigs, or did it just skip them? You can check that in the master_datastore_index.log or in the void directories for the small contigs. That will tell us whether maker is functioning correctly, even though it's giving those messages. With newer versions of maker, I get messages identical to what you sent as part of normal thread termination, even when maker is functioning normally. Thanks, Daniel > On Jan 28, 2015, at 12:01 AM, Marc Höppner wrote: > > Hi, > > this is probably a long shot, but I was hoping that someone on the list may have some advice as to how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue? > > Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and blast matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): > > perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530.
Backtrace: > /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] > /lib64/libpthread.so.0[0x358ae0f710] > /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] > /lib64/libpthread.so.0[0x358ae0f710] > /lib64/libc.so.6(__poll+0x53)[0x358aadf343] > /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] > /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] > /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] > /lib64/libpthread.so.0[0x358ae079d1] > /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] > SIGTERM received > > A few words about the setup: > > We have 10 nodes, 160 cores and the shared file system is exported via Infiniband from a "standard" NFS server. As the OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low. > > So far I tried: > > - running the MPI processes through both the ethernet network as well as over IPoIB, same problem. > - installing a more recent version of perl through perlbrew, with all the required modules, and re-compiling Maker > - running some (albeit simple) network checks for retransmissions, lost packets, etc. (nothing popped up) > - running Maker on a subset of nodes to eliminate the possibility of a bad node > > The error message is a bit cryptic to me and it would be very helpful to know if Maker has a problem with accessing a file, or whether OpenMPI has a communication problem, etc., but I am not able to tell from the information I have been able to extract so far. Any ideas? > > Cheers, > > Marc > > > Marc P.
Hoeppner, PhD > Team Leader > BILS Genome Annotation Platform > Department for Medical Biochemistry and Microbiology > Uppsala University, Sweden > marc.hoeppner at imbim.uu.se From marc.hoeppner at imbim.uu.se Thu Jan 29 00:34:17 2015 From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner) Date: Thu, 29 Jan 2015 08:34:17 +0100 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> Message-ID: <54C9E279.8040907@imbim.uu.se> Hi, thanks for the feedback. If I resume maker enough times, it will eventually run through and complete all contigs. The question is whether there is any way to debug why it drops at random times, most commonly when running on small contigs (which is probably more due to the increasing frequency of starting/finishing jobs than to their size). I guess Maker has no debug mode or any other way to find out why it dies? Any idea what could make Maker drop like that? I was thinking NFS, but nfsstat looks fine, there is nothing in the log, and NFS generally functions well, so I can't identify a good place to look for the problem. Regards, Marc On 2015-01-28 17:22, Daniel Ence wrote: > Hi Marc, here are a few things on the maker side to check out. > > Did you set min_contig to 1000 to set the lower limit on contig size? > Did maker do anything with the 1 kb contigs, or did it just skip them? > You can check that in the master_datastore_index.log or in the void directories for the small contigs. > That will tell us whether maker is functioning correctly, even though it's giving those messages.
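Daniel's suggestion of checking master_datastore_index.log can be automated. A minimal sketch, assuming each line of the index is tab-separated with the contig ID in the first field and a status keyword in the last field (for example STARTED or FINISHED). The exact set of keywords may vary between MAKER versions, so the code treats only FINISHED specially and simply counts whatever else it finds. Because later lines override earlier ones, a contig that failed and was later retried successfully ends up as FINISHED. The function names are hypothetical.

```python
from collections import Counter


def latest_status(lines):
    """Map each contig ID to the last status recorded for it."""
    status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        status[fields[0]] = fields[-1]  # later lines override earlier ones
    return status


def summarize_lines(lines):
    """Return (count per final status, sorted contigs not marked FINISHED)."""
    status = latest_status(lines)
    counts = Counter(status.values())
    not_finished = sorted(c for c, s in status.items() if s != "FINISHED")
    return counts, not_finished


def summarize_file(path):
    """Summarize an index log on disk (the path is a placeholder)."""
    with open(path) as fh:
        return summarize_lines(fh)


if __name__ == "__main__":
    sample = [
        "ctg1\tds/ctg1/\tSTARTED\n",
        "ctg1\tds/ctg1/\tFINISHED\n",
        "ctg2\tds/ctg2/\tSTARTED\n",
    ]
    counts, not_finished = summarize_lines(sample)
    print(dict(counts), not_finished)  # {'FINISHED': 1, 'STARTED': 1} ['ctg2']
```

On a real run, the contigs listed as not finished are the ones to inspect, or to confirm were deliberately skipped because of the min_contig limit.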
> > With newer versions of maker, I get messages identical to what you sent as part of normal thread termination, even when maker is functioning normally. > > Thanks, > Daniel > > > >> On Jan 28, 2015, at 12:01 AM, Marc Höppner wrote: >> >> Hi, >> >> this is probably a long shot, but I was hoping that someone on the list may have some advice as to how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue? >> >> Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and blast matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): >> >> perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530. Backtrace: >> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] >> /lib64/libpthread.so.0[0x358ae0f710] >> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] >> /lib64/libpthread.so.0[0x358ae0f710] >> /lib64/libc.so.6(__poll+0x53)[0x358aadf343] >> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] >> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] >> /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] >> /lib64/libpthread.so.0[0x358ae079d1] >> /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] >> SIGTERM received >> >> A few words about the setup: >> >> We have 10 nodes, 160 cores and the shared file system is exported via Infiniband from a "standard" NFS server. As the OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low.
>> >> So far I tried: >> >> - running the MPI processes through both the ethernet network as well as over IPoIB, same problem. >> - installing a more recent version of perl through perlbrew, with all the required modules, and re-compiling Maker >> - running some (albeit simple) network checks for retransmissions, lost packets, etc. (nothing popped up) >> - running Maker on a subset of nodes to eliminate the possibility of a bad node >> >> The error message is a bit cryptic to me and it would be very helpful to know if Maker has a problem with accessing a file, or whether OpenMPI has a communication problem, etc., but I am not able to tell from the information I have been able to extract so far. Any ideas? >> >> Cheers, >> >> Marc >> >> >> Marc P. Hoeppner, PhD >> Team Leader >> BILS Genome Annotation Platform >> Department for Medical Biochemistry and Microbiology >> Uppsala University, Sweden >> marc.hoeppner at imbim.uu.se From mikael.durling at slu.se Thu Jan 29 02:37:23 2015 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Thu, 29 Jan 2015 09:37:23 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <54C9E279.8040907@imbim.uu.se> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se> Message-ID: Hi, are you running the NFS servers in synchronous or asynchronous mode? I have seen cases where maker fails with the NFS server in async mode, but the failures are random and I can't really reproduce them. In the end, I have continued running maker on NFS in async mode, since the speed gains are significant, at the cost of occasional reruns.
(And yes, nfsstat shows no signs of errors.) Mikael > On 29 Jan 2015, at 08:34, Marc P. Hoeppner wrote: > > Hi, > > thanks for the feedback. If I resume maker enough times, it will eventually run through and complete all contigs. The question is whether there is any way to debug why it drops at random times, most commonly when running on small contigs (which is probably more due to the increasing frequency of starting/finishing jobs than to their size). I guess Maker has no debug mode or any other way to find out why it dies? Any idea what could make Maker drop like that? I was thinking NFS, but nfsstat looks fine, there is nothing in the log, and NFS generally functions well, so I can't identify a good place to look for the problem. > > Regards, > > Marc > > On 2015-01-28 17:22, Daniel Ence wrote: >> Hi Marc, here are a few things on the maker side to check out. >> >> Did you set min_contig to 1000 to set the lower limit on contig size? >> Did maker do anything with the 1 kb contigs, or did it just skip them? >> You can check that in the master_datastore_index.log or in the void directories for the small contigs. >> That will tell us whether maker is functioning correctly, even though it's giving those messages. >> >> With newer versions of maker, I get messages identical to what you sent as part of normal thread termination, even when maker is functioning normally. >> >> Thanks, >> Daniel >> >> >> >>> On Jan 28, 2015, at 12:01 AM, Marc Höppner wrote: >>> >>> Hi, >>> >>> this is probably a long shot, but I was hoping that someone on the list may have some advice as to how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue?
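Mikael's suspicion that the failures come from running the NFS server in async mode can be probed by measuring how long a file created on one node takes to become visible on another. A minimal sketch, assuming you start it on node B and then create a probe file on the shared mount from node A: it polls for the path and returns the elapsed seconds, or `None` on timeout. The helper name, script name, and polling interval are illustrative. The result is an upper bound that also includes the time between starting the probe and creating the file, and NFS clients cache attributes and directory lookups, so it reflects what a process on node B observes rather than a server-side measurement.

```python
import os
import sys
import time


def seconds_until_visible(path, timeout=30.0, interval=0.1):
    """Poll until `path` exists; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage on node B: python probe.py /shared/mount/probe.txt
    # then create /shared/mount/probe.txt from node A.
    if len(sys.argv) > 1:
        elapsed = seconds_until_visible(sys.argv[1])
        print("timed out" if elapsed is None else f"visible after {elapsed:.2f} s")
```

Repeating the probe with the export in sync and async mode would give a rough comparison of the two settings.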
>>> >>> Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and blast matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): >>> >>> perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530. Backtrace: >>> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] >>> /lib64/libpthread.so.0[0x358ae0f710] >>> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] >>> /lib64/libpthread.so.0[0x358ae0f710] >>> /lib64/libc.so.6(__poll+0x53)[0x358aadf343] >>> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] >>> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] >>> /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] >>> /lib64/libpthread.so.0[0x358ae079d1] >>> /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] >>> SIGTERM received >>> >>> A few words about the setup: >>> >>> We have 10 nodes, 160 cores and the shared file system is exported via Infiniband from a "standard" NFS server. As the OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low. >>> >>> So far I tried: >>> >>> - running the MPI processes through both the ethernet network as well as over IPoIB, same problem.
>>> - installing a more recent version of perl through perlbrew, with all the required modules, and re-compiling Maker >>> - running some (albeit simple) network checks for retransmissions, lost packets, etc. (nothing popped up) >>> - running Maker on a subset of nodes to eliminate the possibility of a bad node >>> >>> The error message is a bit cryptic to me and it would be very helpful to know if Maker has a problem with accessing a file, or whether OpenMPI has a communication problem, etc., but I am not able to tell from the information I have been able to extract so far. Any ideas? >>> >>> Cheers, >>> >>> Marc >>> >>> >>> Marc P. Hoeppner, PhD >>> Team Leader >>> BILS Genome Annotation Platform >>> Department for Medical Biochemistry and Microbiology >>> Uppsala University, Sweden >>> marc.hoeppner at imbim.uu.se From carsonhh at gmail.com Thu Jan 29 08:22:57 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 Jan 2015 08:22:57 -0700 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se> Message-ID: In my experience, NFS is the most likely cause. A lot of very small contigs means that MAKER produces a lot of very small files very quickly, which creates far more stress for NFS than high IO read/write bandwidth does.
There can then be several seconds of lag time between a file being created and the file being available for reading, because the asynchronous setting allows the system to report IO operations as complete even though they have only been buffered on the NFS server and not yet completed. So when the process tries to read the file it supposedly just created, the file doesn't exist. MAKER tries to offload most of the small-file creation operations that can result in this condition to a temporary directory (indicated by TMP= in the maker_opts.ctl file), so it is critical that this location be set to a local drive and not an NFS location. But running a lot of very small contigs still results in more frequent file creation on the NFS mount. The only ways around this type of NFS issue are to run on fewer nodes to reduce file creation frequency, to turn off asynchronous mode for NFS (which results in serious IO performance degradation), or to just let MAKER retry until it works (brute force), which is the default and in my experience the most effective approach. NFS issues were in fact the reason we put retry and restart capabilities into MAKER in the first place. --Carson > On Jan 29, 2015, at 2:37 AM, Mikael Brandström Durling wrote: > > Hi, > > are you running the NFS servers in synchronous or asynchronous mode? I have seen cases where maker fails with the NFS server in async mode, but the failures are random and I can't really reproduce them. In the end, I have continued running maker on NFS in async mode, since the speed gains are significant, at the cost of occasional reruns. (And yes, nfsstat shows no signs of errors.) > > Mikael > > >> On 29 Jan 2015, at 08:34, Marc P. Hoeppner wrote: >> >> Hi, >> >> thanks for the feedback. If I resume maker enough times, it will eventually run through and complete all contigs.
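Carson's advice that the TMP= location in maker_opts.ctl must be on a local drive, not NFS, can be verified on each node. A minimal sketch, assuming Linux, where /proc/mounts lists one mount per line as `device mountpoint fstype options ...`: it picks the mount point with the longest prefix match for a directory and returns that mount's filesystem type. The mounts text is passed in as a string so the logic can be tried without a cluster. The function names and the sample mount table are hypothetical, and escaped characters in mount points (such as `\040` for a space) are not decoded.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount containing `path`, or None."""
    path = os.path.normpath(os.path.abspath(path))
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount, fstype = fields[1], fields[2]
        # `path` is covered if it equals the mount point or lies beneath it.
        prefix = mount.rstrip("/") + "/"
        if path == mount or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since it was mounted on top of the earlier one.
            if len(mount) >= len(best_mount):
                best_mount, best_type = mount, fstype
    return best_type


def is_nfs(fstype):
    """True for the kernel NFS client filesystem types."""
    return fstype in ("nfs", "nfs4")


SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
nfsserver:/export /mnt/data nfs4 rw,relatime 0 0
/dev/sdb1 /scratch xfs rw,relatime 0 0
"""

if __name__ == "__main__":
    for d in ("/mnt/data/run1", "/scratch/maker_tmp", "/mnt/database"):
        fstype = fs_type_for(d, SAMPLE_MOUNTS)
        print(d, fstype, "NFS: avoid for TMP=" if is_nfs(fstype) else "local")
```

On a node, you would pass the contents of /proc/mounts together with the directory you set for TMP=, and expect a local type such as ext4 or xfs rather than nfs or nfs4.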
The question is whether there is any way to debug why it drops at random times , most commonly when running on small contigs (which is probably more due to the increasing frequency of starting/finishing jobs rather than their size). I guess Maker has no debug mode or any other way to find out why it dies? Any idea what could make Maker drop like that? I was thinking NFS, but the nfsstat looks fine, nothing in the log and NFS function is generally good - so I can't identify a good point to look for the problem. >> >> Regards, >> >> Marc >> >> On 2015-01-28 17:22, Daniel Ence wrote: >>> Hi Marc, so a few things on the maker side to check out. >>> >>> Did you have the min_contig set to 1000, to set the lower limit on contig size? >>> Did maker do anything with the 1kb contigs? Or did it just skip them? >>> You can check that in the master_datastore_index.log or in the void directories for the small contigs. >>> That will tell us whether maker is functioning correctly, even though it?s giving those messages. >>> >>> With the newer versions of makers, I get messages identical to what you sent as part of the normal thread termination, even when maker is functioning normally. >>> >>> Thanks, >>> Daniel >>> >>> >>> >>>> On Jan 28, 2015, at 12:01 AM, Marc H?ppner wrote: >>>> >>>> Hi, >>>> >>>> this is probably a long shot, but I was hoping that someone on the list may have some advice as to how to debug an error that has been popping up when running Maker on our 10 node cluster. So, what is the issue? >>>> >>>> Maker runs fine on several assemblies that w have processed in the past, but I recently started on a fairly fragment (low N50) mammalian assembly and the collaborator was keen to have all contigs annotated, down to 1kb (I guess it is more about the repeats and blast matches in those small bits). 
>>>> Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message is given before it):
>>>>
>>>> perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530. Backtrace:
>>>> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012]
>>>> /lib64/libpthread.so.0[0x358ae0f710]
>>>> /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0]
>>>> /lib64/libpthread.so.0[0x358ae0f710]
>>>> /lib64/libc.so.6(__poll+0x53)[0x358aadf343]
>>>> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a]
>>>> /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961]
>>>> /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e]
>>>> /lib64/libpthread.so.0[0x358ae079d1]
>>>> /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d]
>>>> SIGTERM received
>>>>
>>>> A few words about the setup:
>>>>
>>>> We have 10 nodes and 160 cores, and the shared file system is exported via InfiniBand from a "standard" NFS server. As OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low.
>>>>
>>>> So far I have tried:
>>>>
>>>> - running the MPI processes through both the Ethernet network and over IPoIB: same problem
>>>> - installing a more recent version of Perl through perlbrew, with all the required modules, and recompiling Maker
>>>> - running some (albeit simple) network checks for retransmissions, lost packets, etc.; nothing popped up
>>>> - running Maker on a subset of nodes to eliminate the possibility of a bad node
>>>>
>>>> The error message is a bit cryptic to me, and it would be very helpful to know whether Maker has a problem accessing a file or whether OpenMPI has a communication problem, but I cannot tell from the information I have been able to extract so far. Any ideas?
>>>>
>>>> Cheers,
>>>>
>>>> Marc

From myandell at genetics.utah.edu Thu Jan 29 09:54:50 2015
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Thu, 29 Jan 2015 16:54:50 +0000
Subject: [maker-devel] Maker crash on increasingly small contigs
In-Reply-To: <54C9E279.8040907@imbim.uu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu>, <54C9E279.8040907@imbim.uu.se>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA053E371D456@mxb2.hg.genetics.utah.edu>

Hi Marc, are you sure this isn't your system? E.g. bad NFS mounts, full scratch, etc.?

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From ashrafi at ucdavis.edu Thu Jan 29 11:07:41 2015
From: ashrafi at ucdavis.edu (Hamid Ashrafi)
Date: Thu, 29 Jan 2015 13:07:41 -0500
Subject: [maker-devel] GFF and Dereferencing problem
Message-ID: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu>

Hi,

After maker finishes its job it generates many files; one of them is a GFF file. I see the following in some of my GFF files. It seems to be a dereferencing problem. I am just wondering whether it affects my annotation.

Hamid

uti_cns_0004767 est2genome match_part 856428 856485 3090 + . ID=uti_cns_0004767:hsp:8340:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome match_part 856587 856938 3090 + . ID=uti_cns_0004767:hsp:8341:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome match_part 857053 857201 3090 + . ID=uti_cns_0004767:hsp:8342:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome match_part 859004 859041 3090 + . ID=uti_cns_0004767:hsp:8343:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3
uti_cns_0004767 est2genome expressed_sequence_match 878327 878771 1446 + . ID=uti_cns_0004767:hit:4231:3.2.3.8;Name=Sp_Illum_Trans_W
uti_cns_0004767 est2genome match_part 878327 878771 1446 + . ID=uti_cns_0004767:hsp:8344:3.2.3.8;Parent=uti_cns_0004767:hit:4231:3.2.3
uti_cns_0004767 est2genome expressed_sequence_match 884121 886610 2509 + . ID=uti_cns_0004767:hit:4232:3.2.3.8;Name=Sp_Illum_Trans_W
uti_cns_0004767 est2genome match_part 884121 884195 2509 + . ID=uti_cns_0004767:hsp:8345:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3
uti_cns_0004767 est2genome match_part 886180 886610 2509 + . ID=uti_cns_0004767:hsp:8346:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3
ARRAY(0x1b91f110)
ARRAY(0x1a686350)
ARRAY(0x1b06bba0)
ARRAY(0x1b931e10)
ARRAY(0x1b13f3a0)
ARRAY(0x1b6af650)
ARRAY(0x1b929600)

From carsonhh at gmail.com Thu Jan 29 11:47:11 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 29 Jan 2015 11:47:11 -0700
Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster
In-Reply-To: <1422394249179.2a90ef9d@Nodemailer>
References: <73716718-1273-46F1-BC94-AAD276DFE0E1@gmail.com> <1422394249179.2a90ef9d@Nodemailer>
Message-ID:

I believe this may be caused by the latency of asynchronous operations on your network shared drive, which could have a lot of lag between operations when running in the cloud. I suggest testing in stages:
1. Use a single AWS instance, with the local drive as the working directory.
2. Use two instances, where one is the NFS server and you run MAKER on the other instance but on the network-mounted drive.
3. Gradually increase the number of instances hitting the network shared drive.
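Carson's diagnosis here matches his NFS explanation earlier in the archive. A file written on the shared drive may not be visible to the next reader for a while, and retrying is the practical workaround. The Python sketch below illustrates that retry idea in isolation. It is not MAKER's code. `wait_for_file` and its parameters are hypothetical, and the default delays are arbitrary assumptions that would need tuning for a real cluster.

```python
import time


def wait_for_file(path, attempts=5, initial_delay=0.5, backoff=2.0):
    """Return True once `path` can be opened for reading, False if it never appears.

    Hypothetical helper (not part of MAKER): polls with exponential backoff to
    tolerate the lag an asynchronous NFS export can introduce between one
    process writing a file and another process seeing it.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            with open(path, "rb"):
                return True  # the file is visible and readable
        except FileNotFoundError:
            # Not visible yet: sleep before the next attempt (but not after the last one).
            if attempt < attempts - 1:
                time.sleep(delay)
                delay *= backoff
    return False
```

With the default values the helper sleeps at most 0.5 + 1 + 2 + 4 = 7.5 seconds in total before giving up. The growing delay keeps many workers from hammering an already stressed NFS server with tight polling loops. That matters more as the number of instances sharing one export grows.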
-Carson

> On Jan 27, 2015, at 2:30 PM, Jason Gallant wrote:
>
> Carson,
>
> Thanks for the input and the test script. I was able to run Maker successfully using OpenMPI on Starcluster. However, I am still receiving error messages fairly commonly; this is the error I described earlier in this thread. It seems to appear regardless of whether I use OpenMPI or MPICH2.
>
> Essentially, there seems to be an error collapsing BLAST reports. This error causes maker to stop accepting new contigs on that machine (in this case node060), and maker then reports every subsequent contig as "failed". The other nodes seem to be working normally, but this error can happen on other nodes as well, so the issue can compound.
>
> [1,15]:deleted:-60 hits
> [1,15]:collecting blastx reports
> [1,15]:ERROR: Could not colapse BLAST reports
> [1,15]: at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
> [1,15]: GI::combine_blast_report(FastaChunk=HASH(0x1781acd8), ARRAY(0xc1e4fa8), ARRAY(0x15ab20d0), runlog=HASH(0xb87f878)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1
> [1,15]: Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1
> [1,15]: eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1
> [1,15]: Error::subs::try(CODE(0x198e22f8), HASH(0x9c9b65c0)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4224 thread 1
> [1,15]: Process::MpiChunk::_go(Process::MpiChunk=HASH(0x1b8a7cd0), "run", HASH(0x15e3e1a0), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1
> [1,15]: Process::MpiChunk::run(Process::MpiChunk=HASH(0x1b8a7cd0), 15) called at /root/maker/bin/maker line 1457 thread 1
> [1,15]: main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
> [1,15]: eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
> [1,15]: threads::new("threads", CODE(0x36c9a98), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1
> [1,15]:--> rank=15, hostname=node015
> [1,15]:ERROR: Failed while collecting blastx reports
> [1,15]:ERROR: Chunk failed at level:9, tier_type:3
> [1,15]:FAILED CONTIG:Scaffold66
> [1,15]:
> [1,15]:ERROR: Chunk failed at level:4, tier_type:0
> [1,15]:FAILED CONTIG:Scaffold66
> [1,15]:
> [1,15]:examining contents of the fasta file and run log
> [1,15]:ERROR: could not make datastore directory
> [1,15]:--> rank=15, hostname=node015
> [1,15]:ERROR: Failed while examining contents of the fasta file and run log
> [1,15]:ERROR: Chunk failed at level:0, tier_type:0
> [1,15]:FAILED CONTIG:Scaffold483
>
> --
> Dr. Jason R. Gallant
> Assistant Professor
> Room 38 Natural Sciences
> Department of Zoology
> Michigan State University
> East Lansing, MI 48824
> jgallant at msu.edu
> office: 517-884-7756
>
> On Fri, Jan 23, 2015 at 3:25 PM, Carson Holt wrote:
>
> The complaining is because more than one MAKER process is running and they are not connected via MPI. So the problem is OpenMPI. Try installing a small MPI script (like the one attached) and use it to test OpenMPI. Once OpenMPI is configured correctly, the separate processes will communicate with each other (pay attention to the comm size and rank messages).
>
> -Carson
>
>> On Jan 23, 2015, at 1:15 PM, Jason Gallant wrote:
>>
>> Hi Carson,
>>
>> Yes, I've tried that and still have the issue of maker complaining about multiple processes in the same directory. Other ideas?
>>
>> Best,
>> Jason
>>
>> On Fri, Jan 23, 2015 at 3:14 PM, Carson Holt wrote:
>>
>> If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile, e.g.:
>>
>>     export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so
>>
>> For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings. Also, if jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.
>>
>> Example:
>>
>>     mpiexec -mca btl ^openib -n 20 maker
>>
>> -Carson
>>
>>> On Jan 23, 2015, at 1:08 PM, Jason Gallant wrote:
>>>
>>> Hi Carson,
>>>
>>> Yes, STARCLUSTER enables a global storage space, which is served via NFS from an EBS drive that I've created.
>>>
>>> I'm using the local disk space on each instance for the /tmp directory, however.
>>>
>>> Reading the forums, it occurred to me that MPICH2 doesn't scale as well as OpenMPI. However, when I try to configure Maker for OpenMPI and run it, I get complaints from maker that multiple makers are running in the same directory.
>>>
>>> Thanks for your advice!
>>>
>>> Best,
>>> Jason
>>>
>>> On Fri, Jan 23, 2015 at 3:01 PM, Carson Holt wrote:
>>>
>>> MAKER needs a global storage location. You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that. Also use OpenMPI instead of MPICH2. It's more stable.
>>>
>>> I look forward to seeing how your experiment with AWS, MPI, and MAKER works out.
>>>
>>> -Carson
>>>
>>> > On Jan 21, 2015, at 6:56 AM, Jason Gallant wrote:
>>> >
>>> > Hi Everyone,
>>> >
>>> > I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster. I've started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably.
>>> >
>>> > I'm having a persistent issue where something fails on one of the nodes, and stderr is flooded with:
>>> >
>>> > examining contents of the fasta file and run log
>>> > [67] ERROR: could not make datastore directory
>>> > [67] --> rank=67, hostname=node067
>>> > [67] ERROR: Failed while examining contents of the fasta file and run log
>>> > [67] ERROR: Chunk failed at level:0, tier_type:0
>>> > [67] FAILED CONTIG:Scaffold261
>>> >
>>> > This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node:
>>> >
>>> > [67] #-------------------------------#
>>> > [67] deleted:-60 hits
>>> > [67] collecting blastx reports
>>> > [67] ERROR: Could not colapse BLAST reports
>>> > [67] at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
>>> > [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1
>>> > [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1
>>> > [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1
>>> > [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1
>>> > [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1
>>> > [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1
>>> > [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
>>> > [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
>>> > [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1
>>> > [67] --> rank=67, hostname=node067
>>> > [67] ERROR: Failed while collecting blastx reports
>>> > [67] ERROR: Chunk failed at level:9, tier_type:3
>>> > [67] FAILED CONTIG:Scaffold66
>>> > [67]
>>> > [67] ERROR: Chunk failed at level:4, tier_type:0
>>> > [67] FAILED CONTIG:Scaffold66
>>> >
>>> > I've tried ignoring the error to see whether things would proceed on the other 199 processors. When I returned to the "master" node after the evening, Maker kept repeating the same error over and over (same scaffold):
>>> >
>>> > [67] examining contents of the fasta file and run log
>>> > [67] ERROR: could not make datastore directory
>>> > [67] --> rank=67, hostname=node067
>>> > [67] ERROR: Failed while examining contents of the fasta file and run log
>>> > [67] ERROR: Chunk failed at level:0, tier_type:0
>>> > [67] FAILED CONTIG:Scaffold1589
>>> >
>>> > I stopped the job and restarted it. After only a few minutes of running, the same error was reported, this time on a new scaffold. Strangely, here the error is reported under the MPI tag of node001, but it originates at node137:
>>> >
>>> > ERROR: Could not colapse BLAST reports
>>> > [1] at /root/maker/bin/../lib/GI.pm line 2524.
>>> > [1] GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760
>>> > [1] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415
>>> > [1] eval {...} called at /root/maker/bin/../lib/Error.pm line 407
>>> > [1] Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215
>>> > [1] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341
>>> > [1] Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979
>>> > [1] --> rank=137, hostname=node137
>>> > [1] ERROR: Failed while collecting blastx reports
>>> > [1] ERROR: Chunk failed at level:9, tier_type:3
>>> > [1] FAILED CONTIG:Scaffold249
>>> > [1]
>>> > [1] ERROR: Chunk failed at level:4, tier_type:0
>>> > [1] FAILED CONTIG:Scaffold249
>>> > [1]
>>> > [1] examining contents of the fasta file and run log
>>> > [1] ERROR: could not make datastore directory
>>> > [1] --> rank=1, hostname=node001
>>> > [1] ERROR: Failed while examining contents of the fasta file and run log
>>> > [1] ERROR: Chunk failed at level:0, tier_type:0
>>> > [1] FAILED CONTIG:Scaffold249
>>> >
>>> > I'd appreciate any guidance on how best to diagnose this error!
>>> >
>>> > Many thanks,
>>> > Jason Gallant

From carsonhh at gmail.com Thu Jan 29 12:40:09 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 29 Jan 2015 12:40:09 -0700
Subject: [maker-devel] GFF and Dereferencing problem
In-Reply-To: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu>
References: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu>
Message-ID: <65C0DD7A-A3CA-4404-B15E-91B77DC6D8FE@gmail.com>

Could you make sure you are using the most recent version of MAKER? A bug similar to this was fixed some time ago. The current version is 2.31.8. Also, when rerunning with the most recent version of MAKER, make sure to set the -a flag on the command line to force a rerun of logged data.

-Carson

From carsonhh at gmail.com Fri Jan 30 09:33:46 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 30 Jan 2015 09:33:46 -0700
Subject: [maker-devel] How to improve the result of Maker
In-Reply-To:
References:
Message-ID: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com>

See below.

> I have joined the "Maker-devel" Google group, but I don't know why I can't reply to a topic or create a new topic. Is there some limitation?

The Google site is just a searchable archive of MAKER-related e-mails.
The actual conversations occur through the MAKER mailing list: http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
E-mails sent to the list will be automatically archived on Google.

> I have finished genome annotation with Maker. I used SNAP and Augustus in Maker. I have some questions; could you help me?
>
> When gene finders have predictions at the same location, maker chooses the best prediction as the final output, right? But if the prediction doesn't match the evidence very well, how does maker synthesize the prediction with the evidence? My understanding of maker's behavior is as follows; I'm not sure whether it is right:
>
> Assume that there is an exon present in the evidence but not in the prediction. If the exon is located at the end of the prediction, it will be output as UTR, but if the exon is located inside the prediction, it will be ignored and not output, right?

No. MAKER uses the introns and exons in the evidence alignments to provide hints to the gene predictors. Hints increase the probability scores of the HMMs by increasing the likelihood of the exon or intron state wherever it overlaps the evidence alignment. This process bumps up the likelihood values for models that better match the evidence alignments, resulting in better models than SNAP and Augustus produce on their own without hints. Note that models are still governed by the same constraints on what constitutes an open reading frame and a splice site, regardless of evidence alignments. This means that no amount of evidence-based hints can overcome an assembly error.

> For example:
>
> The exon pointed to by the red arrow: all the evidence contains this exon, but it was missed in the final output.

There are two possibilities.
First, given how different the SNAP and Augustus models are from one another, this would suggest that they have not been trained appropriately (for example, if you are picking a related organism's parameter file rather than training these programs, several assumptions are being made that can actually make such an approach almost worse than just picking a parameter file at random). But more likely, the evidence-supported exon breaks the reading frame of the model. This usually indicates that you have an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that causes a frameshift, so the predictors do the next most reasonable thing: they drop the exon if another model is tenable.

More concerning would be the mRNA-seq alignments near the 3' end of the gene call. The structure suggests significant capture of background transcription with the mRNA-seq reads (long UTRs with weird mini-introns). I would suggest not using cufflinks in this case. You should probably go with an assembly-based approach for the mRNA-seq reads instead; I would suggest using Trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with Trinity.

I would further suggest looking at the model in question using Apollo and manually adding the exon (click and drag it into the model). You can examine the reading frame after adding the exon and see if it is in fact a frameshift assembly error. If it's a homopolymer-derived frameshift, then you can expect a lot more of these throughout your assembly.

Also, I do not see any protein alignments here. MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don't have to be that closely related, but closer is better).
Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not.

> In this example, the long UTR is another issue; is it non-coding RNA?
>
> I have another example:
>
> The yellow is evidence from cufflinks. The final output chose the prediction from Augustus, but the last two exons were annotated as UTR. I thought UTR should be continuous and should not contain introns.

Actually, UTRs are not expected to be continuous and intron-free. In fact, the majority of alternative splicing events occur in the 5' UTR (not in the CDS), and 5' UTRs commonly contain introns (just as we see here). This makes evolutionary sense: an alternatively spliced 5' UTR allows differential and tissue-specific control of the exact same protein by swapping out the upstream regulatory sequence. Alternative splicing of the 3' UTR, on the other hand, is less common (it's involved in nonsense-mediated decay and not so much in regulation of expression), but introns in the 3' UTR are still not uncommon. The mRNA-seq alignments suggest that those exons are transcribed, so unless there is an assembly error causing a frameshift in the CDS and an early stop codon, the 3' UTR would be correct. If you had protein alignments from another species here, you could see which exons they support as coding exons.

Thanks,
Carson

From xvazquezc at gmail.com Fri Jan 30 21:48:33 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Sat, 31 Jan 2015 15:48:33 +1100
Subject: [maker-devel] genome duplication?
Message-ID:

Hi all,

One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:

> # Statistics of the completeness of the genome based on 248 CEGs
> #
> #         #Prots  %Completeness  -  #Total  Average  %Ortho
>
> Complete     181          72.98  -     365     2.02   67.40
> Group 1       54          81.82  -     105     1.94   66.67
> Group 2       39          69.64  -      86     2.21   71.79
> Group 3       45          73.77  -      86     1.91   57.78
> Group 4       43          66.15  -      88     2.05   74.42
> Partial      230          92.74  -     528     2.30   77.83
> Group 1       61          92.42  -     140     2.30   72.13
> Group 2       53          94.64  -     127     2.40   84.91
> Group 3       56          91.80  -     126     2.25   69.64
> Group 4       60          92.31  -     135     2.25   85.00

The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies.
Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328

Now to the Maker part... As part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000, respectively) compared with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a phenomenon given that there is no support from the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments.

So, in my last run I turned on keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to

Last question regarding the protein files.
I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and to carry a lower chance of predicting false genes. Am I right?

Thank you in advance,

Xabier

--
Xabier Vázquez Campos
PhD Candidate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From mikael.durling at slu.se Sat Jan 31 01:42:51 2015
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sat, 31 Jan 2015 08:42:51 +0000
Subject: [maker-devel] genome duplication?
In-Reply-To:
References:
Message-ID:

Hi Xabier,

On 31 Jan 2015 at 05:48, Xabier Vázquez Campos wrote:

> Hi all,
>
> One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:
>
> # Statistics of the completeness of the genome based on 248 CEGs
> #
> #         #Prots  %Completeness  -  #Total  Average  %Ortho
> Complete     181          72.98  -     365     2.02   67.40
> Partial      230          92.74  -     528     2.30   77.83

Judging from these figures, you seem to have a very fragmented assembly? What N50 have you reached? In my experience, assemblies with an N50 below 5-10 times the average gene length tend to give problems in producing good gene sets. That is not to say that the gene sets are unusable, but when comparing e.g. gene complements to other species, it will be hard to draw any conclusions when a high proportion of the genes are incomplete.

> The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with Hortaea werneckii (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies.
> Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>
> Now to the Maker part... As part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000, respectively) compared with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a phenomenon given that there is no support from the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments.

Unless you mistakenly filter out duplicated gene families as repeats with RepeatModeler, maker should not care about duplicated genes. However, without keep_preds=1, maker reports only genes with some kind of support (be it EST or protein homology). This is rather conservative, but if you enable keep_preds, you will get more genes, as you have noted. Just for the sake of comparison, I have reannotated more than ten genomes downloaded from JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently reports fewer gene models. I have yet to do a more thorough comparison to tell which genes JGI reports that don't appear in the MAKER annotations.

> So, in my last run I turned on keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to
>
> Last question regarding the protein files. I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and to carry a lower chance of predicting false genes. Am I right?

Yes, I would say so.
The FilteredModels have passed through their model selection pipeline, while all_models contains models from all predictors, as well as combinations of predictors and EST evidence.

Just my 2 cents' worth of observations,
cheers,
Mikael

> Thank you in advance,
>
> Xabier
>
> --
> Xabier Vázquez Campos
> PhD Candidate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Sat Jan 31 01:51:36 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Sat, 31 Jan 2015 19:51:36 +1100
Subject: [maker-devel] genome duplication?
In-Reply-To:
References:
Message-ID:

Thanks Mikael,

These are the assembly stats from abyss-fac; indeed, it isn't a great N50, but it isn't that bad either:

n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6

2015-01-31 19:42 GMT+11:00 Mikael Brandström Durling:

> Hi Xabier,
>
> On 31 Jan 2015 at 05:48, Xabier Vázquez Campos wrote:
>
>> Hi all,
>>
>> One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:
>>
>> # Statistics of the completeness of the genome based on 248 CEGs
>> #
>> #         #Prots  %Completeness  -  #Total  Average  %Ortho
>> Complete     181          72.98  -     365     2.02   67.40
>> Partial      230          92.74  -     528     2.30   77.83
>
> Judging from these figures, you seem to have a very fragmented assembly? What N50 have you reached? In my experience, assemblies with an N50 below 5-10 times the average gene length tend to give problems in producing good gene sets.
> That is not to say that the gene sets are unusable, but when comparing e.g. gene complements to other species, it will be hard to draw any conclusions when a high proportion of the genes are incomplete.
>
>> The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies.
>> Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>>
>> Now to the Maker part... As part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000, respectively) compared with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a phenomenon given that there is no support from the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments.
>
> Unless you mistakenly filter out duplicated gene families as repeats with RepeatModeler, maker should not care about duplicated genes. However, without keep_preds=1, maker reports only genes with some kind of support (be it EST or protein homology). This is rather conservative, but if you enable keep_preds, you will get more genes, as you have noted. Just for the sake of comparison, I have reannotated more than ten genomes downloaded from JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently reports fewer gene models. I have yet to do a more thorough comparison to tell which genes JGI reports that don't appear in the MAKER annotations.
>
>> So, in my last run I turned on keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to
>>
>> Last question regarding the protein files. I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and to carry a lower chance of predicting false genes. Am I right?
>
> Yes, I would say so. The FilteredModels have passed through their model selection pipeline, while all_models contains models from all predictors, as well as combinations of predictors and EST evidence.
>
> Just my 2 cents' worth of observations,
> cheers,
> Mikael
>
>> Thank you in advance,
>>
>> Xabier
>>
>> --
>> Xabier Vázquez Campos
>> PhD Candidate
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From chenwenbo1020 at gmail.com Sat Jan 31 08:54:28 2015
From: chenwenbo1020 at gmail.com (陈文博)
Date: Sat, 31 Jan 2015 10:54:28 -0500
Subject: [maker-devel] How to improve the result of Maker
In-Reply-To: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com>
References: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com>
Message-ID:

> There are two possibilities.
> First, given how different the SNAP and Augustus models are from one another, this would suggest that they have not been trained appropriately (for example, if you are picking a related organism's parameter file rather than training these programs, several assumptions are being made that can actually make such an approach almost worse than just picking a parameter file at random). But more likely, the evidence-supported exon breaks the reading frame of the model. This usually indicates that you have an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that causes a frameshift, so the predictors do the next most reasonable thing: they drop the exon if another model is tenable. More concerning would be the mRNA-seq alignments near the 3' end of the gene call. The structure suggests significant capture of background transcription with the mRNA-seq reads (long UTRs with weird mini-introns). I would suggest not using cufflinks in this case. You should probably go with an assembly-based approach for the mRNA-seq reads instead; I would suggest using Trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with Trinity.
>
> I would further suggest looking at the model in question using Apollo and manually adding the exon (click and drag it into the model). You can examine the reading frame after adding the exon and see if it is in fact a frameshift assembly error. If it's a homopolymer-derived frameshift, then you can expect a lot more of these throughout your assembly.

I dragged the exon into the model; there is a stop codon in it, which causes the region after it to become UTR, here:

[inline image 1]

The exon in question is pointed to by the red arrow.
But the uppermost evidence is the complete EST from NCBI, and it contains both the start and stop codons. Then I noticed that the 5' boundary of the 2nd exon in the model is not the same as in the EST, so it causes a frameshift, which creates the stop codon in the exon pointed to by the red arrow. The first exon should not be CDS, as there would be a start codon in the 2nd exon if its 5' boundary were predicted correctly. Would "always_complete=1" fix it?

I will try to use Trinity.

> Also, I do not see any protein alignments here. MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don't have to be that closely related, but closer is better). Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not.

I did use 3 sources of protein evidence: one is the proteome of a related species, one is the fruit fly proteome, and the last one is Swiss-Prot.

Thank you very much!

Best regards,
Wenbo

From jason.stajich at gmail.com Sat Jan 31 16:21:12 2015
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sat, 31 Jan 2015 15:21:12 -0800
Subject: [maker-devel] genome duplication?
In-Reply-To:
References:
Message-ID:

Xabier - FYI, though you have probably already compared: those stats are on par with the Hortaea v1 assembly (we do have an improved Hortaea assembly now, and the genome size is still in the same range, supporting the duplication hypothesis).

Hw version 1 assembly - N50 9623; Max 71563

CEGMA for Hw1
#         #Prots  %Completeness  -  #Total  Average  %Ortho
Complete     196          79.03  -     498     2.54   81.12
Partial      228          91.94  -     673     2.95   95.18

Mikael - yes, we should compare notes on the models JGI is calling that have little support in MAKER. I am not sure whether their pipeline runs augustus/snap with informant hints, though they usually bring RNAseq into the mix. I don't know if your reannotation approach assembled the RNAseq and used it as evidence? We'll be trying to assess some of this when comparing the proportion of shared genes in the first 1KFG paper, so we may be able to say with more certainty whether these extra predictions are shared more widely, and get a handle on singleton/false-positive rates.

Jason

Jason Stajich
jason.stajich at gmail.com

On Sat, Jan 31, 2015 at 12:51 AM, Xabier Vázquez Campos wrote:

> Thanks Mikael,
>
> These are the assembly stats from abyss-fac; indeed, it isn't a great N50, but it isn't that bad either:
>
> n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
> 14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6
>
> 2015-01-31 19:42 GMT+11:00 Mikael Brandström Durling <mikael.durling at slu.se>:
>
>> Hi Xabier,
>>
>> On 31 Jan 2015 at 05:48, Xabier Vázquez Campos wrote:
>>
>>> Hi all,
>>>
>>> One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:
>>>
>>> # Statistics of the completeness of the genome based on 248 CEGs
>>> #
>>> #         #Prots  %Completeness  -  #Total  Average  %Ortho
>>> Complete     181          72.98  -     365     2.02   67.40
>>> Partial      230          92.74  -     528     2.30   77.83
>>
>> Judging from these figures, you seem to have a very fragmented assembly? What N50 have you reached? In my experience, assemblies with an N50 below 5-10 times the average gene length tend to give problems in producing good gene sets. That is not to say that the gene sets are unusable, but when comparing e.g. gene complements to other species, it will be hard to draw any conclusions when a high proportion of the genes are incomplete.
>>
>>> The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies.
>>> Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>>>
>>> Now to the Maker part... As part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000, respectively) compared with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a phenomenon given that there is no support from the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices?
>>> I picked them from the closest relatives and similar environments.
>>
>> Unless you mistakenly filter out duplicated gene families as repeats with RepeatModeler, maker should not care about duplicated genes. However, without keep_preds=1, maker reports only genes with some kind of support (be it EST or protein homology). This is rather conservative, but if you enable keep_preds, you will get more genes, as you have noted. Just for the sake of comparison, I have reannotated more than ten genomes downloaded from JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently reports fewer gene models. I have yet to do a more thorough comparison to tell which genes JGI reports that don't appear in the MAKER annotations.
>>
>>> So, in my last run I turned on keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to
>>>
>>> Last question regarding the protein files. I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to contain real evidence and to carry a lower chance of predicting false genes. Am I right?
>>
>> Yes, I would say so. The FilteredModels have passed through their model selection pipeline, while all_models contains models from all predictors, as well as combinations of predictors and EST evidence.
>>
>> Just my 2 cents' worth of observations,
>> cheers,
>> Mikael
>>
>>> Thank you in advance,
>>>
>>> Xabier
>>>
>>> --
>>> Xabier Vázquez Campos
>>> PhD Candidate
>>> Water Research Centre
>>> School of Civil and Environmental Engineering
>>> The University of New South Wales
>>> Sydney NSW 2052 AUSTRALIA
>
> --
> Xabier Vázquez Campos
> *PhD Candidate*
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From jerryzhaosjtu at gmail.com Wed Jan 7 04:16:45 2015
From: jerryzhaosjtu at gmail.com (赵越)
Date: Wed, 7 Jan 2015 19:16:45 +0800
Subject: [maker-devel] using MAKER with MPI
Message-ID:

Greetings,

Can I use mpirun instead of mpiexec? Thank you!!
From carsonhh at gmail.com Wed Jan 7 09:13:50 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Jan 2015 09:13:50 -0700
Subject: [maker-devel] using MAKER with MPI
In-Reply-To:
References:
Message-ID:

Yes, they are interchangeable. In fact, in OpenMPI both mpiexec and mpirun are soft links to the exact same executable -> orterun

Just remember that MAKER works with the MPICH2/3 and OpenMPI flavors of MPI, but not with MVAPICH2.

Also, if using MPICH, make sure to enable shared libraries during installation (this is not the default).

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (i.e. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

If jobs hang or freeze when using OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.

Example:
mpiexec -mca btl ^openib -n 20 maker

-Carson
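A minimal Python sketch of the launch advice above, assuming OpenMPI: it assembles the mpiexec argument list with the optional '-mca btl ^openib' workaround and an environment that carries LD_PRELOAD. The helper name build_maker_mpi_command is hypothetical, and the libmpi.so path is only the example path from the message; use your site's real location. Setting LD_PRELOAD this way only covers the launched process, not the MAKER installation step, which Carson says also needs it.

```python
import os
import shlex


def build_maker_mpi_command(n_procs, libmpi_path=None, disable_openib=True,
                            maker_args=()):
    """Return (argv, env) for launching MAKER under OpenMPI.

    Hypothetical helper. mpiexec and mpirun are interchangeable under
    OpenMPI, so "mpiexec" is used here.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    argv = ["mpiexec"]
    if disable_openib:
        # Exclude the openib byte-transfer layer: the suggested fix for hangs.
        argv += ["-mca", "btl", "^openib"]
    argv += ["-n", str(n_procs), "maker", *maker_args]

    env = dict(os.environ)
    if libmpi_path:
        # OpenMPI's shared library must be preloaded before MAKER starts.
        env["LD_PRELOAD"] = libmpi_path
    return argv, env


if __name__ == "__main__":
    argv, env = build_maker_mpi_command(20, "/usr/local/openmpi/lib/libmpi.so")
    print(shlex.join(argv))  # display only; ^openib gets shell-quoted
    # To actually launch: subprocess.run(argv, env=env, check=True)
```

Passing the argument list directly to subprocess.run avoids any shell quoting of ^openib; shlex.join is used only to print a copy-pasteable command.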
From carsonhh at gmail.com Thu Jan 8 08:47:29 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 8 Jan 2015 08:47:29 -0700
Subject: [maker-devel] MAKER mpi running wrong
In-Reply-To:
References:
Message-ID: <13241A86-804F-4674-A8FD-CA90026CF4AF@gmail.com>

When running large jobs under MPI, semi-random issues can arise, as well as tuning issues in which hardware configuration, IO performance, buffer sizes, etc. all play a role. Using one of the NSF flagship clusters from XSEDE, for example, I can run on over 2000 CPUs without issue. But the IT specialists with XSEDE have also spent a lot of time tuning MPI by enabling and disabling certain options for their hardware and network configuration (the IT specialists for the XSEDE project are actually the developers of many of the MPI flavors available, so they wrote MPI to work really well on this specific cluster). On other clusters I can't go over 200 CPUs on a single job. Or, on another XSEDE cluster, I can run on exactly 1424 CPUs; if I increase by a single CPU, the job always fails.

For these kinds of issues you would have to delve into some of the more obscure parameters of OpenMPI via trial and error (http://www.open-mpi.org/doc/). What happens under the hood in OpenMPI is that different buffer sizes and network communication strategies are triggered as the number of nodes increases, so you can often identify a specific CPU count that is stable, where going one over that number causes a failure. You then look in the documentation for a parameter that matches that trigger value and alter it higher or lower. Or, if you can identify the stable CPU count, just submit multiple jobs at exactly that CPU count.

-Carson

> On Jan 8, 2015, at 8:27 AM, 赵越 wrote:
>
> Hi Carson,
>
> After using the flag in your example, the warning after running MAKER was gone, yet after running with MPI in 512 threads for 2 hours, MAKER 'Exited with exit code 1'. The stdout info is as follows:
>
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node206][[7968,1],269][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> SIGTERM received
> Perl exited with active threads:
> 1 running and unjoined
> 0 finished and unjoined
> 0 running and detached
>
> Also, my job submission is like:
>
> #BSUB -J maker_mpi
> #BSUB -n 512
> #BSUB -R "span[ptile=16]"
> module purge && module load gcc/4.9.1 openmpi/gcc/1.6.5
> mpiexec -mca btl ^openib -n 512 perl /lustre/home/clswcc/yzhao/MAKER/maker/bin/maker -fix_nucleotides
>
> Could you help me find out what is going wrong? The stdout at first is normal, as follows:
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_datastore
>
> To access files for individual sequences use the datastore index:
> /lustre/home/clswcc/SOP_1Krice/gene_prediction/mpi/unaln.maker.output/unaln_master_datastore_index.log
>
> STATUS: Now running MAKER...
>
> Regards,
> yue
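Carson's trial-and-error search for a stable CPU count can be made systematic with a binary search, assuming stability behaves as a threshold (every count up to some limit works and every count above it fails, as in the 1424-CPU example). This Python sketch is illustrative only: job_succeeds is a hypothetical callback standing in for submitting a short test job at a given size and reporting whether it completed. Each probe costs a real cluster job, so the search needs roughly log2(high - low) submissions.

```python
from typing import Callable


def largest_stable_cpu_count(job_succeeds: Callable[[int], bool],
                             low: int, high: int) -> int:
    """Return the largest CPU count in [low, high] for which a job succeeds.

    Assumes a threshold: every count <= the limit succeeds and every count
    above it fails. `low` must be a count already known to work.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if not job_succeeds(low):
        raise ValueError("low must be a known-good CPU count")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if job_succeeds(mid):
            low = mid        # mid is stable: the limit is mid or higher
        else:
            high = mid - 1   # mid fails: the limit is below mid
    return low


if __name__ == "__main__":
    # Simulated cluster that fails above 1424 CPUs.
    print(largest_stable_cpu_count(lambda n: n <= 1424, 16, 2048))
```

If failures near the limit are only intermittent (the "semi-random" issues mentioned above), the threshold assumption breaks down and each probe should be repeated before trusting the result.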
From xvazquezc at gmail.com Wed Jan 14 01:40:38 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:40:38 +1100
Subject: [maker-devel] doubt about selection of the best model
Message-ID:

Hi Maker developers and users,

After quite a bit of time dealing with Maker, I can run it without problems (thank you Carson). However, I have doubts about the evaluation of the best model produced by Maker.

I found the AED_cdf_generator.pl script while searching the mailing list, and it is great, but when you use it, which GFF files are you comparing? I initially thought that the models to be compared were those from each ab initio program, e.g. SNAP vs Augustus, and within them, the subsequent bootstrap training steps. But unless you run only one predictor each time you run Maker, the XXX.all.gff file will contain data from both predictions. Should I run them individually?

Following the topic, Maker will generate different FASTA files for proteins and transcripts from each program (Maker and each ab initio predictor) as well as "non_overlapping" files. Which one(s) do you select to continue with the functional annotation?

Thank you in advance,

Xabier

--
Xabier Vázquez Campos
PhD Candidate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Wed Jan 14 01:49:34 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Wed, 14 Jan 2015 19:49:34 +1100
Subject: [maker-devel] Augustus retraining??
Message-ID:

Hi,

I trained Augustus using the output of CEGMA (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.CEGMATraining) through WebAugustus, which makes the training very easy. Here is my question: can/should I re-train Augustus like it is done with SNAP?
And what would I use for the re-training?

Thank you,

Xabier

From mikael.durling at slu.se Wed Jan 14 03:08:33 2015
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 14 Jan 2015 10:08:33 +0000
Subject: [maker-devel] Augustus retraining??
In-Reply-To:
References:
Message-ID: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>

Hi,

I've tried an approach of retraining Augustus in a manner similar to what has been suggested here earlier for retraining SNAP. This has been run with a local Augustus installation as part of an automated framework I have set up to annotate fungal genomes. Interestingly, Augustus seems to converge very quickly. It is not uncommon that autoAugustus reports that it could not improve the initial models that were derived from the CEGMA dataset. Has anyone else on the list had similar experiences?

I also have a modified version of maker2zff, which I call maker2augustus_gff, that extracts an evidence set for Augustus retraining from the initial round of MAKER. I'm happy to share it with anyone interested.
cheers,
Mikael

From carsonhh at gmail.com Wed Jan 14 08:22:57 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:22:57 -0700
Subject: [maker-devel] Augustus retraining??
In-Reply-To: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se>
Message-ID: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com>

Here is some info on training SNAP via the bootstrap technique (i.e. using the models produced by the initial training to seed the next round of training). Even though the examples use SNAP, the approach would be applicable using the scripts and methods Mikael described in his e-mail -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

Also, Jason Stajich wrote an excellent explanation of training Augustus on the GMOD mailing list -> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

He also included his own scripts to assist with the training -> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

-Carson
From carsonhh at gmail.com Wed Jan 14 08:37:43 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Jan 2015 08:37:43 -0700
Subject: [maker-devel] doubt about selection of the best model
In-Reply-To:
References:
Message-ID:

The MAKER models will be the final models. The FASTA files and features from the raw ab initio gene predictors, on the other hand, are there for reference purposes only and, unless you have a need for them, should be ignored.
MAKER models are the combination of ab initio gene predictions filtered for best evidence match, together with hint-based models from the predictors. Basically, MAKER took the best models from each separate predictor and created a final consensus gene set.

The CDF generator really is for comparing how evidence match changes between different releases of the genome or between different parameter options (i.e. you are comparing curves between independent MAKER runs, not within a single MAKER run). The AED CDF curve is interpreted similarly to a ROC curve, in that shifts up and to the left indicate improved gene models. This is as opposed to using sensitivity and specificity, because those measures require you to already know the correct models in order to generate a comparison. For de novo annotation that is impossible (if you already had the correct models you wouldn't be running MAKER), so since such values cannot be generated, AED, which uses evidence overlap, acts as a proxy measurement.

This paper probably gives the overall best example of how AED correlates with model quality (Figures 2 and 3) -> http://www.biomedcentral.com/1471-2105/12/491

-Carson
From xvazquezc at gmail.com Fri Jan 16 01:09:11 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 19:09:11 +1100
Subject: [maker-devel] functional annotation
Message-ID:

Hi,

Which file from the Maker output do you use for the functional annotation? The FASTA part of the XXX.all.gff?

I'll probably be using BLAST and InterProScan. I tested B2go (basic version); it's good stuff, but annoyingly slow.

Thank you

From xvazquezc at gmail.com Fri Jan 16 03:11:21 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 16 Jan 2015 21:11:21 +1100
Subject: [maker-devel] repeat masking and repeat libraries
Message-ID:

Hi there,

First, a general question. Probably kind of silly, but I prefer to be sure... When you browse RepBase, for example for fungi, all the repeats are marked as Eukaryota (Ancestral) or under the name of the species, but no other taxonomic ranks are indicated. Does RepeatMasker recognise orders, families, etc.?
Or, in my case, should I stick with model_org=fungi?

I've been trying to create a repeat library specific to my genomes, and I didn't have any luck with the programs described in the Basic and Advanced tutorials (either on my computer or on the cluster); they reported errors every time, with the exception of RepeatModeler, which ran with no problems. Is the output from RepeatModeler enough to improve the masking? It is not the best option, I guess, but better than just the RepBase libraries by themselves, isn't it?

Thank you for your time,

Xabier

From michael.s.campbell1 at gmail.com Fri Jan 16 10:01:37 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 16 Jan 2015 10:01:37 -0700
Subject: [maker-devel] functional annotation
In-Reply-To:
References:
Message-ID:

Hi Xabier,

The FASTA at the end of the GFF3 file is the genome. For functional annotation you want to use the XXXout.all.maker.proteins.fasta file. It contains the protein sequences for your MAKER gene models.

Good luck,
Mike

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From carsonhh at gmail.com Fri Jan 16 10:04:09 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 Jan 2015 10:04:09 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Using both RepBase and a RepeatModeler-produced library should be sufficient, especially for fungi.

-Carson
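The over-masking precaution Mike raises in the next message (drop RepeatModeler library entries that BLAST strongly against known proteins) can be sketched in Python as follows. This is a sketch under assumptions: the hits are in BLAST+ tabular output (-outfmt 6, e-value in column 11), for example from blastx of the library against a protein set; query IDs match the first word of each FASTA header; and the 1e-10 cutoff is an arbitrary placeholder, not a recommended value. The protein set should exclude transposable-element proteins, or genuine repeat families would be removed too.

```python
import csv
import io


def strong_hit_ids(blast_tab: str, max_evalue: float = 1e-10) -> set:
    """Query IDs with at least one hit at or below max_evalue.

    blast_tab is BLAST+ tabular text (-outfmt 6); column 11 is the e-value.
    """
    ids = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        # Skip blank or comment lines, which have fewer than 11 columns.
        if len(row) >= 11 and float(row[10]) <= max_evalue:
            ids.add(row[0])
    return ids


def filter_fasta(fasta_text: str, drop_ids: set) -> str:
    """Remove FASTA records whose first header word is in drop_ids."""
    kept, keep = [], True
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            keep = not fields or fields[0] not in drop_ids
        if keep:
            kept.append(line)
    return ("\n".join(kept) + "\n") if kept else ""
```

The filtered library could then be given to MAKER through rmlib=, while model_org=fungi continues to supply the RepBase side, which is the combination Carson describes.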
From michael.s.campbell1 at gmail.com Fri Jan 16 10:08:43 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 16 Jan 2015 10:08:43 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Hi Xabier,

I haven't seen orders or families documented for RepeatMasker with RepBase. Fungi seems safe to me.

If you want to give yourself a little more peace of mind about the RepeatModeler library, you can BLAST it against a database of known fungal proteins and remove the entries in the library that have strong hits to a known protein, to avoid over-masking.

Mike
From xvazquezc at gmail.com Fri Jan 16 20:57:26 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Sat, 17 Jan 2015 14:57:26 +1100
Subject: [maker-devel] AED score script error
Message-ID:

Hi,

Just reporting the following error with the AED_cdf_generator.pl script:

> Use of uninitialized value $opt_b in division (/) at AED_cdf_generator.pl line 20.
> Illegal division by zero at AED_cdf_generator.pl line 20.

Has anybody else had this problem? I use the version attached here: https://groups.google.com/forum/#!topic/maker-devel/LCpB3CEm63M

Thank you
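The error message shows the script dividing by $opt_b, the -b bin width, which was never set. The Python sketch below is an illustrative re-implementation of the idea behind an AED cumulative distribution, not the code of AED_cdf_generator.pl: at each threshold 0, b, 2b, ... it reports the fraction of gene models whose AED is at or below that threshold, and it fails early with a clear message when no bin width is given. Extracting AED values from MAKER's GFF3 is left out; the input is simply a list of numbers.

```python
from bisect import bisect_right


def aed_cdf(aed_values, bin_width):
    """Cumulative fraction of gene models with AED <= each threshold.

    Thresholds are 0, bin_width, 2*bin_width, ...; pick a width that divides
    1 evenly (such as 0.1 or 0.25) so the last threshold is 1.0.
    """
    if not bin_width or bin_width <= 0:
        # Guard against the mistake behind the Perl script's division by zero.
        raise ValueError("bin_width must be a positive number, e.g. 0.1")
    values = sorted(aed_values)
    if not values:
        raise ValueError("no AED values supplied")
    n_bins = round(1.0 / bin_width)
    cdf = []
    for i in range(n_bins + 1):
        threshold = round(i * bin_width, 10)     # avoid 0.30000000000000004
        count = bisect_right(values, threshold)  # models with AED <= threshold
        cdf.append((threshold, count / len(values)))
    return cdf


if __name__ == "__main__":
    for threshold, fraction in aed_cdf([0.0, 0.1, 0.25, 0.5, 1.0], 0.25):
        print(f"{threshold:.2f}\t{fraction:.2f}")
```

When two such curves from independent MAKER runs are compared, the one that rises earlier (up and to the left) has more models with low AED, i.e. better evidence support.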
From michael.s.campbell1 at gmail.com Mon Jan 19 10:27:52 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 19 Jan 2015 10:27:52 -0700
Subject: [maker-devel] AED score script error
In-Reply-To:
References:
Message-ID:

Hi Xabier,

Did you give the -b option a value on the command line (e.g. -b 0.1)?

Mike

From xvazquezc at gmail.com Mon Jan 19 23:14:58 2015
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Jan 2015 17:14:58 +1100
Subject: [maker-devel] AED score script error
In-Reply-To:
References:
Message-ID:

Thanks Mike. It was that.
From carsonhh at gmail.com Tue Jan 20 09:45:01 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Jan 2015 09:45:01 -0700
Subject: [maker-devel] Issue due to intensive I/O
In-Reply-To:
References:
Message-ID: <6F82AB5F-4782-41CA-A61F-C79894EFABB4@gmail.com>

Genome annotation is very data intensive, as opposed to CPU intensive. In MAKER, most IO-intensive operations occur in a temporary directory pointed to by the TMP= option in the MAKER control files. If you are setting this value to a location on a network-mounted drive, then this could be the source of your problem.
Also, TMP= defaults to the location given by the TMPDIR Linux environment variable, so make sure that TMPDIR is not set to a network-mounted location either. The temporary directory needs to be a locally mounted location.

There will still need to be a number of global files, though; however, we've previously run MAKER on over 8,000 CPUs on Lustre file systems with no issues. If the genome being annotated has a large number of small contigs, it is possible that it is the metadata server that is having problems, as opposed to the object storage server. Lots of small contigs in a fragmented genome assembly result in a lot of small result files, but very little reading and writing. Such a situation can be quite stressful for Lustre file systems because they don't like having large numbers of very small files (it overwhelms the metadata server even though the object storage server will be under more moderate load). If that is the case, make sure you set min_contig= to something like 10000 to avoid generating analyses for short, un-annotatable contigs (they may number in the hundreds of thousands on lower-quality genome assemblies and contain no useful information).

You can also set clean_up=1 in the MAKER control files to delete files as MAKER advances. This removes restart capability because you won't have logged results from previous runs, but it will reduce the burden on the metadata server (which is affected by total file number as opposed to file read/write operations). Setting clean_up=1 can also help you avoid any administrator-defined limits on total file number per user (administrators commonly set this limit on Lustre-based file systems to avoid taxing the metadata server).

So your issue is likely caused by one of two things:

1. Improperly setting TMP= in the maker_opts.ctl file, or the Linux TMPDIR environment variable, to a network-mounted location. Fixed by setting these to a locally mounted location (usually /tmp).
2. Too many total files being generated by a fragmented genome assembly. Fixed by either setting min_contig=10000 in order to skip short contigs or by setting clean_up=1 to avoid logging too many files.

This happens because it is very difficult to overwhelm Lustre's object storage servers (which perform IO read/write operations), but it's relatively easy to overwhelm the metadata server (affected by total file count rather than total IO throughput).

-Carson

> On Jan 19, 2015, at 5:55 AM, Stephen Wang wrote:
>
> Dear MAKER Team,
>
> I am a cluster administrator at the university. The issue is caused by MAKER jobs, which access massive numbers of small files and crashed the Lustre file system.
>
> Hardware: 16 cores per node
> Software: OpenMPI 1.6.5 and GCC 4.9.1
>
> Q1: Does MAKER have to generate a large number of files on the global file system?
> Q2: Can any parameters help MAKER avoid I/O-intensive access? Any experience on Lustre?
>
> MAKER is quite important software for our users. Hope for your help.
>
> BR,
> Stephen
>
> --
> Stephen Wang, GPU Computing Specialist
> Center for High Performance Computing
> Shanghai Jiao Tong University
> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> Mobi:+86-136-6151-1618 Web:http://hpc.sjtu.edu.cn
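Both causes Carson lists can be checked before a run. The Python sketch below is a hypothetical pre-flight helper, not part of MAKER: fs_type_for finds the filesystem type of the mount holding a path by reading a /proc/mounts-style table (Linux only), and summarize_short_contigs counts contigs shorter than a min_contig cutoff, each of which would otherwise get its own set of small result files in the datastore. The set of network filesystem types is an assumption and not exhaustive, and treating contigs strictly shorter than the cutoff as skipped is an assumption about how min_contig is applied.

```python
import os

# Assumed, non-exhaustive set of network/parallel filesystem types.
NETWORK_FS = {"nfs", "nfs4", "lustre", "cifs", "smbfs", "gpfs", "beegfs", "fuse.sshfs"}


def fs_type_for(path, mounts_text):
    """Filesystem type of the longest mount point containing `path`.

    mounts_text follows /proc/mounts: device mountpoint fstype options ...
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, fstype = fields[1], fields[2]
        inside = path == mount or path.startswith(mount.rstrip("/") + "/")
        if inside and len(mount) > len(best_mount):
            best_mount, best_type = mount, fstype
    return best_type


def is_network_path(path, mounts_text):
    # True when the path lives on one of the assumed network filesystems.
    return fs_type_for(path, mounts_text) in NETWORK_FS


def contig_lengths(lines):
    """Yield (header, length) for each FASTA record in an iterable of lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def summarize_short_contigs(lines, min_contig=10000):
    """Count contigs (and their bases) shorter than min_contig."""
    total = short = short_bp = 0
    for _, length in contig_lengths(lines):
        total += 1
        if length < min_contig:
            short += 1
            short_bp += length
    return {"contigs": total, "below_min_contig": short,
            "bp_below_min_contig": short_bp}


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    tmp = os.path.realpath(os.environ.get("TMPDIR", "/tmp"))
    with open("/proc/mounts") as fh:
        print(tmp, "is network-mounted:", is_network_path(tmp, fh.read()))
```

Pointing TMP= (or TMPDIR) at a path for which is_network_path returns False addresses the first cause; a large below_min_contig count for the assembly suggests that min_contig=10000 (or clean_up=1) is worth setting for the second.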
I?m having a persistent issue where something fails on one of the nodes, and std error is flooded with: examining contents of the fasta file and run log [67] ERROR: could not make datastore directory [67] --> rank=67, hostname=node067 [67] ERROR: Failed while examining contents of the fasta file and run log [67] ERROR: Chunk failed at level:0, tier_type:0 [67] FAILED CONTIG:Scaffold261 This error repeats for each ?next? scaffold for some time. ?When I go back to find the ?source? of the error in the log, the following is the first error message on that node: 67] #-------------------------------# [67] deleted:-60 hits [67] collecting blastx reports [67] ERROR: Could not colapse BLAST reports [67]? at /root/maker/bin/../lib/GI.pm line 2524 thread 1. [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) 
called at /root/maker/bin/maker line 917 thread 1 [67] --> rank=67, hostname=node067 [67] ERROR: Failed while collecting blastx reports [67] ERROR: Chunk failed at level:9, tier_type:3 [67] FAILED CONTIG:Scaffold66 [67]? [67] ERROR: Chunk failed at level:4, tier_type:0 [67] FAILED CONTIG:Scaffold66 I?ve attempted to ignore the error to see if things will proceed on the other 199 processors. ?When I returned to the ?master? node after the evening, Maker keeps repeating the same error code over and over (same scaffold): ] examining contents of the fasta file and run log [67] ERROR: could not make datastore directory [67] --> rank=67, hostname=node067 [67] ERROR: Failed while examining contents of the fasta file and run log [67] ERROR: Chunk failed at level:0, tier_type:0 [67] FAILED CONTIG:Scaffold1589 I stop the job, and restart, and after only a few minutes of running, the same error is reported, this time on a new scaffold. ?Strangely here, the error is reported in the MPI tag of node001, but the error originates at node137: ERROR: Could not colapse BLAST reports [1]? at /root/maker/bin/../lib/GI.pm line 2524. [1] ? ? GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 [1] ? ? Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 [1] ? ? eval {...} called at /root/maker/bin/../lib/Error.pm line 407 [1] ? ? Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 [1] ? ? Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 [1] ? ? 
Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 [1] --> rank=137, hostname=node137 [1] ERROR: Failed while collecting blastx reports [1] ERROR: Chunk failed at level:9, tier_type:3 [1] FAILED CONTIG:Scaffold249 [1] [1] ERROR: Chunk failed at level:4, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] 
ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and 
run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 [1] [1] examining contents of the fasta file and run log [1] ERROR: could not make datastore directory [1] --> rank=1, hostname=node001 [1] ERROR: Failed while examining contents of the fasta file and run log [1] ERROR: Chunk failed at level:0, tier_type:0 [1] FAILED CONTIG:Scaffold249 I?d appreciate any guidance as how best to diagnose this error! Many thanks, Jason Gallant ? Dr. Jason R. 
GallantAssistant Professor Room 38 Natural Sciences Department of Zoology Michigan State University East Lansing, MI 48824 jgallant at msu.edu office: 517-884-7756 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Wed Jan 21 17:42:35 2015 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=) Date: Thu, 22 Jan 2015 11:42:35 +1100 Subject: [maker-devel] repeat masking and repeat libraries In-Reply-To: References: Message-ID: Thanks Mike, I've blasted (blastx against nr) and many, if not most of the repeatmodeler library sequences match with transposases, pol proteins, gag proteins, retrotransposons,... all of them present in other fungi of the same order. Should I leave it to be masked? I still do run prediction on the unmasked genome too? Also, in many cases, the match a couple of thousand bp on the extreme of a 9kbp sequence and in none of them InterProScan is capable of finding anything except potential TM domains or so, provided by SignalP. What do you think? Should I leave it as it is? Thank you again for your time 2015-01-17 4:08 GMT+11:00 Michael Campbell : > Hi Xabier, > > I haven't seen orders or families documented for repeatmasker with > repbase. Fungi seems safe to me. > > If you want to give yourself a little more peace of mind about the > repeatmodeler library you can blast it to database of known fungal proteins > and remove the entries int he library that have strong hits to a known > protein to avoid over-masking. > > Mike > > On Fri, Jan 16, 2015 at 10:04 AM, Carson Holt wrote: > >> Using both RepBase and a RepeatModeler produced library should be >> sufficient, especially for fungi. >> >> ?Carson >> >> >> On Jan 16, 2015, at 3:11 AM, Xabier V?zquez Campos >> wrote: >> >> Hi there, >> >> First, a general question. Probably kind of silly but I prefer to be >> sure... 
>> When you browse RepBase, for example in fungi, all the repeats are
>> marked as Eukaryota (Ancestral) or under the name of the species, but no
>> other taxonomic ranks are indicated. Does RepeatMasker recognise orders,
>> families, etc., or in my case should I stick with model_org=fungi?
>>
>> I've been trying to create repeat libraries specific to my genomes, and
>> I didn't have any luck with the programs described in the Basic
>> and Advanced tutorials (neither on my computer nor on the cluster); they
>> reported errors every time, with the exception of RepeatModeler, which ran
>> with no problems. Is the output from RepeatModeler enough to improve the
>> masking? It is not the best option, I guess, but better than just the
>> RepBase libraries by themselves, isn't it?
>>
>> Thank you for your time,
>>
>> Xabier
>>
>> --
>> Xabier Vázquez Campos
>> *PhD Candidate*
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
> --
> Michael Campbell MS, RD.
> Doctoral Candidate
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:585-3543

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
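Mike's suggestion (BLAST the RepeatModeler library against known proteins and remove entries with strong hits) can be scripted. The Python sketch below is a hypothetical helper, not part of MAKER or RepeatModeler. It assumes the library was the BLAST+ query, the output is tabular (`-outfmt 6`, default 12 columns), and each `qseqid` equals the first word of the corresponding library header; the 1e-10 e-value cutoff is an arbitrary example. Hits to transposases, gag or pol, like the ones Xabier reports, are what genuine repeats are expected to produce, which fits Mike's advice to leave the library as is; the filter is mainly useful against a protein set that excludes transposon proteins.

```python
"""Drop repeat-library entries that have strong BLAST hits to known proteins.

Hypothetical helper (not part of MAKER or RepeatModeler). Assumes BLAST+
tabular output (-outfmt 6, default 12 columns) with the repeat library as
the query, and that qseqid is the first word of each library header.
The 1e-10 e-value cutoff is an arbitrary example value.
"""
import sys
from typing import Iterable, Iterator, Set, Tuple

EVALUE_COL = 10  # 0-based index of 'evalue' in the default outfmt 6 columns


def strong_hit_ids(blast_lines: Iterable[str], max_evalue: float = 1e-10) -> Set[str]:
    """Return query IDs (column 1) with at least one hit at or below max_evalue."""
    ids = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COL]) <= max_evalue:
            ids.add(fields[0])
    return ids


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_library(fasta_lines: Iterable[str], drop_ids: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield records whose first header word (assumed qseqid) is not in drop_ids."""
    for header, seq in read_fasta(fasta_lines):
        first_word = (header.split() or [""])[0]
        if first_word not in drop_ids:
            yield header, seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 filter_lib.py library.fa hits.tsv > library.filtered.fa
    lib_path, hits_path = sys.argv[1], sys.argv[2]
    with open(hits_path) as fh:
        drop = strong_hit_ids(fh)
    with open(lib_path) as fh:
        for header, seq in filter_library(fh, drop):
            sys.stdout.write(f">{header}\n{seq}\n")
```

The script writes a new FASTA and leaves its inputs untouched (file names above are hypothetical), so the filtered and unfiltered libraries can be compared before choosing which one to pass to `rmlib`.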
From michael.s.campbell1 at gmail.com  Thu Jan 22 09:42:56 2015
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 22 Jan 2015 09:42:56 -0700
Subject: [maker-devel] repeat masking and repeat libraries
In-Reply-To:
References:
Message-ID:

Hi Xabier,

From what you described I would leave it as is.

Mike

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From carsonhh at gmail.com  Fri Jan 23 12:17:36 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Jan 2015 12:17:36 -0700
Subject: [maker-devel] running maker on TACC Stampede.
In-Reply-To:
References:
Message-ID:

Stampede only has MVAPICH2. It does not have OpenMPI (even though it has been requested several times). OpenFabrics libraries (used by MVAPICH2) have a known issue that restricts programs from making system calls while running under MPI. A system call is when one program launches another (e.g. MAKER launching BLAST). For this reason MAKER does not work with MVAPICH2. It only works with OpenMPI.

You can still get it to work on MVAPICH2, but only on a single node. If you request more than one node, it will fail. The solution would be for TACC to install OpenMPI as an option on Stampede (like they have on Lonestar), but until that happens you can only run MAKER on a single node.

Thanks,
Carson

> On Jan 22, 2015, at 10:51 PM, Won C Yim wrote:
>
> To whom it may concern,
>
> Hi!
>
> My name is Won Cheol Yim, at the University of Nevada, Reno.
>
> I am trying to run MAKER on TACC Stampede.
>
> It looks like everything installed properly.
>
> ==============================================================================
> STATUS MAKER v2.31.8
> ==============================================================================
> PERL Dependencies:      VERIFIED
> External Programs:      VERIFIED
> External C Libraries:   VERIFIED
> MPI SUPPORT:            ENABLED
> MWAS Web Interface:     DISABLED
> MAKER PACKAGE:          CONFIGURATION OK
>
> And I installed Perl 5.18.4 with the threads option.
>
> But when I try to run it with MPI, it generates an error.
>
> I assume this problem comes from ibrun on Stampede.
>
> Is there any way to run it on Stampede?
>
> Here is my log.
>
> TACC: Starting up job
> TACC: Setting up parallel environment for MVAPICH ssh-based mpirun.
> cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory
> sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory
> TACC: Setup complete. Running job script.
> TACC: starting parallel tasks...
> [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2)
> TACC: MPI job exited with code: 1
> TACC: Shutting down parallel environment.
> TACC: Shutdown complete. Exiting.
>
>
> Regards,
>
> Won
> --
> Yim, Won Cheol
> Sent with Airmail

From carsonhh at gmail.com  Fri Jan 23 13:00:56 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Jan 2015 13:00:56 -0700
Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster
In-Reply-To: <1421848561970.c8b481bf@Nodemailer>
References: <1421848561970.c8b481bf@Nodemailer>
Message-ID:

MAKER needs a global storage location. You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that. Also, use OpenMPI instead of MPICH2. It's more stable.

I look forward to seeing how your experiment with AWS, MPI, and MAKER works out.

-Carson
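One hedged way to act on Carson's point is to confirm that the output directory really is the same shared filesystem on every node before launching MAKER. A "could not make datastore directory" error is consistent with nodes that cannot see or write the same path, although the log alone does not prove that. The script below is a hypothetical diagnostic, not part of MAKER or StarCluster; the name `probe.py`, the `.probe` marker files and the `write`/`list` modes are all invented for this sketch. Run it in `write` mode once on each node, then in `list` mode on the head node with the hostnames you expect.

```python
"""Check whether a directory is really shared between cluster nodes.

Hypothetical diagnostic (not part of MAKER): run
  python3 probe.py write DIR            once on every node, then
  python3 probe.py list DIR host1 ...   on the head node.
"""
import os
import socket
import sys

SUFFIX = ".probe"  # marker-file suffix invented for this sketch


def write_marker(shared_dir: str) -> str:
    """Create <hostname>.probe in shared_dir and return its path."""
    os.makedirs(shared_dir, exist_ok=True)  # raises if the path is not writable
    path = os.path.join(shared_dir, socket.gethostname() + SUFFIX)
    with open(path, "w") as fh:
        fh.write(f"{os.getpid()}\n")
    return path


def visible_hosts(shared_dir: str) -> set:
    """Return the hostnames whose marker files are visible from this node."""
    return {
        name[: -len(SUFFIX)]
        for name in os.listdir(shared_dir)
        if name.endswith(SUFFIX)
    }


def missing_hosts(shared_dir: str, expected) -> list:
    """Return the expected hosts whose markers this node cannot see, sorted."""
    seen = visible_hosts(shared_dir)
    return sorted(h for h in expected if h not in seen)


if __name__ == "__main__" and len(sys.argv) >= 3:
    mode, directory = sys.argv[1], sys.argv[2]
    if mode == "write":
        print(write_marker(directory))
    elif mode == "list":
        missing = missing_hosts(directory, sys.argv[3:])
        print("missing:", " ".join(missing) if missing else "none")
```

For example, `python3 probe.py list /mnt/data/paramormyrops_new_annotation node001 node067 node137` (the path is taken from Jason's log) would report any node whose marker landed somewhere the head node cannot see, which would suggest that node is writing to local rather than shared storage.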
From jcornel3 at asu.edu  Fri Jan 23 14:28:13 2015
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 23 Jan 2015 13:28:13 -0800
Subject: [maker-devel] Maker-P vs. Maker
Message-ID:

Hi, I'm working on annotating a tetraploid animal with a genome size of 3.1 gigabases. I was wondering if MAKER-P would be appropriate for this organism, or if I should just stick with MAKER? Thanks.

--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com  Fri Jan 23 14:59:01 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Jan 2015 14:59:01 -0700
Subject: [maker-devel] Maker-P vs. Maker
In-Reply-To:
References:
Message-ID: <7813BFBE-7237-4298-8AD3-B210CB96DDD2@gmail.com>

Actually, the code bases have been merged. So if you use the most recent version of MAKER, the plant extensions for RNA annotation and the extra analysis scripts from MAKER-P will be there. If you don't need them, just don't turn the options on in the control files.

-Carson

From carsonhh at gmail.com  Mon Jan 26 12:17:45 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Jan 2015 12:17:45 -0700
Subject: [maker-devel] running maker on TACC Stampede.
In-Reply-To:
References:
Message-ID:

Do you mean sequence upstream of the gene? If that is the case, you would probably have to write a script to do this. BioPerl is one option; it has several Perl modules that help with manipulating FASTA sequences and the file formats of many common biology tools -> http://www.bioperl.org

-Carson

> On Jan 26, 2015, at 12:10 PM, Won C Yim wrote:
>
> Dear Carson Holt,
>
> Thank you for your reply.
>
> I asked STAMPEDE about this issue, and there's no way for them to help me.
>
> I think we need to move to another server for MAKER.
>
> Thank you for your help.
>
> And I have one more question.
>
> Is there any way to extract upstream sequences from MAKER results?
>
> I tried to extract upstream and downstream sequences from them, but it's really hard to do.
>
> Regards,
>
> Won
>
> --
> Yim, Won Cheol
> MS330/Department of Biochemistry & Molecular Biology
> 1664 N. Virginia Street
> University of Nevada, Reno
>
> email: wyim at unr.edu
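Carson points to BioPerl; the same idea can also be sketched in Python with only the standard library. This is a minimal, hypothetical script, not a MAKER tool. It assumes a GFF3 file whose gene features have type `gene` and an `ID=` attribute, a genome FASTA whose sequence names match GFF3 column 1, and it stops reading features at a `##FASTA` line in case the sequence is embedded in the GFF3. "Upstream" is taken relative to the gene's strand (before the start on `+`, after the end and reverse-complemented on `-`), using GFF3's 1-based inclusive coordinates, and is clipped at contig ends. Downstream extraction is the mirror image.

```python
"""Extract N bp upstream of each gene from a GFF3 file and a genome FASTA.

Minimal sketch under stated assumptions: gene features have type "gene" and
an ID= attribute; coordinates are 1-based inclusive; upstream is strand-aware
and clipped at contig ends.
"""
import sys
from typing import Dict, Iterable, Iterator, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map the first word of each header to its concatenated sequence."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif name is not None and line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def gene_records(gff_lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int, str]]:
    """Yield (gene_id, seqid, start, end, strand) for each 'gene' feature."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features follow
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield attrs.get("ID", "."), cols[0], int(cols[3]), int(cols[4]), cols[6]


def upstream(genome: Dict[str, str], seqid: str, start: int, end: int,
             strand: str, n: int) -> str:
    """Return up to n bp immediately 5' of the gene, on the gene's strand."""
    contig = genome[seqid]
    if strand == "-":
        # The 5' end of a minus-strand gene is its highest coordinate.
        return revcomp(contig[end:end + n])
    # Plus strand (or unknown): the bases before the 1-based start.
    return contig[max(0, start - 1 - n):start - 1]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python3 upstream.py genes.gff genome.fasta 1000 > upstream.fa
    gff_path, fasta_path, n_bp = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(fasta_path) as fh:
        genome = read_fasta(fh)
    with open(gff_path) as fh:
        for gene_id, seqid, start, end, strand in gene_records(fh):
            seq = upstream(genome, seqid, start, end, strand, n_bp)
            print(f">{gene_id}_upstream{n_bp}\n{seq}")
```

Reading only `gene` features gives one upstream region per gene; if per-transcript promoters are wanted, the same loop could filter on `mRNA` instead, since alternative transcripts can start at different positions.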
>>> cat: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory >>> sort: open failed: /home1/02908/wyim/.sge/job..hostlist.kUm5vXw9: No such file or directory >>> TACC: Setup complete. Running job script. >>> TACC: starting parallel tasks... >>> [c404-703.stampede.tacc.utexas.edu:mpirun_rsh][read_hostfile] Can't open hostfile `/home1/02908/wyim/.sge/job..hostlist.kUm5vXw9': (2) >>> TACC: MPI job exited with code: 1 >>> TACC: Shutting down parallel environment. >>> TACC: Shutdown complete. Exiting. >>> >>> >>> Regards, >>> >>> Won >>> -- >>> Yim, Won Cheol >>> Sent with Airmail From marc.hoeppner at imbim.uu.se Wed Jan 28 00:01:48 2015 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Wed, 28 Jan 2015 07:01:48 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> Message-ID: Hi, this is probably a long shot, but I was hoping that someone on the list may have some advice on how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue? Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly, and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and BLAST matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that): perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530.
Backtrace: /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012] /lib64/libpthread.so.0[0x358ae0f710] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0] /lib64/libpthread.so.0[0x358ae0f710] /lib64/libc.so.6(__poll+0x53)[0x358aadf343] /sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a] /sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961] /sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e] /lib64/libpthread.so.0[0x358ae079d1] /lib64/libc.so.6(clone+0x6d)[0x358aae8b6d] SIGTERM received A few words about the setup: We have 10 nodes with 160 cores, and the shared file system is exported via InfiniBand from a "standard" NFS server. As OS we run Scientific Linux 6.5. Tests so far don't point to congestion issues or anything like that; the bandwidth usage is actually fairly low. So far I have tried: - running the MPI processes over both the Ethernet network and IPoIB (same problem) - installing a more recent version of Perl through perlbrew, with all the required modules, and recompiling Maker - running some (albeit simple) network checks for retransmissions, lost packets etc. (nothing popped up) - running Maker on a subset of nodes to eliminate the possibility of a bad node The error message is a bit cryptic to me, and it would be very helpful to know whether Maker has a problem accessing a file or whether OpenMPI has a communication problem, but I am not able to tell from the information I have been able to extract so far. Any ideas? Cheers, Marc Marc P.
Hoeppner, PhD Team Leader BILS Genome Annotation Platform Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at imbim.uu.se From dence at genetics.utah.edu Wed Jan 28 09:22:09 2015 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 Jan 2015 16:22:09 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> Message-ID: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> Hi Marc, there are a few things on the maker side to check. Did you set min_contig to 1000 to put a lower limit on contig size? Did maker do anything with the 1 kb contigs, or did it just skip them? You can check that in master_datastore_index.log or in theVoid directories for the small contigs. That will tell us whether maker is functioning correctly even though it's giving those messages. With the newer versions of maker, I get messages identical to what you sent as part of normal thread termination, even when maker is functioning normally. Thanks, Daniel
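Daniel's suggested check of master_datastore_index.log can be scripted. The sketch below assumes each log line has three tab-separated columns (contig ID, datastore directory, status) and that a contig can be logged more than once as it moves from STARTED to a final state. Both assumptions should be checked against your own log. The function names are hypothetical. The script keeps the last status seen for each contig, so contigs that were skipped (for example because of min_contig=1000 in maker_opts.ctl) or that failed stand out from those that finished.

```python
from collections import Counter


def summarize_datastore_log(lines):
    """Return (final status per contig, Counter of final statuses).

    Assumed line format (verify against your own log):
        contig_id <TAB> datastore_directory <TAB> status
    A contig may be logged several times (e.g. STARTED, then FINISHED);
    the last status seen for each contig is treated as its final state.
    """
    final_status = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig_id, status = fields[0], fields[2].strip()
        final_status[contig_id] = status
    return final_status, Counter(final_status.values())


def report(path="master_datastore_index.log"):
    """Print counts per final status, then every contig that did not finish."""
    with open(path) as handle:
        final_status, counts = summarize_datastore_log(handle)
    for status, n in counts.most_common():
        print(f"{status}\t{n}")
    for contig_id, status in final_status.items():
        if status != "FINISHED":
            print(f"{contig_id}\t{status}")
```

Calling `report()` from the directory that holds the log prints the status counts followed by every contig whose last recorded status is something other than FINISHED.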
From marc.hoeppner at imbim.uu.se Thu Jan 29 00:34:17 2015 From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner) Date: Thu, 29 Jan 2015 08:34:17 +0100 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> Message-ID: <54C9E279.8040907@imbim.uu.se> Hi, thanks for the feedback. If I resume maker enough times, it will eventually run through and complete all contigs. The question is whether there is any way to debug why it drops at random times, most commonly when running on small contigs (which is probably due more to the increasing frequency of starting/finishing jobs than to their size). I guess Maker has no debug mode or any other way to find out why it dies? Any idea what could make Maker drop like that? I was thinking NFS, but nfsstat looks fine, there is nothing in the log, and NFS generally works well, so I can't identify a good place to look for the problem. Regards, Marc
From mikael.durling at slu.se Thu Jan 29 02:37:23 2015 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Thu, 29 Jan 2015 09:37:23 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <54C9E279.8040907@imbim.uu.se> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se> Message-ID: Hi, are you running the NFS servers in synchronous or asynchronous mode? I have seen cases where maker fails with the NFS server in async mode, but the failures are random and I can't really reproduce them. In the end, I have continued running maker on NFS in async mode, since the speed gains are significant, at the cost of occasional reruns.
(And yes, nfsstat shows no signs of errors.) Mikael
From carsonhh at gmail.com Thu Jan 29 08:22:57 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 Jan 2015 08:22:57 -0700 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu> <54C9E279.8040907@imbim.uu.se> Message-ID: In my experience, NFS is the most likely cause. A lot of very small contigs means that MAKER produces a lot of very small files very quickly, which stresses NFS far more than high read/write bandwidth does.
There can then be several seconds of lag between a file being created and it being available for reading, because the asynchronous setting allows the system to report an IO operation as complete even though it has not actually completed and is only buffered on the NFS server. So when a process tries to read a file it supposedly just created, the file doesn't exist yet. MAKER tries to offload most of the small-file creation that can trigger this condition to a temporary directory (set by TMP= in the maker_opts.ctl file), so it is critical that this location be on a local drive and not on NFS. But running a lot of very small contigs still results in more frequent file creation on the NFS mount. The only ways around this type of NFS issue are to run on fewer nodes to reduce the frequency of file creation, to turn off asynchronous mode for NFS (which seriously degrades IO performance), or to just let MAKER retry until it works (brute force). Retrying is the default and, in my experience, the most effective approach. NFS issues were in fact the reason we put retry and restart capabilities into MAKER in the first place. -Carson
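Carson's advice maps onto two settings in maker_opts.ctl. The excerpt below is only a sketch. `/local/scratch` is a hypothetical node-local directory; substitute whatever local disk your compute nodes actually have. `tries=5` is an arbitrary example value. The `TMP=` option comes from the message above. The `tries=` option and both comments are recalled from the control file that `maker -CTL` generates, so compare them against your own copy before relying on them.

```
#-----MAKER Behavior Options (excerpt)
tries=5 #number of times to try a contig if there is a failure for some reason
TMP=/local/scratch #specify a directory other than the system default temporary directory for temporary files
```

With TMP pointing at local disk, most short-lived scratch files stay off the NFS mount. A higher `tries` value gives the brute-force retry more attempts per contig.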
From myandell at genetics.utah.edu Thu Jan 29 09:54:50 2015 From: myandell at genetics.utah.edu (Mark Yandell) Date: Thu, 29 Jan 2015 16:54:50 +0000 Subject: [maker-devel] Maker crash on increasingly small contigs In-Reply-To: <54C9E279.8040907@imbim.uu.se> References: <074CBF77-E946-4E89-9C35-5F5A0B6AE866@slu.se> <4448D3E0-2F1C-41E0-981C-28C8C869AF8B@gmail.com> <19F7E075-6B18-4DB2-B97A-922D29456E52@genetics.utah.edu>, <54C9E279.8040907@imbim.uu.se> Message-ID: <7A60AB257EFF2B48B1F4C814817EA053E371D456@mxb2.hg.genetics.utah.edu> Hi Marc, are you sure this isn't your system? E.g. bad NFS mounts, full scratch, etc.? Mark Yandell Professor of Human Genetics H.A. & Edna Benning Presidential Endowed Chair Co-director USTAR Center for Genetic Discovery Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707
From ashrafi at ucdavis.edu Thu Jan 29 11:07:41 2015 From: ashrafi at ucdavis.edu (Hamid Ashrafi) Date: Thu, 29 Jan 2015 13:07:41 -0500 Subject: [maker-devel] GFF and Dereferencing problem Message-ID: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> Hi, after MAKER finishes its job, it generates many files, one of which is a GFF file. I see the following in some of my GFF files. It seems to be a dereferencing problem. I am just wondering whether it affects my annotation. Hamid uti_cns_0004767 est2genome match_part 856428 856485 3090 + . ID=uti_cns_0004767:hsp:8340:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome match_part 856587 856938 3090 + . ID=uti_cns_0004767:hsp:8341:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome match_part 857053 857201 3090 + .
ID=uti_cns_0004767:hsp:8342:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome match_part 859004 859041 3090 + . ID=uti_cns_0004767:hsp:8343:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 uti_cns_0004767 est2genome expressed_sequence_match 878327 878771 1446 + . ID=uti_cns_0004767:hit:4231:3.2.3.8;Name=Sp_Illum_Trans_W uti_cns_0004767 est2genome match_part 878327 878771 1446 + . ID=uti_cns_0004767:hsp:8344:3.2.3.8;Parent=uti_cns_0004767:hit:4231:3.2.3 uti_cns_0004767 est2genome expressed_sequence_match 884121 886610 2509 + . ID=uti_cns_0004767:hit:4232:3.2.3.8;Name=Sp_Illum_Trans_W uti_cns_0004767 est2genome match_part 884121 884195 2509 + . ID=uti_cns_0004767:hsp:8345:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 uti_cns_0004767 est2genome match_part 886180 886610 2509 + . ID=uti_cns_0004767:hsp:8346:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 ARRAY(0x1b91f110) ARRAY(0x1a686350) ARRAY(0x1b06bba0) ARRAY(0x1b931e10) ARRAY(0x1b13f3a0) ARRAY(0x1b6af650) ARRAY(0x1b929600) From carsonhh at gmail.com Thu Jan 29 11:47:11 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 Jan 2015 11:47:11 -0700 Subject: [maker-devel] Maker on Amazon EC2 Using Starcluster In-Reply-To: <1422394249179.2a90ef9d@Nodemailer> References: <73716718-1273-46F1-BC94-AAD276DFE0E1@gmail.com> <1422394249179.2a90ef9d@Nodemailer> Message-ID: I believe this may be caused by the latency of asynchronous operations on your network shared drive (which could have a lot of lag between operations when running in the cloud). Try using a single AWS instance for your test, with the local drive as the working directory. Next, try two instances, where one is the NFS server and you run MAKER on the other instance, but on the network-mounted drive. Then try gradually increasing the number of instances hitting the network shared drive.
-Carson > On Jan 27, 2015, at 2:30 PM, Jason Gallant wrote: > > Carson, > > Thanks for the input and the test script... I was able to run Maker successfully using OpenMPI on Starcluster. However, I am still receiving error messages fairly commonly... this is the error I described earlier in this thread. It seems to appear regardless of whether I use OpenMPI or MPICH2. > > Essentially, there seems to be an error collapsing BLAST reports. This error causes maker to stop accepting new contigs on that machine (in this case node060), and maker continues to report every contig following this error as "failed". The other nodes seem to be working normally, but this error can happen on other nodes as well, so the issue can compound. > > [1,15]:deleted:-60 hits > [1,15]:collecting blastx reports > [1,15]:ERROR: Could not colapse BLAST reports > [1,15]: at /root/maker/bin/../lib/GI.pm line 2524 thread 1. > [1,15]: GI::combine_blast_report(FastaChunk=HASH(0x1781acd8), ARRAY(0xc1e4fa8), ARRAY(0x15ab20d0), runlog=HASH(0xb87f878)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 > [1,15]: Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 > [1,15]: eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 > [1,15]: Error::subs::try(CODE(0x198e22f8), HASH(0x9c9b65c0)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4224 thread 1 > [1,15]: Process::MpiChunk::_go(Process::MpiChunk=HASH(0x1b8a7cd0), "run", HASH(0x15e3e1a0), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 > [1,15]: Process::MpiChunk::run(Process::MpiChunk=HASH(0x1b8a7cd0), 15) called at /root/maker/bin/maker line 1457 thread 1 > [1,15]: main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...)
called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [1,15]: eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 > [1,15]: threads::new("threads", CODE(0x36c9a98), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1 > [1,15]:--> rank=15, hostname=node015 > [1,15]:ERROR: Failed while collecting blastx reports > [1,15]:ERROR: Chunk failed at level:9, tier_type:3 > [1,15]:FAILED CONTIG:Scaffold66 > [1,15]: > [1,15]:ERROR: Chunk failed at level:4, tier_type:0 > [1,15]:FAILED CONTIG:Scaffold66 > [1,15]: > [1,15]:examining contents of the fasta file and run log > [1,15]:ERROR: could not make datastore directory > [1,15]:--> rank=15, hostname=node015 > [1,15]:ERROR: Failed while examining contents of the fasta file and run log > [1,15]:ERROR: Chunk failed at level:0, tier_type:0 > [1,15]:FAILED CONTIG:Scaffold483 > > -- > Dr. Jason R. Gallant > Assistant Professor > Room 38 Natural Sciences > Department of Zoology > Michigan State University > East Lansing, MI 48824 > jgallant at msu.edu > office: 517-884-7756 > > > On Fri, Jan 23, 2015 at 3:25 PM, Carson Holt > wrote: > > The complaining is because there is more than one MAKER process running and they are not connected via MPI, so the problem is OpenMPI. Try installing a small MPI script (like the one attached) and using it to test OpenMPI. Once OpenMPI is configured correctly, the separate processes will communicate with each other (pay attention to the comm size and rank messages). > > -Carson > > > > > >> On Jan 23, 2015, at 1:15 PM, Jason Gallant > wrote: >> >> Hi Carson, >> >> Yes, I've tried that and still have the issue of maker complaining about multiple processes in the same directory. Other ideas? >> >> Best, >> Jason >> >> -- >> Dr. Jason R.
Gallant >> Assistant Professor >> Room 38 Natural Sciences >> Department of Zoology >> Michigan State University >> East Lansing, MI 48824 >> jgallant at msu.edu >> office: 517-884-7756 >> >> >> On Fri, Jan 23, 2015 at 3:14 PM, Carson Holt > wrote: >> >> If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so). >> >> >> For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings. Also, if jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER. >> >> Example: mpiexec -mca btl ^openib -n 20 maker >> >> -Carson >> >> >> >>> On Jan 23, 2015, at 1:08 PM, Jason Gallant > wrote: >>> >>> Hi Carson, >>> >>> Yes, STARCLUSTER enables a global storage space, which is via NFS to an EBS drive that I've created. >>> >>> I'm using the local disk space on each instance for the /tmp directory, however. >>> >>> It occurred to me on reading the forums that MPICH2 doesn't scale as well as OpenMPI; however, when I try to configure Maker for OpenMPI and run it, I get complaints from maker that multiple makers are running in the same directory... >>> >>> Thanks for your advice! >>> >>> Best, >>> Jason >>> >>> -- >>> Dr. Jason R. Gallant >>> Assistant Professor >>> Room 38 Natural Sciences >>> Department of Zoology >>> Michigan State University >>> East Lansing, MI 48824 >>> jgallant at msu.edu >>> office: 517-884-7756 >>> >>> >>> On Fri, Jan 23, 2015 at 3:01 PM, Carson Holt > wrote: >>> >>> MAKER needs a global storage location. You probably need to set up one of your instances to act as a shared storage server. AWS has Lustre implementations for the cloud; perhaps you can try that.
Also use OpenMPI instead of MPICH2. It's more stable. >>> >>> I look forward to seeing how your experiment with AWS, MPI, and MAKER works out. >>> >>> -Carson >>> >>> >>> >>> > On Jan 21, 2015, at 6:56 AM, Jason Gallant > wrote: >>> > >>> > Hi Everyone, >>> > >>> > I'm attempting to run Maker on Amazon EC2 using MIT's StarCluster... I've started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I've figured out how to run things reliably. >>> > >>> > I'm having a persistent issue where something fails on one of the nodes, and stderr is flooded with: >>> > >>> > examining contents of the fasta file and run log >>> > [67] ERROR: could not make datastore directory >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while examining contents of the fasta file and run log >>> > [67] ERROR: Chunk failed at level:0, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold261 >>> > >>> > This error repeats for each "next" scaffold for some time. When I go back to find the "source" of the error in the log, the following is the first error message on that node: >>> > >>> > [67] #-------------------------------# >>> > [67] deleted:-60 hits >>> > [67] collecting blastx reports >>> > [67] ERROR: Could not colapse BLAST reports >>> > [67] at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
>>> > [67] GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1 >>> > [67] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1 >>> > [67] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1 >>> > [67] Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1 >>> > [67] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1 >>> > [67] Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1 >>> > [67] main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 >>> > [67] eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1 >>> > [67] threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1 >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while collecting blastx reports >>> > [67] ERROR: Chunk failed at level:9, tier_type:3 >>> > [67] FAILED CONTIG:Scaffold66 >>> > [67] >>> > [67] ERROR: Chunk failed at level:4, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold66 >>> > >>> > >>> > I've attempted to ignore the error to see if things will proceed on the other 199 processors. When I returned to the "master"
node after the evening, Maker keeps repeating the same error code over and over (same scaffold): >>> > [67] examining contents of the fasta file and run log >>> > [67] ERROR: could not make datastore directory >>> > [67] --> rank=67, hostname=node067 >>> > [67] ERROR: Failed while examining contents of the fasta file and run log >>> > [67] ERROR: Chunk failed at level:0, tier_type:0 >>> > [67] FAILED CONTIG:Scaffold1589 >>> > >>> > I stop the job and restart, and after only a few minutes of running, the same error is reported, this time on a new scaffold. Strangely, here the error is reported under the MPI tag of node001, but the error originates at node137: >>> > >>> > ERROR: Could not colapse BLAST reports >>> > [1] at /root/maker/bin/../lib/GI.pm line 2524. >>> > [1] GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 >>> > [1] Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 >>> > [1] eval {...} called at /root/maker/bin/../lib/Error.pm line 407 >>> > [1] Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 >>> > [1] Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 >>> > [1] Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979 >>> > [1] --> rank=137, hostname=node137 >>> > [1] ERROR: Failed while collecting blastx reports >>> > [1] ERROR: Chunk failed at level:9, tier_type:3 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] ERROR: Chunk failed at level:4, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while
examining contents of the fasta file and run log >>> > [1] ERROR: Chunk failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > [1] >>> > [1] examining contents of the fasta file and run log >>> > [1] ERROR: could not make datastore directory >>> > [1] --> rank=1, hostname=node001 >>> > [1] ERROR: Failed while examining contents of the fasta file and run log >>> > [1] ERROR: Chunk
failed at level:0, tier_type:0 >>> > [1] FAILED CONTIG:Scaffold249 >>> > >>> > I'd appreciate any guidance on how best to diagnose this error! >>> > >>> > Many thanks, >>> > Jason Gallant >>> > >>> > >>> > >>> > >>> > -- >>> > Dr. Jason R. Gallant >>> > Assistant Professor >>> > Room 38 Natural Sciences >>> > Department of Zoology >>> > Michigan State University >>> > East Lansing, MI 48824 >>> > jgallant at msu.edu >>> > office: 517-884-7756 >>> > _______________________________________________ >>> > maker-devel mailing list >>> > maker-devel at box290.bluehost.com >>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> > >
From carsonhh at gmail.com Thu Jan 29 12:40:09 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 Jan 2015 12:40:09 -0700 Subject: [maker-devel] GFF and Dereferencing problem In-Reply-To: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> References: <007c01d03bee$7ab100a0$701301e0$@ucdavis.edu> Message-ID: <65C0DD7A-A3CA-4404-B15E-91B77DC6D8FE@gmail.com> Could you make sure you are using the most recent version of MAKER? There was a similar bug that was fixed some time ago. The current version is 2.31.8. Also, when rerunning with the most recent version of MAKER, make sure to set the -a flag on the command line to force a rerun of logged data. -Carson > On Jan 29, 2015, at 11:07 AM, Hamid Ashrafi wrote: > > Hi, > > After maker finishes its job it generates many files, one of which is a GFF file. I see the following in some of my GFF files. It seems to be a dereferencing problem. I am just wondering if it affects my annotation. > > Hamid > > uti_cns_0004767 est2genome match_part 856428 856485 3090 + . ID=uti_cns_0004767:hsp:8340:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 856587 856938 3090 + .
ID=uti_cns_0004767:hsp:8341:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 857053 857201 3090 + . ID=uti_cns_0004767:hsp:8342:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome match_part 859004 859041 3090 + . ID=uti_cns_0004767:hsp:8343:3.2.3.8;Parent=uti_cns_0004767:hit:4230:3.2.3 > uti_cns_0004767 est2genome expressed_sequence_match 878327 878771 1446 + . ID=uti_cns_0004767:hit:4231:3.2.3.8;Name=Sp_Illum_Trans_W > uti_cns_0004767 est2genome match_part 878327 878771 1446 + . ID=uti_cns_0004767:hsp:8344:3.2.3.8;Parent=uti_cns_0004767:hit:4231:3.2.3 > uti_cns_0004767 est2genome expressed_sequence_match 884121 886610 2509 + . ID=uti_cns_0004767:hit:4232:3.2.3.8;Name=Sp_Illum_Trans_W > uti_cns_0004767 est2genome match_part 884121 884195 2509 + . ID=uti_cns_0004767:hsp:8345:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 > uti_cns_0004767 est2genome match_part 886180 886610 2509 + . ID=uti_cns_0004767:hsp:8346:3.2.3.8;Parent=uti_cns_0004767:hit:4232:3.2.3 > ARRAY(0x1b91f110) > ARRAY(0x1a686350) > ARRAY(0x1b06bba0) > ARRAY(0x1b931e10) > ARRAY(0x1b13f3a0) > ARRAY(0x1b6af650) > ARRAY(0x1b929600) > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
From carsonhh at gmail.com Fri Jan 30 09:33:46 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Jan 2015 09:33:46 -0700 Subject: [maker-devel] How to improve the result of Maker In-Reply-To: References: Message-ID: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com> See below: > > I have joined the "Maker-devel" Google group, but I don't know why I can't reply to a topic or create a new topic. Is there some limitation? The Google site is just a searchable archive of MAKER-related e-mails.
The actual conversations occur through the MAKER mailing list: http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org E-mails sent to the list will be automatically archived on Google. > I have finished genome annotation with Maker. I use SNAP and Augustus in Maker. I have some questions; could you help me? > > When gene finders have predictions at the same location, maker would choose the best prediction as the final output, right? But if the prediction doesn't match the evidence very well, how will maker synthesize the prediction with the evidence? My understanding of maker's behavior is as follows, but I'm not sure whether it is right: > > assume that there is an exon present in the evidence but not in the prediction. If the exon is located at the end of the prediction, it will be output as UTR, but if the exon is located inside the prediction, it will be ignored and not output, right? No. MAKER uses the introns and exons in the evidence alignments to provide hints to the gene predictors. Hints increase the probability scores of the HMM models by increasing the likelihood of the exon or intron state wherever it overlaps the evidence alignment. This process bumps up the likelihood values for models that better match the evidence alignments, resulting in better models than SNAP and Augustus produce on their own without hints. Note that models are still governed by the same constraints of what constitutes an open reading frame and a splice site, regardless of evidence alignments. This means that no amount of evidence-based hints can overcome an assembly error. > For example: > > the exon pointed to by the red arrow. All of the evidence contains this exon, but it was missed in the final output. There are two possibilities.
Given how different the SNAP and Augustus models are from one another, this would suggest they have not been trained appropriately (for example, if you are picking a related organism's parameter file rather than training these programs, several assumptions are being made that can actually make such an approach almost worse than just picking a parameter file at random). But more likely the evidence-supported exon breaks the reading frame of the model. This usually indicates that you have an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that introduces a frameshift, so the predictors do the next most reasonable thing - they drop the exon if another model is tenable. More concerning are the mRNA-seq alignments near the 3' end of the gene call. The structure suggests significant capture of background transcription by the mRNA-seq reads (long UTRs with weird mini-introns). I would suggest not using Cufflinks in this case. You should probably go with an assembly-based approach for the mRNA-seq reads instead; I would suggest using Trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with Trinity. I would further suggest looking at the model in question using Apollo and manually adding the exon (click and drag it into the model). You can examine the reading frame after adding the exon and see if it is in fact a frameshift assembly error. If it's a homopolymer-derived frameshift, then you can expect a lot more of these throughout your assembly. Also, I do not see any protein alignments here. MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don't have to be that closely related, but closer is better).
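The suggestion to add the exon by hand and inspect the reading frame can also be checked offline. A minimal sketch, assuming the model's CDS exons have already been extracted as plus-strand sequences in transcript order and using the standard genetic code (the helper names are hypothetical, not MAKER or Apollo functions): splice in the candidate exon, then report whether the CDS length stays a multiple of three and whether an in-frame stop codon now appears before the final codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(cds):
    """0-based offsets of in-frame stop codons, ignoring the last complete
    codon (where a stop is expected) and any trailing partial codon."""
    cds = cds.upper()
    last_full = len(cds) - len(cds) % 3  # end of the last complete codon
    return [i for i in range(0, last_full - 3, 3) if cds[i:i + 3] in STOP_CODONS]


def check_exon_insertion(exons, new_exon, index):
    """Splice new_exon into the ordered list of CDS exon sequences at
    position `index` and report the effect on the reading frame."""
    cds = "".join(exons[:index] + [new_exon] + exons[index:])
    return {
        "length_multiple_of_3": len(cds) % 3 == 0,
        "frame_shift": len(new_exon) % 3,  # shift applied to everything downstream
        "premature_stops": premature_stops(cds),
    }
```

For instance, inserting the 2 bp exon "GC" into the toy model ["ATGAAA", "CTAAGCTGA"] shifts the frame by 2 and exposes a TAA at offset 9, whereas the 3 bp exon "GCC" keeps the frame with no premature stop. A non-zero `frame_shift` together with premature stops is the pattern expected from a homopolymer indel; an exon that keeps the frame but still introduces a stop points to something other than a simple frameshift.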
Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image, you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not. > In this example, the long UTR is another issue; is it non-coding RNA? > > I have another example: > > > The yellow was evidence from Cufflinks. The final output chose the prediction from Augustus, but the last two exons were annotated as UTR. I thought UTR should be continuous and should not contain introns. Actually, UTR is not expected to be continuous and without introns. In fact, the majority of alternative splicing events occur in the 5' UTR (not in the CDS), and 5' UTRs commonly contain introns (just as we see here). This makes evolutionary sense. An alternatively spliced 5' UTR allows for differential and tissue-specific control of the exact same protein by swapping out the upstream regulatory sequence. Alternative splicing of the 3' UTR, on the other hand, is less common (it's involved in nonsense-mediated decay and not so much in regulation of expression), but introns in the 3' UTR are still not uncommon. The mRNA-seq alignments suggest that those exons are transcribed, so unless there is an assembly error causing a frameshift in the CDS and an early stop codon, the 3' UTR would be correct. If you had protein alignments from another species here, then you could see which exons they support as being coding exons. Thanks, Carson
From xvazquezc at gmail.com Fri Jan 30 21:48:33 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Sat, 31 Jan 2015 15:48:33 +1100 Subject: [maker-devel] genome duplication?
Message-ID: Hi all, One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:
> # Statistics of the completeness of the genome based on 248 CEGs #
>
>             #Prots  %Completeness  -  #Total  Average  %Ortho
>
> Complete       181          72.98  -     365     2.02   67.40
> Group 1         54          81.82  -     105     1.94   66.67
> Group 2         39          69.64  -      86     2.21   71.79
> Group 3         45          73.77  -      86     1.91   57.78
> Group 4         43          66.15  -      88     2.05   74.42
>
> Partial        230          92.74  -     528     2.30   77.83
> Group 1         61          92.42  -     140     2.30   72.13
> Group 2         53          94.64  -     127     2.40   84.91
> Group 3         56          91.80  -     126     2.25   69.64
> Group 4         60          92.31  -     135     2.25   85.00
The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies. Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328 Now to the Maker part... So, as part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000 respectively) in comparison with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a consequence of there being no support from the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments. So, in my last run I set keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to Last question regarding the protein files.
I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to be real evidence with a lower chance of predicting false genes. Am I right? Thank you in advance, Xabier -- Xabier Vázquez Campos PhD Candidate Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA
From mikael.durling at slu.se Sat Jan 31 01:42:51 2015 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Sat, 31 Jan 2015 08:42:51 +0000 Subject: [maker-devel] genome duplication? In-Reply-To: References: Message-ID: Hi Xabier, On 31 Jan 2015 at 05:48, Xabier Vázquez Campos wrote: Hi all, One of the fungal genomes I'm annotating is relatively shattered (?), with many contigs/scaffolds, and the CEGMA analysis alone may indicate a potential widespread duplication of the genome:
# Statistics of the completeness of the genome based on 248 CEGs #
            #Prots  %Completeness  -  #Total  Average  %Ortho
Complete       181          72.98  -     365     2.02   67.40
Partial        230          92.74  -     528     2.30   77.83
Judging from these figures, you seem to have a very fragmented assembly? What N50 have you reached? In my experience, assemblies with an N50 below 5-10 times the average gene length tend to cause problems in producing good gene sets. Not to say that the gene sets are unusable, but for comparing e.g. gene complements to other species, it will be hard to draw any conclusions when a high proportion of the genes are incomplete. The expected genome size is relatively low (~42 Mb by abyss-fac) in comparison with Hortaea werneckii (51.6 Mb, 23333 genes), a related fungus with nearly 90% of its genes present in at least two copies.
Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328 Now to the Maker part... So, as part of the Maker annotation, I trained SNAP and Augustus, and I generated a specific RepeatModeler library. I recorded the predicted outputs from each Maker run (AED, number of predicted proteins and transcripts...). Both Augustus and SNAP gave quite high numbers (~19000 and ~23000 respectively) in comparison with the xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how does maker deal with gene duplications? Or is this just a consequence of there being no support from the protein files initially provided to Maker? I've used 4 different protein files for the annotation; could it be that they weren't the best choices? I picked them from the closest relatives and similar environments. Unless you accidentally filter out duplicated gene families as repeats with RepeatModeler, maker should not care about duplicated genes. However, maker, without keep_preds=1, reports only genes with some kind of support (be it EST or protein homology). This is rather conservative, but if you enable keep_preds, you will get more genes, as you have noted. Just for the sake of comparison, I have re-annotated more than ten genomes downloaded from JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently reports fewer gene models. I have yet to do a more thorough comparison to tell which genes JGI reports that don't appear in the MAKER annotations. So, in my last run I set keep_preds=1 and the proteins in the xxx.all.maker.proteins.fasta reached to Last question regarding the protein files. I downloaded the annotated genomes from the JGI, and most of them have two annotation folders, "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been using the protein files found in the latter, as I expected them to be real evidence with a lower chance of predicting false genes. Am I right? Yes, I would say so.
The FilteredModels have passed through their model selection pipeline, while all_models contains models from all predictors, as well as combinations of predictors and EST evidence. Just my 2 cents of observations, cheers, Mikael Thank you in advance, Xabier -- Xabier Vázquez Campos PhD Candidate Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
From xvazquezc at gmail.com Sat Jan 31 01:51:36 2015 From: xvazquezc at gmail.com (Xabier Vázquez Campos) Date: Sat, 31 Jan 2015 19:51:36 +1100 Subject: [maker-devel] genome duplication? In-Reply-To: References: Message-ID: Thanks Mikael, These are the assembly stats as taken from abyss-fac; indeed it isn't a great N50, but it isn't that bad either:
n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6
2015-01-31 19:42 GMT+11:00 Mikael Brandström Durling : > Hi Xabier, > > On 31 Jan 2015 at 05:48, Xabier Vázquez Campos wrote: > > Hi all, > > One of the fungal genomes I'm annotating is relatively shattered (?), with > many contigs/scaffolds, and the CEGMA analysis alone may indicate a > potential widespread duplication of the genome:
> # Statistics of the completeness of the genome based on 248 CEGs
>> #
>>             #Prots  %Completeness  -  #Total  Average  %Ortho
>>
>> Complete       181          72.98  -     365     2.02   67.40
>> Partial        230          92.74  -     528     2.30   77.83
> > > Judging from these figures, you seem to have a very fragmented assembly? > What N50 have you reached? In my experience, assemblies with an > N50 below 5-10 times the average gene length tend to cause problems in > producing good gene sets.
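The rule of thumb above (an N50 below roughly 5-10 times the average gene length causes trouble) is easy to check once both numbers are known. A small sketch, assuming you have the contig/scaffold lengths as a list of integers and an estimate of mean gene length from a related annotation; the helper names and the default threshold are illustrative and do not come from MAKER or abyss-fac:

```python
def n50(lengths):
    """Length L such that sequences of length >= L contain at least half
    of the total assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def n50_gene_ratio(n50_value, mean_gene_length):
    """How many average-length genes fit into one N50-length sequence."""
    return n50_value / mean_gene_length


def fragmented_for_annotation(n50_value, mean_gene_length, min_ratio=5.0):
    """True when N50 falls below min_ratio x mean gene length; 5.0 is the
    lower end of the 5-10x range quoted in this thread."""
    return n50_gene_ratio(n50_value, mean_gene_length) < min_ratio
```

With the N50 of 10,771 bp from the abyss-fac output, the 5x and 10x marks correspond to mean gene lengths of about 2,154 bp and 1,077 bp respectively. The abyss-fac table reports min = 500, so filter lengths below 500 bp before calling `n50` if you want a comparable figure.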
> That is not to say that the gene sets are unusable, but
> when comparing e.g. gene complements to other species, it will be hard to
> draw any conclusions when a high proportion of the genes are incomplete.
>
> The expected genome size is relatively small (~42 Mb by abyss-fac) in
> comparison with *Hortaea werneckii* (51.6 Mb, 23333 genes), a related
> fungus with nearly 90% of its genes present in at least two copies.
> Paper:
> http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>
> Now to the Maker part... So, as part of the Maker annotation, I trained
> SNAP and Augustus, and I generated a specific RepeatModeler library. I
> recorded the predicted outputs from each Maker run (AED, number of
> predicted proteins and transcripts...). Both Augustus and SNAP tended to give
> quite high numbers (~19000 and ~23000, respectively) compared with
> xxx.all.maker.proteins.fasta (about 13600). So, my first question is: how
> does Maker deal with gene duplications? Or is this just because there
> is no support from the protein files initially provided to Maker?
> I've used 4 different protein files for the annotation; could it be
> that they weren't the best choices? I picked them from the closest
> relatives and similar environments.
>
>
> Unless you have mistakenly filtered out duplicated gene families as repeats
> with RepeatModeler, MAKER should not care about duplicated genes. However,
> without keep_preds=1, MAKER reports only genes with some kind of support
> (be it EST or protein homology). This is rather conservative, but if you
> enable keep_preds, you will get more genes, as you have noted. Just for the
> sake of comparison, I have reannotated more than ten genomes downloaded from
> JGI, providing MAKER with evidence similar to JGI's, and MAKER consistently
> reports fewer gene models. I have yet to do a more thorough comparison
> to tell which genes JGI reports that don't appear in the MAKER
> annotations.
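Mikael's rule of thumb (an N50 below roughly 5-10 times the average gene length tends to hurt gene-set quality) is easy to check once the N50 is known. The sketch below is a minimal illustration, not abyss-fac's implementation: the function names are hypothetical, the optional `min_length` filter only mirrors the `min` = 500 column in the abyss-fac table above, and the 2,000 bp mean gene length used in the usage lines is a placeholder, not a measured value for this genome.

```python
from typing import Iterable


def n50(lengths: Iterable[int], min_length: int = 0) -> int:
    """Return the N50 of contig/scaffold lengths.

    N50 is the largest length L such that sequences of length >= L together
    hold at least half of the assembled bases. Sequences shorter than
    min_length are dropped first.
    """
    kept = sorted((x for x in lengths if x >= min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences at or above min_length")
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        # Stop at the first sequence where the cumulative sum reaches half.
        if 2 * running >= total:
            return length
    return kept[-1]  # not reached: the cumulative sum always reaches total


def below_gene_length_rule(n50_value: int, mean_gene_length: float,
                           factor: float = 5.0) -> bool:
    """True if N50 < factor * mean gene length (the 5-10x rule of thumb)."""
    return n50_value < factor * mean_gene_length


if __name__ == "__main__":
    # Toy contig lengths (hypothetical): 20 bp in total, so half is 10 bp.
    print(n50([2, 3, 4, 5, 6]))
    # N50 = 10771 is from the abyss-fac table; 2000 bp is a placeholder
    # mean gene length, used only to show both ends of the 5-10x range.
    for factor in (5.0, 10.0):
        print(factor, below_gene_length_rule(10771, 2000, factor))
```

Because the rule is stated as a range, checking both the 5x and 10x ends makes it clear whether an assembly is borderline or clearly too fragmented.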
>
>
> So, in my last run I turned on keep_preds=1 and the proteins in the
> xxx.all.maker.proteins.fasta reached to
>
> Last question regarding the protein files. I downloaded the annotated
> genomes from JGI, and most of them have two annotation folders,
> "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been
> using the protein files found in the latter, as I expected them to contain
> real evidence and have a lower chance of predicting false genes. Am I right?
>
>
> Yes, I would say so. The FilteredModels have passed through their model
> selection pipeline, while all_models contains models from all predictors,
> as well as combinations of predictors and EST evidence.
>
> Just my 2 cents,
> cheers,
> Mikael
>
>
> Thank you in advance,
>
> Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From chenwenbo1020 at gmail.com Sat Jan 31 08:54:28 2015
From: chenwenbo1020 at gmail.com (陈文博)
Date: Sat, 31 Jan 2015 10:54:28 -0500
Subject: [maker-devel] How to improve the result of Maker
In-Reply-To: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com>
References: <492A6635-67E9-4700-B544-E137C4248E55@gmail.com>
Message-ID:

>
>
> There are two possibilities.
> Given how different the SNAP and Augustus
> models are from one another, this would suggest they have not been trained
> appropriately (for example, if you are picking another related organism's
> parameter file rather than training these programs, there are several
> assumptions being made that can actually make such an approach
> almost worse than just picking a parameter file at random). But more likely,
> the evidence-supported exon breaks the reading frame of the model. This
> usually indicates that you have an assembly error (possibly issues with
> homopolymers). No amount of evidence support will allow you to call an
> exon that introduces a frameshift, so the predictors do
> the next most reasonable thing: they drop the exon if another model is
> tenable. More concerning would be the mRNA-seq alignments near the 3' end
> of the gene call. The structure suggests significant capture of background
> transcription by the mRNA-seq reads (long UTRs with weird mini-introns).
> I would suggest not using Cufflinks in this case. You should probably go
> with an assembly-based approach for the mRNA-seq reads instead. I would suggest
> using Trinity. It will reduce sensitivity but greatly increase evidence
> specificity, which is where you need the most improvement based on these
> images. I would also suggest using the jaccard_clip option with Trinity.
>
> I would further suggest looking at the model in question using Apollo, and
> manually adding the exon (click and drag it into the model). You can
> examine the reading frame after adding the exon and see if it is in fact a
> frameshift assembly error. If it's a homopolymer-derived frameshift, then
> you can expect a lot more of these throughout your assembly.
>

I dragged the exon into the model. It contains a stop codon, which causes the region downstream of it to become UTR, here:

[image: 1]

The exon in question is marked by the red arrow.
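The reading-frame problem Carson describes, and the stop codon Wenbo sees after dragging the exon in, can be illustrated with a toy check: splice the exons, then scan the codons for an in-frame stop before the end of the CDS. This is a minimal sketch under stated assumptions, not how MAKER or Apollo evaluate models: it handles plus-strand sequence only (no reverse complement), and the exon sequences and function names are hypothetical.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(exons: List[str]) -> str:
    """Join exon sequences (5'->3', plus strand) into one CDS string."""
    return "".join(exon.upper() for exon in exons)


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon that
    is not the terminal codon of the CDS, or None if there is none."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            if i + 3 == len(cds):
                return None  # proper stop codon at the very end of the CDS
            return i // 3
    return None


if __name__ == "__main__":
    exon1 = "ATGGC"              # hypothetical first exon
    exon2 = "AGTAAGCTGA"         # hypothetical second exon, correct 5' boundary
    exon2_shifted = exon2[1:]    # same exon with its 5' boundary moved by 1 bp

    for label, exons in (("correct", [exon1, exon2]),
                         ("shifted", [exon1, exon2_shifted])):
        cds = splice(exons)
        print(label, "length divisible by 3:", len(cds) % 3 == 0,
              "premature stop at codon:", first_premature_stop(cds))
```

A 1 bp change in one exon boundary is enough to turn a clean open reading frame into one with a premature stop, which is the pattern Wenbo describes for the misplaced 5' boundary.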
But the uppermost evidence is a complete EST from NCBI, and it contains both start and stop codons. I then noticed that the 5' boundary of the 2nd exon in the model is not the same as in the EST, so it introduces a frameshift, which causes the stop codon in the exon marked by the red arrow. The first exon should not be CDS, as there would be a start codon in the 2nd exon if its 5' boundary were predicted correctly. Would "always_complete=1" fix it? I will try using Trinity.

>
> Also I do not see any protein alignments here? MAKER cannot work on
> transcript evidence alone. You need to provide the full proteome of at
> least two other species (they don't have to be that closely related, but
> closer is better). Protein alignments will also help you better interpret
> the coding status of exons supported by mRNA-seq. For example, in the second
> image, you would expect protein evidence to support all the coding exons
> but not the UTR exons, which would remove any doubt as to whether an exon is
> really UTR or not.
>

I did use 3 sources of protein evidence: a proteome from a related species, a proteome from the fruit fly, and Swiss-Prot.

Thank you very much!

Best regards,
Wenbo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 10308 bytes
Desc: not available
URL:

From jason.stajich at gmail.com Sat Jan 31 16:21:12 2015
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sat, 31 Jan 2015 15:21:12 -0800
Subject: [maker-devel] genome duplication?
In-Reply-To:
References:
Message-ID:

Xabier - FYI, though you have probably already compared: those stats are on par with the Hortaea v1 assembly (we do have an improved Hortaea assembly now, and the genome size is still in the same range, supporting the duplication hypothesis).

Hw version 1 assembly - N50 9623; Max 71563

CEGMA for Hw1
          #Prots  %Completeness  -  #Total  Average  %Ortho
Complete  196     79.03          -  498     2.54     81.12
Partial   228     91.94          -  673     2.95     95.18

Mikael - yes, we should compare notes on the models JGI is calling that have little support in MAKER. I am not sure if their pipeline runs Augustus/SNAP using informant hints, though usually they are bringing RNA-seq into the mix. I don't know if your reannotation approach assembled the RNA-seq and used it as evidence? We'll be trying to assess some of this when we compare the proportion of shared genes for the first 1KFG paper, so we may be able to say with more certainty whether these extra predictions are shared more widely, and get a handle on singleton/false-positive rates.

Jason

Jason Stajich
jason.stajich at gmail.com
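The two CEGMA reports in this thread can be compared directly. As far as I can tell from the numbers posted, %Completeness is #Prots divided by the 248 CEGs, and Average is #Total divided by #Prots (the mean number of hits per detected CEG, which is what hints at duplication). The sketch below reproduces those two columns from the posted values to two decimals; the class and dictionary names are hypothetical. %Ortho (the share of detected CEGs with more than one ortholog) cannot be derived from these totals, so it is left out.

```python
from dataclasses import dataclass

N_CEGS = 248  # "based on 248 CEGs", as stated in the CEGMA report


@dataclass
class CegmaRow:
    label: str
    prots: int  # #Prots: number of the 248 CEGs detected
    total: int  # #Total: all hits, including extra copies/orthologs

    @property
    def completeness(self) -> float:
        """%Completeness: share of the 248 CEGs that were detected."""
        return 100.0 * self.prots / N_CEGS

    @property
    def average(self) -> float:
        """Average: mean number of hits per detected CEG."""
        return self.total / self.prots


# #Prots and #Total values copied from the two CEGMA reports in this thread.
REPORTS = {
    "Xabier's assembly": [CegmaRow("Complete", 181, 365),
                          CegmaRow("Partial", 230, 528)],
    "Hortaea werneckii v1": [CegmaRow("Complete", 196, 498),
                             CegmaRow("Partial", 228, 673)],
}

if __name__ == "__main__":
    for assembly, rows in REPORTS.items():
        for row in rows:
            print(f"{assembly:22} {row.label:9} "
                  f"{row.completeness:6.2f}% complete, "
                  f"{row.average:.2f} hits per CEG")
```

Both assemblies average more than two hits per detected CEG, which is the pattern Jason points to as consistent with the duplication hypothesis.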