[maker-devel] CPUs problems

Carson Holt carsonhh at gmail.com
Fri Sep 19 11:22:50 MDT 2014


These are further symptoms of an IO-related issue. The script cannot even
query its current working directory.
Check to make sure there is plenty of space in the temporary directory
/tmp.  If /tmp is separately mounted on each machine there may be one that
is full. Also make sure you did not set TMP= in the maker_opts.ctl file to
an NFS mounted location.
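
As a rough sketch of that check (this is not part of MAKER; the helper
names, the 10 GB threshold and the list of network filesystem types are my
own assumptions, and it relies on the Linux-only /proc/mounts file),
something like this Python could be run on each node:

```python
import os
import shutil

# Filesystem types treated as "network mounted" (an assumed, incomplete list).
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "lustre", "gpfs", "beegfs", "ceph"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is the content of a /proc/mounts style table:
    device, mount point, filesystem type, options, ...
    """
    path = os.path.abspath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def is_network_fs(fs_type):
    """True if the filesystem type is in the assumed network list."""
    return fs_type.lower() in NETWORK_FS


def report(path, min_free_gb=10.0, mounts_file="/proc/mounts"):
    """Print free space and filesystem type for path; return True if it looks OK."""
    free_gb = shutil.disk_usage(path).free / 1e9
    try:
        with open(mounts_file) as handle:
            mount_point, fs_type = fs_type_for(path, handle.read())
    except OSError:
        # No readable mount table (e.g. not Linux): the type stays unknown.
        mount_point, fs_type = "", "unknown"
    ok = free_gb >= min_free_gb and not is_network_fs(fs_type)
    print(f"{path}: {free_gb:.1f} GB free, mounted at {mount_point or '?'} "
          f"as {fs_type} -> {'OK' if ok else 'CHECK THIS NODE'}")
    return ok
```

Calling report("/tmp") on every node (and on whatever TMP= points to, if
you set it) should show quickly whether one machine is full or is using a
network mount.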

Do you by any chance get any warnings when you start MAKER?

For example -->
WARNING: Multiple MAKER processes have been started in the same directory.

That would indicate that the MPI communication ring is down, which would
drastically increase IO operations.


You may also have one or more nodes that are having the issue and are the
source of all the errors. If you are using OpenMPI to run MAKER, you can
tag the output from each node using the --tag-output flag for mpiexec.
Then if the same node is always producing the error, you can have IT look
at it.
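
If it helps, here is a small Python sketch for tallying error lines per MPI
rank from tagged output. It assumes tagged lines look like
"[jobid,rank]<stderr>:message", which is what I would expect from
--tag-output but may differ between OpenMPI versions. The function name and
the default error pattern are hypothetical. Note that the tag gives the
rank, so you would still need your hostfile or scheduler to map ranks to
nodes.

```python
import re
from collections import Counter

# Assumed shape of a tagged line: "[jobid,rank]<stdout>:text" or "<stderr>".
TAG = re.compile(r"^\[([^,\]]+),(\d+)\]<(stdout|stderr)>:\s?(.*)$")


def errors_by_rank(lines, pattern=r"error|getcwd|No such file"):
    """Count lines matching pattern (case-insensitive) for each MPI rank.

    Untagged lines are ignored because they cannot be attributed to a rank.
    """
    matcher = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for line in lines:
        tagged = TAG.match(line.rstrip("\n"))
        if tagged and matcher.search(tagged.group(4)):
            counts[int(tagged.group(2))] += 1
    return counts
```

For a saved log (the file name maker.log is only an example),
errors_by_rank(open("maker.log")).most_common() lists the ranks with the
most matching lines first. If one rank dominates, the node it ran on is the
one to hand to IT.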

Also MAKER is set to automatically retry on errors.  If all contigs are
finished, check the output.  Make sure there are the same number of genes
in the fasta files and GFF3 files.  Also look for munged content.  If
everything looks OK, MAKER may have gotten around the issue through sheer
brute force (i.e. it retried until it succeeded).
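
A minimal Python sketch of that check follows. It is my own illustration,
not a MAKER tool. It assumes one FASTA record per mRNA feature, and it
treats any non-comment GFF3 line that lacks nine tab-separated columns or
numeric start/end fields as munged. The function names are hypothetical.

```python
from collections import Counter


def count_fasta_records(lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def scan_gff3(lines):
    """Return (feature_counts, malformed_line_numbers) for GFF3 lines.

    Comment/directive lines and blank lines are skipped, and scanning stops
    at an embedded ##FASTA section.
    """
    counts = Counter()
    malformed = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or not (fields[3].isdigit() and fields[4].isdigit()):
            malformed.append(number)
            continue
        counts[fields[2]] += 1
    return counts, malformed
```

Compare count_fasta_records() on the transcript or protein fasta file with
counts["mRNA"] from scan_gff3() on the merged GFF3. A mismatch, or any entry
in the malformed list, points to contigs that should be recomputed.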

--Carson





On 9/18/14, 5:49 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
<nguyenan at mail.nih.gov> wrote:

>I re-ran maker on 10 CPUs. The maker job was finished after 10 days. I
>checked the log file and got these errors:
>
>Processing run.log file...
>examining contents of the fasta file and run log
>shell-init: error retrieving current directory: getcwd: cannot access
>parent directories: No such file or directory
>
>
>Can you let me know how I can fix the problem?
>
>Thanks
>Anh-Dao
>
>
>On 9/5/14 11:37 AM, "Carson Holt" <carsonhh at gmail.com> wrote:
>
>>The partial lines are symptoms of writing data to a slow NFS mounted
>>drive.  If NFS can't get a response for a write operation, it returns
>>success (even though it wasn't really successful) and then continues to
>>wait for the operation to really complete.  This is called asynchronous
>>writing.  It improves performance by optimistically returning success on
>>all operations rather than waiting to see if the operation really
>>succeeded. If you have a slow or overloaded NFS mount though, you can get
>>a number of failures and never any indication that they failed except for
>>the fact that some files are missing content or lines are partial.
>>
>>When this happens, you need to run MAKER with the -a flag on fewer CPUs to
>>rebuild the GFF3 files. Fewer CPUs reduces the IO burden.  Or if you can
>>find which contigs have partial GFF3 lines, you can delete just those
>>along with the datastore index log file and then launch maker without any
>>flags to let it recompute just those contigs.
>>
>>Another possible cause is also NFS related.  If you are running MAKER
>>multiple times in the same working directory, and a slow NFS mount doesn't
>>allow maker to properly lock files, then two maker jobs can try to
>>compute the same contig simultaneously.  Simultaneous writing of files can
>>then cause IDs to be duplicated and some lines to be munged as lines from
>>one process arrive in the file in the middle of lines from another process
>>(creating a jumble of characters and partial lines).  Start a single maker
>>job on fewer CPUs using the -a flag to rebuild the GFF3 files if this is
>>the case.
>>
>>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
>>passing in GFF3 files with already assigned IDs (that may be used
>>elsewhere).  Are you using GFF3 pass-through?
>>
>>Features that will not have unique ID= tags are CDS, three_prime_utr, and
>>five_prime_utr features (these are considered non-continuous features
>>because of the shared ID across lines).
>>You can see examples here --> http://www.sequenceontology.org/gff3.shtml
>>
>>Also Name= attributes are not required to be unique.
>>
>>Thanks,
>>Carson
>>
>>
>>
>>
>>
>>
>>On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>><nguyenan at mail.nih.gov> wrote:
>>
>>>Hi,
>>>
>>>I finished running MAKER as suggested above.
>>>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g
>>>options. I called the output file maker.gff3
>>> 
>>>In the maker.gff3 I found some invalid data (does not conform to the .gff3
>>>format), e.g.
>>>
>>>###
>>>2 +
>>>###
>>>
>>>OR
>>>
>>>###
>>>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>>>###
>>>
>>>OR some gene (or mRNA) IDs are not unique. This means they can be found
>>>multiple times with different values within the maker.gff3
>>>
>>>How could it happen? As I understood, mRNA IDs in a .gff3 file must be
>>>unique.
>>>
>>>Thanks
>>>Anh-Dao
>>> 
>>>
>>> 
>>>
>>>
>>>On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org"
>>><maker-devel-request at yandell-lab.org> wrote:
>>>
>>>>Send maker-devel mailing list submissions to
>>>>	maker-devel at yandell-lab.org
>>>>
>>>>To subscribe or unsubscribe via the World Wide Web, visit
>>>>	http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>>or, via email, send a message with subject or body 'help' to
>>>>	maker-devel-request at yandell-lab.org
>>>>
>>>>You can reach the person managing the list at
>>>>	maker-devel-owner at yandell-lab.org
>>>>
>>>>When replying, please edit your Subject line so it is more specific
>>>>than "Re: Contents of maker-devel digest..."
>>>>
>>>>
>>>>Today's Topics:
>>>>
>>>>   1. Re: Maker_opts.ctl (Carson Holt)
>>>>
>>>>
>>>>----------------------------------------------------------------------
>>>>
>>>>Message: 1
>>>>Date: Fri, 18 Jul 2014 11:04:09 -0600
>>>>From: Carson Holt <carsonhh at gmail.com>
>>>>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]" <nguyenan at mail.nih.gov>,	Daniel
>>>>	Ence <dence at genetics.utah.edu>
>>>>Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>Subject: Re: [maker-devel] Maker_opts.ctl
>>>>Message-ID: <CFEEAF84.DCAF%carsonhh at gmail.com>
>>>>Content-Type: text/plain;	charset="UTF-8"
>>>>
>>>>It should just be 'fgenesh'.  If it's not there you can still just give
>>>>the GFF3.
>>>>
>>>>--Carson
>>>>
>>>>
>>>>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>><nguyenan at mail.nih.gov> wrote:
>>>>
>>>>>I am not sure which fgenesh executable file should I use.
>>>>>
>>>>>fgenesh= #location of fgenesh executable
>>>>>
>>>>>When I run FGENESH++, I need to run the run_pipe.pl script. Sure you need
>>>>>to specify a list of other executable programs (such as ppd, ppdn+, etc)
>>>>>
>>>>>Anh-Dao
>>>>>
>>>>>
>>>>>On 7/16/14 3:32 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>>
>>>>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your
>>>>>>previous run.  Then provide the RepeatModeler file to rmlib=
>>>>>>
>>>>>>--Carson
>>>>>>
>>>>>>
>>>>>>
>>>>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>
>>>>>>>By default, model_org=all. Can I use the de novo repeat library predicted
>>>>>>>by RepeatModeler for the rmlib option?
>>>>>>>
>>>>>>>Anh-Dao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>On 7/16/14 3:17 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>>>>
>>>>>>>>No.  You can provide both to MAKER. The options are model_org= and rmlib=.
>>>>>>>>By letting MAKER handle repeat masking it will differentiate repeat types
>>>>>>>>and use soft masking for some and hard masking for others.  This increases
>>>>>>>>sensitivity of evidence alignments while still maintaining specificity.
>>>>>>>>
>>>>>>>>--Carson
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>>
>>>>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the parameter
>>>>>>>>>files for Augustus.
>>>>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM using two
>>>>>>>>>options: -lib (de novo) and -species (known). I got ~ 45% repeats via de
>>>>>>>>>novo and ~ 4% repeats via known options. As I understood, RM inside of
>>>>>>>>>MAKER uses only RepBase repeat library and RepeatRunner protein database.
>>>>>>>>>
>>>>>>>>>Anh-Dao
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On 7/16/14 2:36 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>When you ran Augustus separately, it should have created the parameters
>>>>>>>>>>needed to run it.  Now you should be able to run it inside of MAKER using
>>>>>>>>>>the species name you just created.
>>>>>>>>>>
>>>>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than
>>>>>>>>>>giving it the results as GFF3.
>>>>>>>>>>
>>>>>>>>>>--Carson
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>>>>
>>>>>>>>>>>I did not use the parameter file of another organism when running Augustus.
>>>>>>>>>>>I created the parameter file for the genome following their instructions.
>>>>>>>>>>>There were multiple steps to train and run Augustus (Creating gene
>>>>>>>>>>>structures for training AUGUSTUS with CEGMA => parameter file will be
>>>>>>>>>>>created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences;
>>>>>>>>>>>Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.)
>>>>>>>>>>>As I mentioned, the reason I ran Augustus separately is that Augustus
>>>>>>>>>>>has not been trained on that genome (no parameter file exists). Otherwise
>>>>>>>>>>>I would run Augustus inside MAKER.
>>>>>>>>>>> 
>>>>>>>>>>>You suggested using the rm_gff option to specify RepeatMasker output (sure,
>>>>>>>>>>>I will convert them to .gff3 formatted files). Can I submit two RM .gff3
>>>>>>>>>>>files, separated by a comma?
>>>>>>>>>>>
>>>>>>>>>>>Anh-Dao
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" <dence at genetics.utah.edu>
>>>>>>>>>>>wrote:
>>>>>>>>>>>
>>>>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>>>>
>>>>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein
>>>>>>>>>>>>evidence. You'll put all of your fasta est files together in a
>>>>>>>>>>>>comma separated list in the "est" option, and all of your fasta protein
>>>>>>>>>>>>files in a comma separated list for the "protein" option.
>>>>>>>>>>>>
>>>>>>>>>>>>You'll specify the SNAP and Genemark files in their respective options in
>>>>>>>>>>>>the control file and pass the augustus and fgenesh predictions in the
>>>>>>>>>>>>"pred_gff" option.
>>>>>>>>>>>>
>>>>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give it to
>>>>>>>>>>>>maker with the "rm_gff" option.
>>>>>>>>>>>>
>>>>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it to
>>>>>>>>>>>>maker with the "est_gff" option. I'm pretty sure Trinity only gives fasta
>>>>>>>>>>>>output, so you would put that in the "est" option, along with all the
>>>>>>>>>>>>other est fasta files.
>>>>>>>>>>>>
>>>>>>>>>>>>If Augustus isn't trained for your particular organism, then you can use
>>>>>>>>>>>>another organism that augustus is already trained for. The list of
>>>>>>>>>>>>species that augustus has parameter files for is in the README.txt that
>>>>>>>>>>>>came with Augustus. I really recommend that you run Augustus from inside
>>>>>>>>>>>>maker, because then you get all the benefits of maker passing est-based
>>>>>>>>>>>>hints to augustus at runtime, which can really improve Augustus'
>>>>>>>>>>>>predictive ability.
>>>>>>>>>>>>
>>>>>>>>>>>>When you ran the augustus gene prediction separately, did you use another
>>>>>>>>>>>>organism's parameter file?
>>>>>>>>>>>>
>>>>>>>>>>>>Thanks,
>>>>>>>>>>>>Daniel
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C]
>>>>>>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would like to conduct a genome annotation and have the following data:
>>>>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species options)
>>>>>>>>>>>>> - ESTs and RACE (fasta)
>>>>>>>>>>>>> - proteins (fasta)
>>>>>>>>>>>>> - proteins of related organisms (fasta)
>>>>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to ZFF
>>>>>>>>>>>>>   format, etc.)
>>>>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert
>>>>>>>>>>>>>   the outputs to .gff3 files. I ran Augustus gene prediction separately
>>>>>>>>>>>>>   because the genome has never been trained for Augustus.
>>>>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Could you please let me know how I can specify parameters in the
>>>>>>>>>>>>> maker_opts.ctl file?
>>>>>>>>>>>>> Or do you have other suggestions to re-do the data listed above?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Anh-Dao
>>>>>>>>>>>>> 
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>------------------------------
>>>>
>>>>End of maker-devel Digest, Vol 74, Issue 17
>>>>*******************************************
>>>
>>>
>>
>>
>





