[maker-devel] CPUs problems

Nguyen, Anh-Dao (NIH/NHGRI) [C] nguyenan at mail.nih.gov
Thu Sep 18 05:49:45 MDT 2014


I re-ran maker on 10 CPUs. The maker job finished after 10 days. I
checked the log file and found these errors:

Processing run.log file...
examining contents of the fasta file and run log
shell-init: error retrieving current directory: getcwd: cannot access
parent directories: No such file or directory


Can you let me know how I can fix the problem?

Thanks
Anh-Dao


On 9/5/14 11:37 AM, "Carson Holt" <carsonhh at gmail.com> wrote:

>The partial lines are symptoms of writing data to a slow NFS mounted
>drive.  If NFS can't get a response for a write operation, it returns
>success (even though it wasn't really successful) and then continues to
>wait for the operation to really complete.  This is called asynchronous
>writing.  It improves performance by optimistically returning success on
>all operations rather than waiting to see if the operation really
>succeeded. If you have a slow or overloaded NFS mount though, you can get
>a number of failures and never any indication that they failed except for
>the fact that some files are missing content or lines are partial.
>
>When this happens, you need to run MAKER with the -a flag on fewer CPUs to
>rebuild the GFF3 files. Fewer CPUs reduces the IO burden.  Or if you can
>find which contigs have partial GFF3 lines, you can delete just those
>along with the datastore index log file and then launch maker without any
>flags to let it recompute just those contigs.
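
A rough sketch of how one might locate the contigs with partial GFF3 lines, assuming a well-formed feature line has exactly nine tab-separated columns with integer start/end and a valid strand. The function names are made up for illustration and are not part of MAKER. Because a truncated line may have lost its first column, the seqid of the nearest preceding well-formed line is reported only as a hint.

```python
VALID_STRANDS = {"+", "-", ".", "?"}


def is_feature_line_ok(line):
    """Return True if the line looks like a complete GFF3 feature line."""
    cols = line.split("\t")
    if len(cols) != 9:
        return False
    start, end, strand = cols[3], cols[4], cols[6]
    return start.isdigit() and end.isdigit() and strand in VALID_STRANDS


def find_partial_lines(lines):
    """Return [(line_number, likely_seqid, text)] for malformed feature lines.

    likely_seqid is the seqid of the last well-formed line seen before the
    malformed one (None if there was none); it is a hint, not a guarantee.
    """
    problems = []
    last_seqid = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments, directives, ### separators
        if is_feature_line_ok(line):
            last_seqid = line.split("\t", 1)[0]
        else:
            problems.append((number, last_seqid, line))
    return problems
```

It can be pointed at a merged file with something like find_partial_lines(open("maker.gff3")), and the reported seqids then suggest which contigs to delete and recompute.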
>
>Another possible cause is also NFS related.  If you are running MAKER
>multiple times in the same working directory, and a slow NFS mount doesn't
>allow maker to properly lock files, then two maker jobs can try and
>compute the same contig simultaneously.  Simultaneous writing of files can
>then cause IDs to be duplicated and some lines to be munged as lines from
>one process arrive to the file in the middle of lines from another process
>(creating a jumble of characters and partial lines).  Start a single maker
>job on fewer CPUs using the -a flag to rebuild the GFF3 files if this is
>the case.
>
>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
>passing in GFF3 files with already assigned IDs (that may be used
>elsewhere).  Are you using GFF3 pass-through?
>
>Features that will not have unique ID= tags are CDS, three_prime_UTR, and
>five_prime_UTR features (these are considered discontinuous features,
>which is why one ID is shared across lines).
>You can see examples here --> http://www.sequenceontology.org/gff3.shtml
>
>Also Name= attributes are not required to be unique.
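
A minimal sketch of a duplicate-ID check that follows the rule described above: ID= values are collected for every feature except CDS and UTR features, which may legitimately share an ID across lines. The function names are hypothetical, and the attribute parsing is deliberately simple (it does not decode URL-escaped characters).

```python
from collections import defaultdict

# Feature types allowed to repeat an ID across lines (compared case-insensitively).
SHARED_ID_TYPES = {"cds", "three_prime_utr", "five_prime_utr"}


def parse_attributes(column9):
    """Split the ninth GFF3 column into a {key: value} dictionary."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def duplicated_ids(lines):
    """Return {id: [line_numbers]} for IDs used on more than one line."""
    seen = defaultdict(list)
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # sequence section, no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # malformed lines are a separate problem
        if cols[2].lower() in SHARED_ID_TYPES:
            continue
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is not None:
            seen[feature_id].append(number)
    return {fid: nums for fid, nums in seen.items() if len(nums) > 1}
```

Name= values are ignored on purpose, since they are not required to be unique.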
>
>Thanks,
>Carson
>
>
>
>
>
>
>On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
><nguyenan at mail.nih.gov> wrote:
>
>>Hi,
>>
>>I finished running MAKER as suggested above.
>>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g
>>options. I called the output file maker.gff3
>> 
>>In the maker.gff3 I found some invalid data (it does not conform to the
>>GFF3 format), e.g.
>>
>>###
>>2 +
>>###
>>
>>OR
>>
>>###
>>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>>###
>>
>>OR some gene (or mRNA) IDs are not unique. This means they can be found
>>multiple times with different values within the maker.gff3
>>
>>How could this happen? As I understand it, mRNA IDs in a .gff3 file must
>>be unique.
>>
>>Thanks
>>Anh-Dao
>> 
>>
>> 
>>
>>
>>On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org"
>><maker-devel-request at yandell-lab.org> wrote:
>>
>>>Today's Topics:
>>>
>>>   1. Re: Maker_opts.ctl (Carson Holt)
>>>
>>>
>>>----------------------------------------------------------------------
>>>
>>>Message: 1
>>>Date: Fri, 18 Jul 2014 11:04:09 -0600
>>>From: Carson Holt <carsonhh at gmail.com>
>>>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]" <nguyenan at mail.nih.gov>,	Daniel
>>>	Ence <dence at genetics.utah.edu>
>>>Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>Subject: Re: [maker-devel] Maker_opts.ctl
>>>Message-ID: <CFEEAF84.DCAF%carsonhh at gmail.com>
>>>Content-Type: text/plain;	charset="UTF-8"
>>>
>>>It should just be 'fgenesh'.  If it's not there you can still just give
>>>the GFF3.
>>>
>>>--Carson
>>>
>>>
>>>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>><nguyenan at mail.nih.gov> wrote:
>>>
>>>>I am not sure which fgenesh executable file I should use.
>>>>
>>>>fgenesh= #location of fgenesh executable
>>>>
>>>>When I run FGENESH++, I need to run the run_pipe.pl script. Of course,
>>>>you also need to specify a list of other executable programs (such as
>>>>ppd, ppdn+, etc.).
>>>>
>>>>Anh-Dao
>>>>
>>>>
>>>>On 7/16/14 3:32 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>
>>>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your
>>>>>previous run.  Then provide the RepeatModeler file to rmlib=
>>>>>
>>>>>--Carson
>>>>>
>>>>>
>>>>>
>>>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>
>>>>>>By default, model_org=all. Can I use the de novo repeat library
>>>>>>predicted by RepeatModeler for the rmlib option?
>>>>>>
>>>>>>Anh-Dao
>>>>>>
>>>>>>
>>>>>>
>>>>>>On 7/16/14 3:17 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>>>
>>>>>>>No.  You can provide both to MAKER. The options are model_org= and
>>>>>>>rmlib=. By letting MAKER handle repeat masking it will differentiate
>>>>>>>repeat types and use soft masking for some and hard masking for
>>>>>>>others.  This increases sensitivity of evidence alignments while still
>>>>>>>maintaining specificity.
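
For readers unfamiliar with the two masking styles mentioned above, here is a small illustration, not MAKER's own code: soft masking lowercases the repeat region so the bases are still available to aligners, while hard masking replaces them with N so they cannot be aligned at all. The function names and the 0-based, end-exclusive interval convention are assumptions made for this sketch.

```python
def soft_mask(sequence, intervals):
    """Lowercase each (start, end) region; the bases stay readable."""
    chars = list(sequence)
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def hard_mask(sequence, intervals, mask_char="N"):
    """Replace each (start, end) region with mask_char; the bases are lost."""
    chars = list(sequence)
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


genome = "ACGTACGT"
repeat = [(2, 5)]  # 0-based, end-exclusive
print(soft_mask(genome, repeat))  # ACgtaCGT
print(hard_mask(genome, repeat))  # ACNNNCGT
```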
>>>>>>>
>>>>>>>--Carson
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>
>>>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the parameter
>>>>>>>>files for Augustus.
>>>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM using
>>>>>>>>two options: -lib (de novo) and -species (known). I got ~45% repeats
>>>>>>>>via de novo and ~4% repeats via known options. As I understand it, RM
>>>>>>>>inside of MAKER uses only the RepBase repeat library and the
>>>>>>>>RepeatRunner protein database.
>>>>>>>>
>>>>>>>>Anh-Dao
>>>>>>>>
>>>>>>>>
>>>>>>>>On 7/16/14 2:36 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>When you ran Augustus separately, it should have created the
>>>>>>>>>parameters needed to run it.  Now you should be able to run it inside
>>>>>>>>>of MAKER using the species name you just created.
>>>>>>>>>
>>>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than
>>>>>>>>>giving it the results as GFF3.
>>>>>>>>>
>>>>>>>>>--Carson
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>>>
>>>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>>>
>>>>>>>>>>I did not use the parameter file of another organism when running
>>>>>>>>>>Augustus. I created the parameter file for the genome following their
>>>>>>>>>>instructions. There were multiple steps to train and run Augustus
>>>>>>>>>>(creating gene structures for training AUGUSTUS with CEGMA =>
>>>>>>>>>>parameter file will be created; creating hints for AUGUSTUS from
>>>>>>>>>>ESTs/cDNA sequences; incorporating Illumina RNAseq into AUGUSTUS with
>>>>>>>>>>GSNAP, etc.).
>>>>>>>>>>As I mentioned, I ran Augustus separately because Augustus has not
>>>>>>>>>>been trained for that genome (no parameter file exists). Otherwise I
>>>>>>>>>>would run Augustus inside MAKER.
>>>>>>>>>>
>>>>>>>>>>You suggested using the rm_gff option to specify the RepeatMasker
>>>>>>>>>>output (of course I will convert it to .gff3 formatted files). Can I
>>>>>>>>>>submit two RM .gff3 files, separated by a comma?
>>>>>>>>>>
>>>>>>>>>>Anh-Dao
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" <dence at genetics.utah.edu>
>>>>>>>>>>wrote:
>>>>>>>>>>
>>>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>>>
>>>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein
>>>>>>>>>>>evidence. You'll put all of your fasta est files together in a
>>>>>>>>>>>comma separated list in the "est" option, and all of your fasta
>>>>>>>>>>>protein files in a comma separated list for the "protein" option.
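
As a small illustration of the list format described above, assuming the lists are meant to be comma separated, here is a hypothetical helper that builds one control-file line from several file paths. The helper name and the file names are placeholders, not anything shipped with MAKER.

```python
def ctl_line(option, paths):
    """Build an 'option=path1,path2,...' line for maker_opts.ctl."""
    cleaned = [p.strip() for p in paths if p.strip()]
    for path in cleaned:
        if "," in path:
            # A comma inside a path would be read as a list separator.
            raise ValueError(f"path contains a comma: {path!r}")
    return f"{option}={','.join(cleaned)}"


# Placeholder file names, for illustration only.
print(ctl_line("est", ["ests.fasta", "race.fasta", "trinity.fasta"]))
print(ctl_line("protein", ["proteins.fasta", "related_proteins.fasta"]))
```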
>>>>>>>>>>>
>>>>>>>>>>>You'll specify the SNAP and Genemark files in their respective
>>>>>>>>>>>options in the control file and pass the augustus and fgenesh
>>>>>>>>>>>predictions in the "pred_gff" option.
>>>>>>>>>>>
>>>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give it to
>>>>>>>>>>>maker with the "rm_gff" option.
>>>>>>>>>>>
>>>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it to
>>>>>>>>>>>maker with the "est_gff" option. I'm pretty sure Trinity only gives
>>>>>>>>>>>fasta output, so you would put that in the "est" option, along with
>>>>>>>>>>>all the other est fasta files.
>>>>>>>>>>>
>>>>>>>>>>>If Augustus isn't trained for your particular organism, then you can
>>>>>>>>>>>use another organism that augustus is already trained for. The list
>>>>>>>>>>>of species that augustus has parameter files for is in the README.txt
>>>>>>>>>>>that came with Augustus. I really recommend that you run Augustus
>>>>>>>>>>>from inside maker, because then you get all the benefits of maker
>>>>>>>>>>>passing EST-based hints to augustus at runtime, which can really
>>>>>>>>>>>improve Augustus' predictive ability.
>>>>>>>>>>>
>>>>>>>>>>>When you ran the augustus gene prediction separately, did you use
>>>>>>>>>>>another organism's parameter file?
>>>>>>>>>>>
>>>>>>>>>>>Thanks,
>>>>>>>>>>>Daniel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C]
>>>>>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to conduct a genome annotation and have the following
>>>>>>>>>>>> data:
>>>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species options)
>>>>>>>>>>>> - ESTs and RACE (fasta)
>>>>>>>>>>>> - proteins (fasta)
>>>>>>>>>>>> - proteins of related organisms (fasta)
>>>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to
>>>>>>>>>>>>   ZFF format, etc.)
>>>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert
>>>>>>>>>>>>   the outputs to .gff3 files. I ran the Augustus gene prediction
>>>>>>>>>>>>   separately because Augustus has never been trained for this genome.
>>>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>>>
>>>>>>>>>>>> Could you please let me know how I can specify parameters in the
>>>>>>>>>>>> maker_opts.ctl file?
>>>>>>>>>>>> Or do you have other suggestions to re-do the data listed above?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> Anh-Dao
>>>>>>>>>>>> 
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>>> 
>>>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>------------------------------
>>>
>>>End of maker-devel Digest, Vol 74, Issue 17
>>>*******************************************
>>
>>
>
>




