[maker-devel] AED calculations using the MAKER pipeline

Carson Holt carsonhh at gmail.com
Wed Mar 20 11:36:30 MDT 2013


In the few cases where I have seen this (if it is the same issue you are
experiencing), the slowdown depended heavily on the total size of the
evidence database and the length of the contigs.  For me the run took about
25-50% longer, but used 10-15x as much RAM (primarily because the
contigs were very long, more than 50 Mb each).  The issue was unnoticeable
on the short contigs that are more typical of de novo annotation.

Thanks,
Carson





On 13-03-20 9:54 AM, "Town, Christopher D." <cdtown at jcvi.org> wrote:

>Thanks. Is there any way of estimating when this final step might be
>completed? We are in a time crunch here to get this analysis finished and
>the data/annotation out.
>
>Best
>
>Chris
>
>-----Original Message-----
>From: Carson Holt [mailto:Carson.Holt at oicr.on.ca]
>Sent: Wednesday, March 20, 2013 9:51 AM
>To: Krishnakumar, Vivek; maker-devel at yandell-lab.org
>Cc: Town, Christopher D.; Tang, Haibao; Bidwell, Shelby; Rosen, Benjamin
>Subject: Re: AED calculations using the MAKER pipeline
>
>In the current MAKER download when using GFF3 passthrough there was an
>issue with everything being done at the very last step.  This of course
>leads to a memory spike and a very slow last step.  That seems to be
>similar to what you are describing. It should be resolved in what will
>become version 2.28. I can give you access to the pre-release code, so
>you can check that the issue is resolved for you.  I'll send details in a
>separate e-mail.
>
>Also the ### will be printed after every ~100,000 bp of assembly
>processed by MAKER.  You can ignore them, but they actually have a
>meaning in GFF3.
>Basically everything between two sets of ###'s is fully resolved.  It
>allows programs that read GFF3 to parallelize file loading or just load
>sections of a file as they can rapidly identify "safe chunks".  Without
>them the entire file must be loaded into memory in order to be certain
>that all feature parts are there (as there is no requirement for sorting
>or order in GFF3).
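
The "safe chunk" idea described above can be illustrated with a short
reader. This is only a sketch, not MAKER code: the function names
iter_gff3_chunks and count_empty_chunks are made up for illustration, and
the only assumptions are standard GFF3 conventions (a line consisting of
### closes a chunk, other lines starting with # are comments or
directives, and ##FASTA ends the feature section). It streams a GFF3 file
one chunk at a time and counts how many chunks contain no features, which
is one way to quantify the empty chunks reported in the *.gff.ann file.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_gff3_chunks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of feature lines per '###'-delimited chunk.

    Only the current chunk is held in memory, which is the point of
    the '###' directive: everything before it is fully resolved.
    """
    chunk: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "###":
            # End of a resolved chunk; emit it even if it is empty.
            yield chunk
            chunk = []
        elif line.startswith("##FASTA"):
            # Sequence data follows; no more features to read.
            break
        elif not line or line.startswith("#"):
            # Skip blank lines, comments, and other directives.
            continue
        else:
            chunk.append(line)
    if chunk:
        # Trailing features that were not closed by a final '###'.
        yield chunk


def count_empty_chunks(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_chunks, empty_chunks) for a GFF3 line stream."""
    total = 0
    empty = 0
    for chunk in iter_gff3_chunks(lines):
        total += 1
        if not chunk:
            empty += 1
    return total, empty
```

To use it on a real file, pass an open file handle, for example
count_empty_chunks(open(path)) where path is whatever GFF3 file you want
to inspect.
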
>
>log.child files will always be empty unless you run analyses such as SNAP
>or BLAST.
>
>Thanks,
>Carson
>
>
>
>
>
>
>On 13-03-20 9:05 AM, "Krishnakumar, Vivek" <vKrishna at jcvi.org> wrote:
>
>>Hi,
>>
>>We have been using the MAKER pipeline here at JCVI to calculate AED
>>scores by feeding in our annotation set as `model_gff` and the protein
>>and EST evidence as `protein_gff` and `est_gff` respectively. Here is
>>the issue we are having:
>>
>>When running the above pipeline with protein2genome and est2genome
>>evidence generated earlier by MAKER, there are no problems calculating
>>the AED score. Normally this pipeline takes a little over 12 hours to
>>complete.
>>
>>But if we use our own evidence, AAT and Genewise aligned proteins for
>>`protein_gff` and PASA assembled ESTs for `est_gff`, the same pipeline
>>runs very slowly and the intermediary *.gff.ann file has many chunks
>>(separated by '###') that are completely empty. Our evidence is
>>formatted in the same way as est2genome or protein2genome (GFF file
>>with "expressed_sequence_match::match_part" or
>>"protein_match::match_part" features respectively).
>>
>>The input to my pipeline is 8 chromosomes, ~2200 scaffolds and I use
>>the default `max_dna_len` parameter used to split the large assemblies
>>into chunks.
>>
>>Investigating the master_datastore.log shows me that the scaffolds run
>>through without any issues and the chromosomes are still being processed.
>>For any of the chromosomes, investigating the 'run.log' file, one level
>>above 'theVoid' shows me how many "final.section" jobs were started and
>>how many finished. And in the case of all the chromosomes, it tells me
>>that everything that was started has finished. And the 'log.child.*'
>>files within `theVoid` are all empty. Also within `theVoid`, I'm
>>noticing that the "raw.section" and "evidence_*.gff" files are not
>>empty. But one thing that is surprising is that of all the
>>"final.section" files, only the one pertaining to the last chunk is
>>very large (proportional to the size of the evidence), the rest are all
>>exactly the same size (exactly 331 bytes).
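
The size check described above can be automated with a short script. This
is only a sketch: the function name section_size_histogram is made up, and
the glob pattern "*final.section*" is an assumption about how the files
are named inside theVoid, so adjust it to match the actual file names. It
tallies how many files share each byte size, so that a pile of identically
sized (for example 331-byte) files next to one very large file stands out
immediately.

```python
from collections import Counter
from pathlib import Path


def section_size_histogram(void_dir, pattern="*final.section*"):
    """Count files in void_dir by size in bytes.

    `pattern` is an assumed naming convention, not something taken
    from MAKER documentation; change it to fit your directory.
    """
    sizes = Counter()
    for path in Path(void_dir).glob(pattern):
        if path.is_file():
            sizes[path.stat().st_size] += 1
    return sizes


def report(void_dir, pattern="*final.section*"):
    """Print one line per distinct size, smallest first."""
    for size, count in sorted(section_size_histogram(void_dir, pattern).items()):
        print(f"{size:>12} bytes  x {count} file(s)")
```

Calling report() on a directory prints one line per distinct file size
with the number of files of that size.
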
>>
>>I'm running MAKER in MPI mode spawning 48 processes on a high memory
>>machine with 64 available cores and 1TB of RAM.
>>
>>I hope I've been able to explain my situation clearly in this email.
>>
>>Any help is appreciated.
>>Thank you.
>>
>>Vivek