[maker-devel] AED calculations using the MAKER pipeline

Wed Mar 20 08:55:38 MDT 2013

Hi Vivek,

Sounds like it may be a problem with the protein2genome GFF file. Can you send us a sample file that is known to produce the problem?

cheers,

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Krishnakumar, Vivek [vKrishna at jcvi.org]
Sent: Wednesday, March 20, 2013 7:05 AM
To: maker-devel at yandell-lab.org
Cc: Tang, Haibao; Rosen, Benjamin; Town, Christopher D.; Bidwell, Shelby
Subject: [maker-devel] AED calculations using the MAKER pipeline

Hi,

We have been using the MAKER pipeline here at JCVI to calculate AED scores by feeding in our annotation set as `model_gff` and the protein and EST evidence as `protein_gff` and `est_gff` respectively. Here is the issue we are having:

When running the above pipeline with protein2genome and est2genome evidence generated earlier by MAKER, there are no problems calculating the AED score. Normally this pipeline takes a little over 12 hours to complete.

But if we use our own evidence (AAT- and Genewise-aligned proteins for `protein_gff`, and PASA-assembled ESTs for `est_gff`), the same pipeline runs very slowly, and the intermediary *.gff.ann file has many chunks (separated by '###') that are completely empty. Our evidence is formatted in the same way as the est2genome or protein2genome output (GFF files with "expressed_sequence_match::match_part" or "protein_match::match_part" features, respectively).
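To put a number on the empty chunks described above, here is a minimal Python sketch that splits GFF text on '###' lines and counts how many chunks contain no feature lines. It assumes that any non-blank line not starting with '#' is a feature line; the function name `count_chunks` and the file name in the usage note are hypothetical and not part of MAKER.

```python
def count_chunks(lines):
    """Return (total_chunks, empty_chunks) for GFF text split on '###' lines.

    A chunk counts as empty when it holds no feature lines, i.e. only
    blank lines or '#'-prefixed comment/directive lines.
    """
    total = 0
    empty = 0
    has_feature = False
    for line in lines:
        stripped = line.strip()
        if stripped == "###":
            # End of a chunk: record it and whether it held any features.
            total += 1
            if not has_feature:
                empty += 1
            has_feature = False
            continue
        if stripped and not stripped.startswith("#"):
            has_feature = True
    # Count a trailing chunk that was not closed by '###'.
    if has_feature:
        total += 1
    return total, empty
```

It could be called on an open file handle, for example `with open("chr1.gff.ann") as fh: print(count_chunks(fh))`, where the file name is only a placeholder.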

The input to my pipeline is 8 chromosomes and ~2200 scaffolds, and I use the default value of the `max_dna_len` parameter, which is used to split the large assemblies into chunks.

Investigating the master_datastore.log shows me that the scaffolds run through without any issues and the chromosomes are still being processed.
For any of the chromosomes, the 'run.log' file, one level above 'theVoid', shows how many "final.section" jobs were started and how many finished. For all of the chromosomes, it reports that everything that was started has finished, and the 'log.child.*' files within `theVoid` are all empty. Also within `theVoid`, the "raw.section" and "evidence_*.gff" files are not empty. One surprising thing is that, of all the "final.section" files, only the one pertaining to the last chunk is very large (proportional to the size of the evidence); the rest are all exactly the same size (exactly 331 bytes).
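The file-size observation above can be checked across a whole run with a short Python sketch that walks a directory tree and tallies the sizes of the "final.section" files. It assumes these files can be recognized by "final.section" appearing in the file name; the function names and the directory passed in are hypothetical.

```python
import os
from collections import Counter


def section_sizes(root, pattern="final.section"):
    """Map each file under `root` whose name contains `pattern` to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if pattern in name:
                path = os.path.join(dirpath, name)
                sizes[path] = os.path.getsize(path)
    return sizes


def summarize(sizes):
    """Return (size, count) pairs, most common size first."""
    return Counter(sizes.values()).most_common()
```

Running `summarize(section_sizes("theVoid_directory"))` on a chromosome's working directory (the path is a placeholder) would show at a glance how many files share the 331-byte size and which ones differ.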

I'm running MAKER in MPI mode spawning 48 processes on a high memory machine with 64 available cores and 1TB of RAM.

I hope I've been able to explain my situation clearly in this email.

Any help is appreciated.
Thank you.

Vivek
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org