[maker-devel] MAKER training

Tue Sep 11 09:57:31 MDT 2012

Hey,

I used the MAKER model to retrain MAKER itself. I read somewhere it improves MAKER's predictions.

I did train ab initio gene predictors using MAKER's output, but I wanted to identify the best prediction before using it to train other gene predictors.

Thanks,
Ranjani
________________________________
From: Daniel Ence [dence at genetics.utah.edu]
Sent: Tuesday, September 11, 2012 11:46 AM
To: Sivaranjani Namasivayam; maker-devel at yandell-lab.org
Subject: RE: MAKER training

Hi Ranjani,

It is fine to include all three of those transcriptome datasets. The more (relevant) evidence the better.

I'm not certain what you mean when you say "you used the above model to retrain". Did you train an ab initio gene predictor using the results from your first MAKER run?

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Sivaranjani Namasivayam [ranjani at uga.edu]
Sent: Tuesday, September 11, 2012 9:38 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] MAKER training

Hi,

I am using MAKER to annotate a newly sequenced genome. I have trained and retrained with datasets but I would like some advice on assessing the output and how this is affected by the input provided.

- I have transcriptome data from 454 and Illumina platforms. The Illumina data are from a single time point and the 454 data from multiple time points. The 454 reads were assembled using Newbler (dataset 1), and the Illumina reads using TopHat-Cufflinks (dataset 2) and the de novo Trinity pipeline (dataset 3). I now have 3 assemblies: the 454 and Illumina assemblies will have some redundant transcripts (because of one overlapping time point), and the TopHat-Cufflinks and Trinity assemblies will have highly redundant transcripts (because they use the same raw reads). Is it OK to provide all 3 datasets as EST evidence, and how does it affect the quality of the annotation? (For now I have used dataset 1 and dataset 2 as EST evidence.)

- I used the above model to retrain; I passed through everything except the ab initio gene predictions. I also provided a set of manually annotated genes, many of which have EST evidence. Is this OK to do? [For protein evidence, I gave a set from related organisms, same as above.]

- In my third retraining, I used the above retrained model, but this time I provided only the genome_gff and did not pass through any other data. However, I did provide the manually annotated genes as EST evidence and related proteins as protein_evidence.

Can you please give me some advice on which of these could give me the best prediction, or whether I can alter something to get a better prediction?

- A quick question about Augustus: I used an Augustus model (trained for a closely related organism) for ab initio prediction. Does MAKER adjust this model based on the evidence provided, or use the model as is for prediction?

Greatly appreciate your help!
Thanks!
Ranjani
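
On the redundancy between the three transcript assemblies raised in the original message: a minimal sketch of one way to gauge it before deciding what to supply as EST evidence. It is not part of MAKER and makes several assumptions: the function names and the toy sequences are hypothetical, and only exact, same-strand sequence matches are counted, so it will underestimate the real overlap between assemblies whose transcripts differ at their ends.

```python
from collections import defaultdict
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def shared_sequences(assemblies):
    """Return sequences present in more than one assembly.

    `assemblies` maps a label (hypothetical, e.g. "trinity") to an open
    FASTA handle. Matching is exact and case-insensitive; reverse
    complements and partial overlaps are NOT detected.
    """
    seen = defaultdict(set)
    for label, handle in assemblies.items():
        for _, seq in read_fasta(handle):
            seen[seq.upper()].add(label)
    return {seq: sorted(labels) for seq, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    # Toy in-memory data standing in for real assembly files.
    demo = {
        "cufflinks": StringIO(">t1\nACGT\nACGT\n>t2\nGGGCCC\n"),
        "trinity": StringIO(">c1\nacgtacgt\n>c2\nTTTTAAAA\n"),
    }
    for seq, labels in shared_sequences(demo).items():
        print(seq, labels)
```

With real data, the StringIO objects would be replaced by files opened with open(); a high count of shared sequences would only confirm the expected overlap, and per the reply above it is still fine to provide all three datasets.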
