[maker-devel] MAKER training

Tue Sep 11 10:45:42 MDT 2012

I will try the model_gff option.

For the retraining, I noticed that the count of annotated genes varies; I haven't yet examined whether the gene structures vary too.
I will use the models to train ab-initio predictors and provide those as input. But I did notice that the number of genes output by MAKER is considerably lower than the transcriptome suggests. I understand this is because MAKER annotations require good evidence, but can the count increase if ab-initio gene models are provided as input?
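One way to check how much the gene count actually changes between rounds is to count `gene` rows in each round's merged GFF3. The sketch below is a minimal one, assuming standard nine-column GFF3 where MAKER's own models carry `maker` in the source column; it stops at a `##FASTA` directive because merged MAKER GFF3 can carry sequence at the end. The file names in the usage comment are hypothetical.

```python
from typing import Iterable, Optional


def count_features(lines: Iterable[str], feature_type: str = "gene",
                   source: Optional[str] = None) -> int:
    """Count GFF3 rows of one feature type, optionally from one source only."""
    total = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed nine-column feature row
        if cols[2] != feature_type:
            continue
        if source is not None and cols[1] != source:
            continue
        total += 1
    return total


def count_genes_in_file(path: str) -> int:
    """Count MAKER 'gene' rows in one merged GFF3 file."""
    with open(path) as handle:
        return count_features(handle, feature_type="gene", source="maker")


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python count_genes.py round1.gff round2.gff
    for gff_path in sys.argv[1:]:
        print(gff_path, count_genes_in_file(gff_path))
```

Counting gene rows only tells you whether the totals differ; comparing coordinates would be the next step for checking whether gene structures change.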

Thanks,
Ranjani
________________________________
From: Daniel Ence [dence at genetics.utah.edu]
Sent: Tuesday, September 11, 2012 12:29 PM
To: Sivaranjani Namasivayam; maker-devel at yandell-lab.org
Subject: RE: MAKER training

So, MAKER itself isn't probabilistic. If you give it the same data and the same options, it will give you the same outputs. The iterative approach for MAKER is to get gene models on the first round using the est2genome option and the Augustus model that you mentioned. After that first round, you train the ab-initio predictors and tell maker to use those newly trained gene predictors in the second round.
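The two rounds described above differ mainly in which options are set in the control file. A rough sketch, assuming MAKER 2.x option names in `maker_opts.ctl` (`est2genome`, `protein2genome`, `augustus_species`, `snaphmm`, `model_gff`), renders both rounds side by side. All file and species names are hypothetical placeholders, and retraining the predictors happens outside MAKER between the two rounds.

```python
def render_ctl(options: dict) -> str:
    """Render options as key=value lines, in the style of maker_opts.ctl."""
    return "".join(f"{key}={value}\n" for key, value in options.items())


# Evidence shared by both rounds (hypothetical file names).
shared = {
    "genome": "genome.fasta",
    "est": "transcripts.fasta",
    "protein": "related_proteins.fasta",
    "model_gff": "manual_models.gff3",  # manually annotated genes, per the advice above
}

# Round 1: build models directly from the transcript evidence, plus an
# Augustus model trained for a closely related organism.
round1 = {**shared, "est2genome": 1, "protein2genome": 0,
          "augustus_species": "related_species", "snaphmm": ""}

# Round 2: switch est2genome off and point MAKER at predictors that were
# retrained on the round-1 gene models.
round2 = {**shared, "est2genome": 0, "protein2genome": 0,
          "augustus_species": "my_genome_retrained",
          "snaphmm": "my_genome_round1.hmm"}

if __name__ == "__main__":
    print(render_ctl(round1))
    print(render_ctl(round2))
```

Keeping the evidence block identical in both rounds means any difference in output comes from the predictor settings, which fits the point that MAKER gives the same output for the same data and options.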

Regarding the set of manually annotated genes, I think you should put those in the model_gff option.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Sivaranjani Namasivayam [ranjani at uga.edu]
Sent: Tuesday, September 11, 2012 9:57 AM
To: Daniel Ence; maker-devel at yandell-lab.org
Subject: RE: MAKER training

Hey,

I used the MAKER model to retrain MAKER itself. I read somewhere it improves MAKER's predictions.

I did train ab-initio gene predictors using MAKER's output, but I wanted to identify the best prediction before using it to train other gene predictors.

Thanks,
Ranjani
________________________________
From: Daniel Ence [dence at genetics.utah.edu]
Sent: Tuesday, September 11, 2012 11:46 AM
To: Sivaranjani Namasivayam; maker-devel at yandell-lab.org
Subject: RE: MAKER training

Hi Ranjani,

It is fine to include all three of those transcriptome datasets. The more (relevant) evidence the better.

I'm not certain what you mean when you say "you used the above model to retrain". Did you train an abinitio gene predictor using the results from your first maker run?

Daniel Ence
________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Sivaranjani Namasivayam [ranjani at uga.edu]
Sent: Tuesday, September 11, 2012 9:38 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] MAKER training

Hi,

I am using MAKER to annotate a newly sequenced genome. I have trained and retrained with datasets but I would like some advice on assessing the output and how this is affected by the input provided.

- I have transcriptome data from 454 and Illumina platforms. The Illumina data are from a single time point and the 454 data from multiple time points. The 454 reads were assembled using Newbler (dataset 1), and the Illumina reads using TopHat-Cufflinks (dataset 2) and the de novo Trinity pipeline (dataset 3). I now have 3 assemblies: 454 and Illumina will share some redundant transcripts (because of one overlapping time point), and TopHat-Cufflinks and Trinity will have highly redundant transcripts (because they use the same raw reads). Is it OK to provide all 3 datasets as EST evidence, and how does that affect the quality of the annotation? (For now I have used datasets 1 and 2 as EST evidence.)

- I used the above model to retrain: I passed through everything except the ab-initio gene predictions. I also provided a set of manually annotated genes, many of which have EST evidence. Is this OK to do? [For protein evidence, I gave a set from related organisms, same as above.]

- In my third retraining, I used the above retrained model, but this time I only provided the genome_gff but did not pass through any other data. However I did provide the manually annotated genes as EST evidence and related proteins as protein_evidence.
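For the second run above ("passed through everything except the ab-initio gene predictions"), reuse of an earlier run is controlled in MAKER 2.x control files by `maker_gff` plus a set of per-evidence `*_pass` flags. A minimal sketch, assuming those option names (check them against your own `maker_opts.ctl`) and a hypothetical previous-run file name, builds that option set:

```python
# Evidence pass-through flags from the maker_gff re-annotation section
# (option names as in MAKER 2.x control files; verify against your version).
PASS_FLAGS = ("est_pass", "altest_pass", "protein_pass", "rm_pass",
              "model_pass", "pred_pass", "other_pass")


def rerun_options(previous_gff: str, skip: set) -> dict:
    """Pass every evidence type from a previous run through, except those in skip."""
    unknown = skip - set(PASS_FLAGS)
    if unknown:
        raise ValueError(f"unknown pass flags: {sorted(unknown)}")
    options = {"maker_gff": previous_gff}
    for flag in PASS_FLAGS:
        options[flag] = 0 if flag in skip else 1
    return options


# The second run described above: reuse everything except ab-initio predictions.
second_run = rerun_options("round1.all.gff", skip={"pred_pass"})

if __name__ == "__main__":
    for key, value in second_run.items():
        print(f"{key}={value}")
```

Rejecting unknown flag names is a small guard against a typo silently passing evidence through.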

Can you please give me some advice on which of these could give me the best prediction, or if I can alter something to get a better prediction.

- A quick question about Augustus: I used an Augustus model (trained for a closely related organism) for ab-initio prediction. Does MAKER adjust this model based on the evidence provided, or does it use the model as-is for prediction?

Greatly appreciate your help!
Thanks!
Ranjani
