[maker-devel] MAKER training
Carson Holt
carsonhh at gmail.com
Tue Sep 11 10:04:53 MDT 2012
> - I have transcriptome data from 454 and Illumina platforms. Illumina is from
> a single time point and 454 from multiple time points. 454 was assembled using
> Newbler (dataset 1) and Illumina using Tophat-Cufflinks (dataset 2) and the
> de novo Trinity pipeline (dataset 3). I now have 3 assemblies - 454 and
> Illumina will have some redundant transcripts (because of one overlapping time
> point); TopHat-Cufflinks and Trinity will have highly redundant transcripts
> (because they use the same raw reads). Is it OK to provide all 3 datasets as EST
> evidence, and how does it affect the quality of annotation? (For now I have used
> dataset 1 and dataset 2 as EST evidence.)
This is fine. You can give them as a comma-separated list:
est=file1,file2,file3
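If you keep the assembly paths in a script, the line above can be generated rather than typed by hand. This is only a rough sketch: the function name and the file names are made up for illustration, and the only thing taken from MAKER is the est= key with comma-separated values.

```python
def est_option(paths):
    """Build an 'est=' control-file line from a list of FASTA paths.

    The helper and the example paths are hypothetical; only the
    'est=file1,file2,file3' layout comes from the reply above.
    """
    cleaned = [p.strip() for p in paths if p.strip()]
    for p in cleaned:
        # A comma inside a path would be read as a list separator.
        if "," in p:
            raise ValueError(f"comma in file name: {p!r}")
    return "est=" + ",".join(cleaned)


if __name__ == "__main__":
    # Hypothetical file names for the three assemblies in the question.
    print(est_option(["newbler_454.fasta", "cufflinks.fasta", "trinity.fasta"]))
```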
> - I used the above model to retrain, I passed through everything except the
> ab initio gene predictions. I also provided a set of manually annotated genes,
> many of which have EST evidence. Is this OK to do? [For protein evidence, I
> gave a set from related organisms, same as above]
>
> - In my third retraining, I used the above retrained model, but this time I
> only provided the genome_gff but did not pass through any other data. However
> I did provide the manually annotated genes as EST evidence and related
> proteins as protein_evidence.
>
> Can you please give me some advice on which of these could give me the best
> prediction, or if I can alter something to get a better prediction.
>
Everything you've done sounds reasonable. Better training comes from having
the most correct models to train with, so providing the manual annotations
as training data works, or you can also select MAKER models with the lowest
AED score (i.e. models that most closely match the evidence). The goal is to
make the process as unbiased as possible, so a consistent, usually automated,
selection method is often the easiest to justify.
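One way to automate the low-AED selection is to filter the mRNA features in the MAKER GFF3 output. The sketch below is not part of MAKER. It assumes each mRNA line carries an _AED=<value> attribute in column 9, and the 0.2 cutoff and the function name are arbitrary choices for illustration, not recommended values.

```python
import re

# Assumption: mRNA features carry '_AED=<number>' in the attribute column.
# The leading (?:^|;) keeps '_eAED=' from being matched by mistake.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")


def low_aed_ids(gff_lines, max_aed=0.2):
    """Return IDs of mRNA features with AED <= max_aed, best (lowest) first.

    max_aed=0.2 is an illustrative cutoff only.
    """
    selected = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # skip non-feature lines and non-mRNA features
        aed = AED_RE.search(cols[8])
        ident = ID_RE.search(cols[8])
        if aed and ident and float(aed.group(1)) <= max_aed:
            selected.append((float(aed.group(1)), ident.group(1)))
    return [ident for _, ident in sorted(selected)]


if __name__ == "__main__":
    import sys
    # Usage: python low_aed.py annotations.gff
    with open(sys.argv[1]) as handle:
        for model_id in low_aed_ids(handle):
            print(model_id)
```

Applying the same cutoff to every model keeps the choice of training genes consistent and easy to describe afterwards.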
>
> - A quick question about Augustus - I used an Augustus model (trained for a
> closely related organism) for ab initio prediction. Does MAKER adjust this
> model based on the evidence provided, or use the model as such for a
> prediction?
MAKER will provide hints to Augustus during the run to make it perform
better. MAKER will report the raw, unaided Augustus results in the GFF3 file
as a reference, but will use evidence to improve performance where it can.
The gene name will let you know whether a model is a hint-based or an ab
initio prediction. When 'maker' is part of the gene name, it is hint-based.
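The naming rule can be applied in a script when sorting through gene names. This is a minimal sketch of that rule only: the function names are made up, and the example names are illustrative rather than copied from a real run.

```python
def is_hint_based(gene_name):
    """True when 'maker' is part of the gene name (the rule stated above)."""
    return "maker" in gene_name


def split_by_origin(gene_names):
    """Separate names into (hint_based, ab_initio) lists, keeping input order."""
    hint_based = [n for n in gene_names if is_hint_based(n)]
    ab_initio = [n for n in gene_names if not is_hint_based(n)]
    return hint_based, ab_initio


if __name__ == "__main__":
    # Illustrative names only, not output from an actual MAKER run.
    names = [
        "maker-scaffold1-augustus-gene-0.5",
        "augustus_masked-scaffold1-abinit-gene-0.2",
    ]
    hinted, raw = split_by_origin(names)
    print("hint-based:", hinted)
    print("ab initio:", raw)
```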
Thanks,
Carson