[maker-devel] training of gene finders using whole assembly or longest contigs?

Fri Feb 10 09:42:13 MST 2017

Example of training here -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors

You can also search the devel mailing list archives here -> https://groups.google.com/forum/#!forum/maker-devel

There are many threads there that go into detail on training. Note that more than 2 rounds of training is not beneficial and can actually make performance worse (there is an overtraining paradox).

—Carson

> On Feb 10, 2017, at 9:03 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Hello:
> 
> I am training the gene finders using the whole assembly, but it is very time consuming, and I have to repeat the training process several times. Although I am running it on 25 nodes of a server, the training may still take 3 weeks or even more. I wonder how you train SNAP: do you use the whole assembly, or just select the longest contigs for training? If I use only the longest contigs (say, the longest 20%), will the result be as good as what I would get using the whole assembly? Or should I randomly select 20% of the contigs, so that the subset has a length distribution similar to the whole assembly?
> 
> Thanks
> 
> Best
> Quanwei
> 
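
The two subsetting strategies raised in the question (the longest 20% of contigs versus a random 20%) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of MAKER or SNAP: the function names, the 0.2 fraction, and the fixed random seed are assumptions made for the example, and the thread does not say whether either subset trains as well as the whole assembly.

```python
import random


def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subset_size(records, fraction):
    """Number of contigs to keep; always at least one."""
    return max(1, round(len(records) * fraction))


def longest_fraction(records, fraction=0.2):
    """Keep the longest `fraction` of contigs (hypothetical helper)."""
    ranked = sorted(records, key=lambda rec: len(rec[1]), reverse=True)
    return ranked[:subset_size(records, fraction)]


def random_fraction(records, fraction=0.2, seed=0):
    """Keep a random `fraction` of contigs; the seed makes the choice repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, subset_size(records, fraction))


def format_fasta(records, width=60):
    """Render records back to FASTA text with wrapped sequence lines."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Toy assembly: ten contigs with lengths 1..10 (made-up data for illustration).
toy_lines = []
for i in range(1, 11):
    toy_lines.append(">contig%d" % i)
    toy_lines.append("A" * i)

toy_records = read_fasta(toy_lines)
top = longest_fraction(toy_records, 0.2)
rand = random_fraction(toy_records, 0.2, seed=1)
```

On a real assembly, `read_fasta` would be given an open file handle, and the output of `format_fasta` would be written to a new file to use as the genome input for a training run.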
