[maker-devel] training of gene finders using whole assembly or longest contigs?
Carson Holt
carsonhh at gmail.com
Fri Feb 10 09:42:13 MST 2017
Example of training here: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
You can also search the devel mailing list archives here: https://groups.google.com/forum/#!forum/maker-devel
There are lots and lots of threads that go into detail on training. Note that more than 2 rounds of training is not beneficial and can actually make performance worse (there is an overtraining paradox).
—Carson
> On Feb 10, 2017, at 9:03 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Hello:
>
> I am training the gene finders using the whole assembly, but it is very time consuming, and I have to repeat the training process several times. Although I am running it on 25 nodes on a server, the training may still take 3 weeks or more. I wonder how you train SNAP. Do you use the whole assembly, or just select the longest contigs for the training? If I use only the longest contigs (say, the top 20%), will the result be as good as what I would get using the whole assembly? Or should I randomly select 20% of the contigs, which would have a length distribution similar to that of the whole assembly?
>
> Thanks
>
> Best
> Quanwei
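
The two subsetting options raised in the question above (the top 20% longest contigs versus a random 20% of contigs) can be sketched in Python as follows. This is only an illustration of how such contig lists could be built; it is not part of MAKER or SNAP, the function names are hypothetical, and it says nothing about which subset trains a better gene finder.

```python
import random
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_id: sequence_length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def _subset_size(total: int, fraction: float) -> int:
    """Number of contigs to keep: at least 1, never more than the total."""
    return min(total, max(1, round(total * fraction)))


def longest_fraction(lengths: Dict[str, int], fraction: float = 0.2) -> List[str]:
    """IDs of the longest contigs, longest first (ties broken by ID)."""
    ranked = sorted(lengths, key=lambda cid: (-lengths[cid], cid))
    return ranked[:_subset_size(len(lengths), fraction)]


def random_fraction(lengths: Dict[str, int], fraction: float = 0.2,
                    seed: int = 0) -> List[str]:
    """IDs of a reproducible random sample of contigs, sorted by ID."""
    rng = random.Random(seed)  # fixed seed so the same subset can be redrawn
    return sorted(rng.sample(sorted(lengths), _subset_size(len(lengths), fraction)))
```

With a real assembly, the lengths could be read from an open file handle, for example `read_fasta_lengths(open("genome.fasta"))`, where the file name is a placeholder. The resulting ID lists can then be used to extract the chosen contigs with whatever FASTA tool is at hand.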