[maker-devel] evidence for MAKER vs evidence to train gene finders

Steven Sullivan sullis02 at nyu.edu
Tue Sep 20 13:28:20 MDT 2016


Thanks! So, I think for training the gene predictors, I'll try to identify
any sequences in my gold-standard set that have structural
information (i.e. genes for which the genomic sequence was cloned) and
use those. But I doubt there are enough of those to train e.g. Augustus, so
I'll probably have to use the bootstrap method as well. Is there a way to
combine both?

For the BLAST-based annotation, if I use the entire UniProt/Swiss-Prot or
GenBank FASTA sets as protein homology evidence, my gold standards are
already included in those. I gather from these replies that that's not a
problem.

However, there *are* public database sequences (predicted genes from an
older annotation of this species) that I *do* want to exclude from
evidence (because we want to run MAKER as if this genome were 'new', never
before annotated). Can I use something like the -negative_gilist option
in blastp to omit previous genome project predictions from consideration?
(That option only works with GenBank sequences, I think.) Or do I
have to create a custom version of the large public database?
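
To make the "custom version" option concrete, here is a rough Python sketch
of what it could look like: copy the public FASTA file while dropping every
record whose ID appears in an exclusion list. This is only an illustration,
not something MAKER or BLAST provides. The function names and the file
layout (one ID per line in the exclusion file) are hypothetical, and the
sketch assumes the record ID is the first whitespace-delimited token after
'>' in each header, so the listed IDs would have to match that token exactly
(e.g. the full "sp|...|..." token in a Swiss-Prot header).

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], excluded_ids: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in excluded_ids.

    Assumption: the record ID is the first whitespace-delimited token
    after '>' on the header line.
    """
    keep = True
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            # Decide once per record; sequence lines inherit the decision.
            keep = record_id not in excluded_ids
        if keep:
            yield line


def write_filtered(in_path: str, ids_path: str, out_path: str) -> None:
    """Hypothetical helper: write in_path to out_path minus the listed IDs.

    ids_path is assumed to hold one record ID per line.
    """
    with open(ids_path) as handle:
        excluded = {ln.strip() for ln in handle if ln.strip()}
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, excluded))
```

The filtered FASTA file would then be supplied as the protein homology
evidence in place of the full public set.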