<div dir="ltr">Thanks! So, for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural information (i.e. genes for which the genomic sequence was cloned) and use those. But I doubt there are enough of those to train, e.g., Augustus, so I'll probably have to use the bootstrap method as well. Is there a way to combine both?<div><br></div><div>For the BLAST-based annotation, if I use the entire UniProt/Swiss-Prot or GenBank FASTA sets as protein homology evidence, my gold standards are already included in those. I gather from these replies that this is not a problem.</div><div><br></div><div>However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from the evidence, because we want to run MAKER as if this genome were 'new' and had never been annotated before. Can I use something like the -negative_gilist option in blastp to omit the previous genome project's predictions from consideration? (I think that option only works with GenBank sequences.) Or do I have to create a custom version of the large public database?</div><div><br></div><div class="gmail_extra"><div class="gmail_signature"><br></div>
</div></div>