[maker-devel] maker problem
Carson Holt
carsonhh at gmail.com
Mon Oct 8 14:11:26 MDT 2018
> We had run BUSCO and there is no problem in genome assembly. I used RepeatMasker (separately from maker pipeline) for masking the repeats using custom generated library (denovo repeats and repeat library from other species as well). The masked genome was used as input in maker_opts.ctl.
Let MAKER run the masking itself if possible. Also, BUSCO can be used to train Augustus, which can then become the gene predictor in MAKER.
> Transcripts-
> We have RNA-Seq data assembled using Velvet/Oases from the same species as the sequenced genome. I globally aligned the transcripts to the assembled genome using GMAP, which gave ~99% mapping. The GFF3 generated by GMAP was also checked in a genome browser. Those transcripts were used as est input in maker_opts.ctl. These assembled transcripts may have redundancy.
est2genome doesn't work with est_gff. You must provide a FASTA file of the assembled transcripts. You can switch back to the GFF3 after training if you want.
> Proteins-
> I used protein (fasta seq) sequences downloaded from uniprot for 5 closely related species and one from in-house sequenced genome (already published). Protein sequences from all 6 organisms are concatenated in one file and used as protein evidence in maker_opts.ctl.
Look at the contigs in a browser. Find a contig with protein2genome results in the GFF3 (i.e. the source column is marked protein2genome in the GFF3), and look at it specifically. If you don’t find any, then the issue is either your pre-masking or the evidence proteins you gave. I’d recommend using UniProt/Swiss-Prot, which contains a broad set of curated and conserved proteins.
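To pick a contig to look at, a quick tally of protein2genome features per contig can help. The following is only a minimal sketch, not part of MAKER: the function name count_source_hits is hypothetical, and it assumes a standard tab-delimited GFF3 in which the source is the second column and any embedded sequence follows a ##FASTA line.

```python
from collections import Counter


def count_source_hits(gff3_lines, source="protein2genome"):
    """Count features per contig whose GFF3 source column equals `source`.

    `gff3_lines` is any iterable of text lines, e.g. an open file handle.
    (Hypothetical helper for illustration; not a MAKER function.)
    """
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # stop at the embedded sequence section, if present
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column feature line
        if fields[1] == source:  # column 2 is the source
            counts[fields[0]] += 1  # column 1 is the contig/sequence ID
    return counts
```

Call it with an open handle on the GFF3 and print counts.most_common(10) to list the contigs with the most hits. An empty result would point back to the pre-masking or to the protein evidence, as described above.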
> altest=transcripts.fasta (from in-house sequenced genome (already published))
These will be ignored until you have a trained HMM (this type of alignment can only be used as hints to the trained predictor).
—Carson