[maker-devel] Annotating a fragmented assembly
Lior Glick
liorglic at mail.tau.ac.il
Mon Apr 13 08:12:42 MDT 2020
Hello there,
I am working on creating plant pan-genomes. This means that I produce many
assemblies for samples of the same species from NGS data available in the
SRA, and then annotate them with MAKER using a collection of relevant
evidence (transcripts and proteins).
As you might imagine, data quality is variable, so I sometimes create
assemblies from >x20 sequencing depth, resulting in fragmented assemblies
(say N50 in the range of 5-10kb).
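For reference, the N50 figure quoted above can be computed from contig lengths as in the minimal sketch below. The function name and the example lengths are made up for illustration and do not come from any of the assemblies discussed here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input: no contigs, so no meaningful N50


# Hypothetical contig lengths (bp), for illustration only.
contig_lengths = [12000, 8000, 6000, 4000, 3000, 2000]
print(n50(contig_lengths))  # 8000
```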
Annotation results of such genomes usually contain many partial genes,
broken across contigs, so in many cases I get two proteins, representing
the 3' and 5' parts of a broken gene. In other cases, only one part of the
gene is detected.
I've also found that applying reference-based scaffolding (I use RaGOO) to
generate pseudomolecules improves results by bringing together contigs
containing gene parts, allowing MAKER to create full annotations.
However, this also results in new erroneous predictions, spanning two
contigs that are not actually adjacent in the genome but were brought
together by the scaffolding process.
I suspect this has to do with the number of 'N' characters introduced as
padding between ordered contigs, so one thing I wanted to ask about is how
MAKER reacts to N's in the middle of a gene. Does it affect gene prediction?
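To make the concern concrete, here is a minimal sketch of how gene models that span an N-padded gap could be flagged for manual inspection. It is only an illustration under stated assumptions: the function names are hypothetical, the scaffold sequence is assumed to be already loaded as a string, and gene coordinates are assumed to be 1-based inclusive (start, end) pairs as in GFF3. It does not reflect how MAKER itself handles N's.

```python
import re


def find_n_gaps(sequence, min_len=1):
    """Return 1-based inclusive (start, end) coordinates of runs of N/n.

    min_len filters out short runs, e.g. isolated ambiguous bases that
    are not scaffolding padding (the threshold is an assumption to tune).
    """
    return [
        (m.start() + 1, m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


def genes_spanning_gaps(genes, gaps):
    """Return ids of genes whose span fully contains at least one gap.

    genes: iterable of (gene_id, start, end), 1-based inclusive.
    gaps:  list of (start, end) as returned by find_n_gaps.
    """
    flagged = []
    for gene_id, start, end in genes:
        if any(start < g_start and g_end < end for g_start, g_end in gaps):
            flagged.append(gene_id)
    return flagged


# Toy scaffold: two 4 bp "contigs" joined by 10 N's (illustrative only).
scaffold = "ACGT" + "N" * 10 + "ACGT"
gaps = find_n_gaps(scaffold, min_len=5)
genes = [("geneA", 1, 18), ("geneB", 1, 4), ("geneC", 15, 18)]
print(gaps)                              # [(5, 14)]
print(genes_spanning_gaps(genes, gaps))  # ['geneA']
```

Genes flagged this way are not necessarily wrong, since the scaffolder may have joined truly adjacent contigs; the list is just a candidate set to check against the evidence alignments.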
I would also appreciate any advice on how to annotate fragmented genomes,
as well as comments on the strategy I described above. Please note that I
am not expecting a reference-level annotation, but am simply trying to
reduce noise ahead of downstream comparative analyses.
Thanks a lot and best regards,
Lior