[maker-devel] Fragmented annotation

Ole Kristian Tørresen o.k.torresen at ibv.uio.no
Thu Jul 7 15:56:39 MDT 2016


Sure, but is there a quick way of doing this? With UTRs and such, I am unsure how to parse the GFF properly. Should I take the first three bases of the first CDS of each gene, or something like that, and the last three bases of the last CDS?

Ole
________________________________________
From: Daniel Ence <dence at genetics.utah.edu>
Sent: 07 July 2016 23:48
To: Ole Kristian Tørresen
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Fragmented annotation

To address your suspicion that your genes are fragmented, can you check how many of the protein or transcript sequences begin and end with canonical start and stop codons? That might tell you whether you have “gene-parts” rather than full genes.
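
One way to run this check straight from the annotation GFF3 and the genome FASTA is sketched below in Python. It is only a sketch, not part of MAKER: the function names and the file names in the usage comment are made up for illustration. It assumes that each CDS feature has a Parent attribute naming its transcript and that the last CDS includes the stop codon, as the GFF3 specification describes. Working from CDS features alone sidesteps the UTRs, because only the coding segments are joined.

```python
from collections import Counter, defaultdict

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse-complement a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    chunks, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def cds_by_transcript(gff_lines, genome):
    """Join the CDS segments of each transcript into one coding sequence."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                segments[parent].append(
                    (cols[0], int(cols[3]), int(cols[4]), cols[6])
                )
    cds = {}
    for transcript, segs in segments.items():
        segs.sort(key=lambda s: s[1])  # ascending genomic order
        seq = "".join(genome[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            seq = revcomp(seq)  # minus strand: flip the joined sequence
        cds[transcript] = seq.upper()
    return cds


def classify(cds):
    """Label a coding sequence by presence of a start and a stop codon."""
    has_start = cds.startswith("ATG")
    has_stop = cds[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "no_stop"
    if has_stop:
        return "no_start"
    return "neither"


def summarize(cds_seqs):
    """Count transcripts in each category."""
    return Counter(classify(seq) for seq in cds_seqs.values())


# Usage (hypothetical file names):
#   with open("genome.fasta") as fa, open("annotation.gff") as gff:
#       print(summarize(cds_by_transcript(gff, read_fasta(fa))))
```

A large share of "no_start", "no_stop" or "neither" transcripts would point towards partial gene models rather than complete genes.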

~Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Jul 7, 2016, at 3:44 PM, Daniel Ence <dence at genetics.utah.edu> wrote:
>
> Hi Ole, when I hear that a genome has too many genes annotated, one of the first things I think of is masking of repetitive elements. Unmasked repeats can contribute a large number of spurious gene annotations that originate from transposable elements. What did you use for repeat masking? Did you run MAKER on a pre-masked version of the assembly?
>
> ~Daniel
>
>
>> On Jul 7, 2016, at 3:29 PM, Ole Kristian Tørresen <o.k.torresen at ibv.uio.no> wrote:
>>
>> Hi all,
>> I have annotated a fish genome (about 700 Mbp total, 90 kbp contig N50, 270 kbp scaffold N50) and get 96576 gene models, 67917 with default filtering (quality_filter.pl -d) and 67917 with standard filtering (quality_filter.pl -s). I chose to report all genes with AED less than 0.5 (27437) as the high-quality set.
>>
>> However, I have some doubts. 70k genes cannot be correct for this species (it is not polyploid); the correct number should be a bit more than 20k, I think. I suspect that many of my genes are fragmented. How can I fix this? I have searched the forum but cannot find any good answers. Are there any parameters I can adjust?
>>
>> I have used SwissProt/UniProt and a Trinity assembly of reads from several stages of embryo development as evidence. In the first pass I used SNAP with CEGMA, AUGUSTUS trained with BUSCO actinopterygii genes, and GeneMark-ES. In the second pass I used SNAP trained on the first-pass annotation, AUGUSTUS trained on the transcriptome and the first-pass annotation, and GeneMark.
>>
>> Thank you.
>>
>> Ole
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>