[maker-devel] different number of annotated genes and transcripts

Nicolás Moreyra niconm89 at gmail.com
Tue May 26 12:58:22 MDT 2020


Hi Carson, thanks for your reply.

Yes, I counted the same way you did. Here are the different counts for the
same annotation file:

> grep -c -P "\tgene\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "RNA\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "mRNA\t" Dato_struct-annot.noseq.gff
> 17205
> grep -P "RNA\t" Dato_struct-annot.noseq.gff| cut -f3 | sort -u
> mRNA
> tRNA
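
A minimal sketch of an alternative to the separate grep calls above: tallying
column 3 of the GFF3 in one pass shows every feature type at once. Since only
mRNA and tRNA match "RNA\t" above, the gap 17688 - 17205 = 483 would be the
number of tRNA features. The function name count_feature_types is
hypothetical, and the sketch assumes a tab-delimited GFF3 with nine columns.

```shell
# Hypothetical helper: count GFF3 features by type (column 3).
# Assumes tab-delimited GFF3; directive lines and lines with fewer than
# nine columns (e.g. an embedded FASTA section) are skipped.
count_feature_types() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }      # skip directives and non-feature lines
        { n[$3]++ }                  # tally the feature type
        END { for (t in n) printf "%d\t%s\n", n[t], t }
    ' "$1" | sort -k2,2              # order rows by feature type
}

# Usage (file name taken from the commands above):
# count_feature_types Dato_struct-annot.noseq.gff
```

Each output row is a count followed by a feature type, so the gene, mRNA and
tRNA totals can be compared side by side.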


After using a tool to extract the transcript sequences into a FASTA file, I
obtained 17205 sequences. Looking at the genes without an associated
transcript sequence, it seems that they are all annotated as tRNAs. That is
odd:

> Backbone_23     maker   gene    486041  486112  .       -       .       ID=Dato03103;Name=Dato03103;Alias=trnascan-Backbone_23-noncoding-Glu_CTC-gene-4.38;
> Backbone_23     maker   tRNA    486041  486112  .       -       .       ID=Dato03103-RA;Parent=Dato03103;Name=Dato03103-RA;_AED=1.00;_QI=0|-1|0|0|-1|0|1|73|0;_eAED=1.00;
> Backbone_23     maker   exon    486041  486112  .       -       .       ID=Dato03103-RA:exon:45875;Parent=Dato03103-RA;

The AED is 1.00 in this example, so I suspect that this gene has no evidence
supporting it.
I also do not understand the "Alias" on the gene line; it looks like
tRNAscan detected the gene.
Any ideas?


Nicolás

*--*
*Nicolas Nahuel Moreyra*
*BSc/MSc in Bioinformatics*
*CONICET PhD Fellow @ IEGEBA*
*PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) ->
nmoreyra at ege.fcen.uba.ar*
Professor of Bioinformatics @ Favaloro University
Professor of Informatics @ IFTS N° 7
*Argentina*


On Tue, May 26, 2020 at 14:54, Carson Holt (carsonhh at gmail.com)
wrote:

> Perhaps you are counting wrong.  If you want to know the number of genes,
> you must look at the GFF3. You can use 'grep -c -P "\tgene\t" file.gff';
> the number of transcripts would then be 'grep -c -P "RNA\t" file.gff'.
>
> Note that if you are using things like tRNAscan, you will get tRNA
> transcripts and associated genes.  If you are trying to count from the
> fasta files, make sure you use the right file (maker.proteins.fasta and
> maker.transcripts.fasta).
>
> Thanks,
> Carson
>
>
> On May 21, 2020, at 7:58 AM, Nicolás Moreyra <niconm89 at gmail.com> wrote:
>
> Dear all,
>
> First of all, thank you for sharing your experiences here. I searched the
> existing posts for this issue but could not find it.
> Secondly, I am sorry for asking what is probably a silly question, but after
> completing the genome annotation of four species, I obtained fewer
> transcripts than genes. I do not understand why MAKER would annotate genes
> that have no transcript.
> I was trying to find the reason for this so I can discuss it in my thesis,
> but I am a bit lost. Has this happened to anyone? Is there any possible
> cause that comes to mind?
>
> Thanks in advance.
>
> Nicolás