[maker-devel] map_forward and temporary storage questions
Carson Holt
carsonhh at gmail.com
Fri Oct 2 14:50:35 MDT 2020
MAKER uses GFF3 format which is not the same as GTF. You will need to convert your file to GFF3 format.
You can try this online tool (I haven't used it, so I can't tell you how well it works): http://www.sequenceontology.org/cgi-bin/converter.cgi
There are also a number of other resources available if you google "how to convert GTF to GFF3".
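
As a rough illustration of what such a conversion may involve: the sample lines quoted below have a `transcript` feature carrying a `geneID` attribute but no `gene` feature, whereas GFF3 gene models are normally written as a gene -> mRNA -> exon/CDS hierarchy. The Python sketch below adds that missing level. It rests on several assumptions: the real file is tab-delimited (the quoted sample shows spaces, which may just be the mail client), every transcript line has both `ID` and `geneID` attributes, and a missing gene level is actually part of the problem. The function name `add_gene_features` is made up for this example and is not part of MAKER. It also reuses the first transcript's coordinates for the gene, which is only right when each gene has a single transcript.

```python
def add_gene_features(lines):
    """Yield GFF3 lines, inserting a gene feature above each transcript.

    Assumes tab-delimited input whose transcript lines look like
    'ID=DIMT1.1;geneID=DIMT1' in column 9 (as in the quoted sample).
    """
    seen_genes = set()
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass through comments, blank lines, and anything that is not a transcript.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "transcript":
            yield line
            continue
        # Parse column 9 into a dict of key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene_id = attrs.get("geneID")
        tx_id = attrs.get("ID")
        if gene_id is None or tx_id is None:
            yield line  # nothing we can safely rewrite
            continue
        if gene_id not in seen_genes:
            seen_genes.add(gene_id)
            # Simplification: the gene takes the first transcript's coordinates.
            yield "\t".join(cols[:2] + ["gene"] + cols[3:8]
                            + [f"ID={gene_id};Name={gene_id}"])
        # Re-emit the transcript as an mRNA whose Parent is the new gene.
        yield "\t".join(cols[:2] + ["mRNA"] + cols[3:8]
                        + [f"ID={tx_id};Parent={gene_id}"])
```

To use it, pass the lines of the original file through the generator and write each result to a new file. The exon and CDS lines are left untouched, since their `Parent` already points at the transcript ID.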
—Carson
> On Sep 25, 2020, at 3:17 AM, Zoe Clarke <zoe.clarke at utoronto.ca> wrote:
>
> Hello!
>
> I am currently running Maker on a 2.5GB genome for which a list of ~8000 genes has already been very thoroughly annotated. My hope is to find and annotate the rest of the genes using ESTs and protein homology. However, I tested Maker on a single contig of my genome (there are ~20,000 contigs) and I can't find any of the genes from my original gtf file in the output, even though I followed all of the instructions in this wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Updating_annotations_in_light_of_new_data (I entered the original gff under model_gff, and used map_forward=1). I am worried this is because my gff3 file isn't formatted properly. Here are a few lines from my gff file as an example:
> --------------------------------------------
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 transcript 1094446 1105585 . + . ID=DIMT1.1;geneID=DIMT1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1094446 1094521 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1094874 1094947 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1095459 1095545 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1097351 1097412 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1097492 1097585 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1097670 1097719 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1098957 1099080 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1099217 1099309 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1100870 1100934 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1101967 1102030 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1103784 1103890 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1105543 1105585 97.75 + . Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1094446 1094521 . + 0 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1094874 1094947 . + 2 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1095459 1095545 . + 0 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1097351 1097412 . + 0 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1097492 1097585 . + 1 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1097670 1097719 . + 0 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1098957 1099080 . + 1 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1099217 1099309 . + 0 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1100870 1100934 . + 0 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1101967 1102030 . + 1 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1103784 1103890 . + 0 Parent=DIMT1.1
> WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1105543 1105582 . + 1 Parent=DIMT1.1
> --------------------------------------
> This is from the contig I used as a test for maker, and I can't find DIMT1.1 in the final gff file. At first I thought it might be because "geneID" is a listed attribute, but changing this to "Name" didn't help. Do you have any ideas why these genes might not be mapping forward? If it's something I can fix in the gff file, I am hoping I can fix it and use it for the second round of Maker after I have trained Snap.
>
> Also, do you think a better quality annotation would result from Snap trained on this curated list of ~8000 genes (which has been expertly done) or on the round 1 output of Maker?
>
> A final question: I am having storage issues with Maker, as it is currently taking up ~15TB of storage with temporary files. I am running Maker on a cluster, and whenever my submitted Maker job runs out of memory it fails, so I have to resubmit it about every hour, which leaves a lot of temporary folders (e.g. maker_x6V2y4) in my directory. I notice that some of these temporary files haven't been updated in days. Is it okay to delete them?
>
> Thank you so much for your help!
> Zoe
> ______________________________________
> Zoe Clarke
> PhD candidate in Computational Biology at U of T
> Lab profile: http://baderlab.org/Zoe%20Clarke
> Personal website: https://zoe-clarke.weebly.com/