[maker-devel] map_forward and temporary storage questions
Zoe Clarke
zoe.clarke at utoronto.ca
Fri Sep 25 03:17:47 MDT 2020
Hello!
I am currently running Maker on a 2.5GB genome for which ~8000 genes have already been very thoroughly annotated. My hope is to find and annotate the rest of the genes using ESTs and protein homology. However, I tested Maker on a single contig of my genome (there are ~20,000 contigs) and I can't find any of the genes from my original annotation file in the output, even though I followed all of the instructions in this wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Updating_annotations_in_light_of_new_data (I entered the original gff under model_gff, and used map_forward=1). I am worried this is because my gff3 file isn't formatted properly. Here are a few lines from my gff file as an example:
--------------------------------------------
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 transcript 1094446 1105585 . + . ID=DIMT1.1;geneID=DIMT1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1094446 1094521 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1094874 1094947 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1095459 1095545 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1097351 1097412 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1097492 1097585 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1097670 1097719 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1098957 1099080 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1099217 1099309 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1100870 1100934 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1101967 1102030 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1103784 1103890 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1105543 1105585 97.75 + . Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1094446 1094521 . + 0 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1094874 1094947 . + 2 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1095459 1095545 . + 0 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1097351 1097412 . + 0 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1097492 1097585 . + 1 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1097670 1097719 . + 0 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1098957 1099080 . + 1 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1099217 1099309 . + 0 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1100870 1100934 . + 0 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1101967 1102030 . + 1 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1103784 1103890 . + 0 Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1105543 1105582 . + 1 Parent=DIMT1.1
--------------------------------------
This is from the contig I used as a test for maker, and I can't find DIMT1.1 in the final gff file. At first I thought it might be because "geneID" is a listed attribute, but changing this to "Name" didn't help. Do you have any ideas why these genes might not be mapping forward? If it's something I can fix in the gff file, I am hoping I can fix it and use it for the second round of Maker after I have trained Snap.
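One structural point that can be checked independently of Maker: the GFF3 specification's canonical gene model is a gene feature, then an mRNA child, then exon/CDS children linked through Parent, whereas the excerpt above has a top-level "transcript" with no "gene" feature. Whether that is what keeps DIMT1.1 from mapping forward is an assumption, not something confirmed here. The sketch below only summarizes the structure of a file; the function name summarize_gff3 and the report keys are hypothetical, and it is not part of Maker.

```python
from collections import Counter


def summarize_gff3(lines):
    """Summarize feature types and Parent/ID links in GFF3-like lines.

    Hypothetical helper: it only reports structure and does not
    reproduce any check that Maker itself performs.
    """
    types = Counter()
    ids = set()
    parent_refs = set()
    transcripts_without_parent = []

    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # GFF3 is tab-delimited, but pasted excerpts often lose the tabs,
        # so split on any whitespace into at most 9 columns.
        cols = line.split(None, 8)
        if len(cols) < 9:
            continue
        ftype = cols[2]
        attrs = dict(
            kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv
        )
        types[ftype] += 1
        if "ID" in attrs:
            ids.add(attrs["ID"])
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        parent_refs.update(parents)
        if ftype in ("transcript", "mRNA") and not parents:
            transcripts_without_parent.append(attrs.get("ID", "<no ID>"))

    return {
        "types": types,
        # Parent values that point at an ID that never appears in the file
        "orphan_parents": sorted(parent_refs - ids),
        "transcripts_without_parent": transcripts_without_parent,
    }


example = [
    "WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 transcript 1094446 1105585 . + . ID=DIMT1.1;geneID=DIMT1",
    "WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 exon 1094446 1094521 97.75 + . Parent=DIMT1.1",
    "WCK01_AAF20200214_F8-ctg36 ovaltine_v0.13 CDS 1094446 1094521 . + 0 Parent=DIMT1.1",
]
report = summarize_gff3(example)
print(dict(report["types"]))
print(report["transcripts_without_parent"])
```

On the three example lines this reports no "gene" or "mRNA" features and lists DIMT1.1 as a transcript without a Parent, which is the kind of difference from the canonical model that would be worth ruling out first.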
Also, do you think a better-quality annotation would result from Snap trained on this expertly curated list of ~8000 genes, or from Snap trained on the round 1 output of Maker?
A final question: I am having storage issues with Maker, as its temporary files are currently taking up ~15TB. I am running Maker on a cluster, and whenever my submitted Maker job runs out of memory it fails, so I have to resubmit it about every hour, which leaves a lot of temporary folders (e.g. maker_x6V2y4) in my directory. I notice that some of these temporary files haven't been updated in days. Is it okay to delete them?
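To make "haven't been updated in days" concrete, here is a minimal sketch that only lists candidate folders and deletes nothing. It assumes the temporary folders match the pattern maker_* (taken from the example name maker_x6V2y4) and sit directly in the working directory; the function names and the 2-day threshold are hypothetical. Whether such folders are actually safe to remove is still the open question above.

```python
import os
import time
from pathlib import Path


def newest_mtime(path):
    """Return the most recent modification time of path or anything inside it."""
    latest = os.lstat(path).st_mtime
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            try:
                latest = max(latest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                # The entry vanished while walking (e.g. a running job cleaned it up).
                pass
    return latest


def stale_temp_dirs(base, days=2.0, pattern="maker_*", now=None):
    """List directories under base matching pattern with no changes for `days` days.

    Nothing is deleted; the caller decides what to do with the list.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    for entry in sorted(Path(base).glob(pattern)):
        if entry.is_dir() and newest_mtime(entry) < cutoff:
            stale.append(entry)
    return stale


if __name__ == "__main__":
    for folder in stale_temp_dirs(".", days=2.0):
        print(folder)
```

Checking the newest modification time of anything inside each folder, rather than the folder's own timestamp, avoids flagging a folder that a running job is still writing to deep in its subdirectories.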
Thank you so much for your help!
Zoe
______________________________________
Zoe Clarke
PhD candidate in Computational Biology at U of T
Lab profile: http://baderlab.org/Zoe%20Clarke
Personal website: https://zoe-clarke.weebly.com/