[maker-devel] map_forward and temporary storage questions

Zoe Clarke zoe.clarke at utoronto.ca
Fri Sep 25 03:17:47 MDT 2020


Hello!

I am currently running Maker on a 2.5GB genome for which ~8000 genes have already been very thoroughly annotated. My hope is to find and annotate the rest of the genes using ESTs and protein homology. However, when I tested Maker on a single contig of my genome (there are ~20,000 contigs), I couldn't find any of the genes from my original gtf file in the output, even though I followed all of the instructions in this wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Updating_annotations_in_light_of_new_data (I entered the original gff under model_gff and set map_forward=1). I am worried this is because my gff3 file isn't formatted properly. Here are a few lines from my gff file as an example:
--------------------------------------------
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  transcript      1094446 1105585 .       +       .       ID=DIMT1.1;geneID=DIMT1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1094446 1094521 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1094874 1094947 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1095459 1095545 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1097351 1097412 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1097492 1097585 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1097670 1097719 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1098957 1099080 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1099217 1099309 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1100870 1100934 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1101967 1102030 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1103784 1103890 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  exon    1105543 1105585 97.75   +       .       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1094446 1094521 .       +       0       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1094874 1094947 .       +       2       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1095459 1095545 .       +       0       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1097351 1097412 .       +       0       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1097492 1097585 .       +       1       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1097670 1097719 .       +       0       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1098957 1099080 .       +       1       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1099217 1099309 .       +       0       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1100870 1100934 .       +       0       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1101967 1102030 .       +       1       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1103784 1103890 .       +       0       Parent=DIMT1.1
WCK01_AAF20200214_F8-ctg36      ovaltine_v0.13  CDS     1105543 1105582 .       +       1       Parent=DIMT1.1
--------------------------------------
This is from the contig I used as a test for maker, and I can't find DIMT1.1 in the final gff file. At first I thought it might be because "geneID" is a listed attribute, but changing this to "Name" didn't help. Do you have any ideas why these genes might not be mapping forward? If it's something I can fix in the gff file, I am hoping I can fix it and use it for the second round of Maker after I have trained Snap.
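
One thing that might be worth ruling out, offered as an unconfirmed assumption rather than a diagnosis: the excerpt above has only two levels (transcript, then exon/CDS) and no gene feature, whereas GFF3 gene models are conventionally written as gene -> mRNA -> exon/CDS. Below is a minimal Python sketch of a converter. The function names (parse_attrs, add_gene_level) are made up for illustration. It assumes tab-separated columns and that every transcript line carries both ID and geneID, as in the excerpt, and it has not been tested against Maker itself.

```python
def parse_attrs(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if kv)


def add_gene_level(lines):
    """Return GFF3 lines with a gene feature added above each transcript.

    Assumption (from the excerpt): transcript lines look like
    ID=<transcript>;geneID=<gene>, and children use Parent=<transcript>.
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and existing directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        rows.append(cols)

    # First pass: a gene spans all of its transcripts.
    spans = {}
    for cols in rows:
        if cols[2] == "transcript":
            gene_id = parse_attrs(cols[8])["geneID"]
            start, end = int(cols[3]), int(cols[4])
            lo, hi = spans.get(gene_id, (start, end))
            spans[gene_id] = (min(lo, start), max(hi, end))

    # Second pass: emit the gene once, then rename transcript to mRNA.
    out = ["##gff-version 3"]
    emitted = set()
    for cols in rows:
        if cols[2] == "transcript":
            attrs = parse_attrs(cols[8])
            gene_id = attrs["geneID"]
            if gene_id not in emitted:
                emitted.add(gene_id)
                lo, hi = spans[gene_id]
                out.append("\t".join([
                    cols[0], cols[1], "gene", str(lo), str(hi),
                    ".", cols[6], ".", f"ID={gene_id};Name={gene_id}",
                ]))
            cols = cols[:2] + ["mRNA"] + cols[3:8] + [
                f"ID={attrs['ID']};Parent={gene_id}"
            ]
        out.append("\t".join(cols))
    return out
```

Running something like this over the test contig's file and passing the result to model_gff would show whether the missing gene level makes a difference.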

Also, do you think a better-quality annotation would result from training Snap on this expertly curated list of ~8000 genes, or on the round 1 output of Maker?

A final question: I am having storage issues with Maker, as its temporary files are currently taking up ~15TB. I am running Maker on a cluster, and whenever my submitted Maker job runs out of memory it fails, so I have to resubmit it about every hour, which leaves a lot of temporary folders (e.g. maker_x6V2y4) in my directory. I notice that some of these temporary files haven't been updated in days. Is it okay to delete them?
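
To make the "haven't been updated in days" observation concrete, here is a small Python sketch that only lists candidate folders and deletes nothing. Whether removing them is safe is still the open question above. The function name stale_temp_dirs, the maker_* pattern (taken from the maker_x6V2y4 example), and the two-day cutoff are all illustrative assumptions.

```python
import time
from pathlib import Path


def _mtime(path):
    """Modification time, or 0.0 if the entry vanished while scanning."""
    try:
        return path.stat().st_mtime
    except OSError:
        return 0.0


def stale_temp_dirs(root, pattern="maker_*", max_age_days=2.0, now=None):
    """List directories under root matching pattern with no recent writes.

    A directory counts as stale only if it and everything inside it were
    last modified more than max_age_days ago. Nothing is deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for d in sorted(Path(root).glob(pattern)):
        if not d.is_dir():
            continue  # ignores plain files such as maker_opts.ctl
        newest = max([_mtime(d)] + [_mtime(p) for p in d.rglob("*")])
        if newest < cutoff:
            stale.append(d)
    return stale
```

For example, `for d in stale_temp_dirs("."): print(d)` prints the folders that no job has written to in the last two days, which can then be cross-checked against the jobs still running on the cluster.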

Thank you so much for your help!
Zoe
______________________________________
Zoe Clarke
PhD candidate in Computational Biology at U of T
Lab profile: http://baderlab.org/Zoe%20Clarke
Personal website: https://zoe-clarke.weebly.com/