[maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure

Kevin Kocot kmkocot at ua.edu
Thu Sep 23 10:42:12 MDT 2021


Hi Carson and Jacques,

Thank you both very much for the help. It looks like my PASA and exonerate (via Funannotate) gff3 files are not correctly formatted for maker.  I've uploaded them here just in case it might be helpful for you to see them, but I will try to figure out how to reformat them correctly following Jacques's advice.
http://genomes.ua.edu/Kocot/2021-09-10_Maker/pasa_predictions.gff3
http://genomes.ua.edu/Kocot/2021-09-10_Maker/protein_alignments.gff3

Is there a standalone tool or Maker feature I'm not seeing that can assess whether a gff3 file is correctly formatted for Maker?

Thank you again!
Kevin
________________________________
From: Jacques Dainat <jacques.dainat at nbis.se>
Sent: Wednesday, September 22, 2021 2:09 PM
To: Kevin Kocot <kmkocot at ua.edu>
Cc: maker-devel at yandell-lab.org <maker-devel at yandell-lab.org>; Carson Holt <carsonhh at gmail.com>
Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure

Hi Kevin,

About using AGAT (agat_convert_sp_gxf2gxf.pl): there are two possible reasons for getting empty output files.
I) The feature types (3rd column) are not yet handled by AGAT. You can tell AGAT how to deal with them.
See https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-the-feature-type-is-not-yet-taken-into-account
II) The features are thrown out by AGAT because child features are missing (e.g. a gene feature expects at least one transcript linked to it). See https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-child-features-are-not-provided

I invite you to open an issue in the AGAT GitHub repository.

Once the file is parsed correctly, you can use the script agat_sp_alignment_output_style.pl to turn level1 feature types (e.g. gene) and level2 feature types (e.g. mRNA) into match and match_part features respectively, which MAKER may prefer.

Best regards,

Jacques Dainat, Ph.D.


On 22 Sep 2021, at 18:21, Carson Holt <carsonhh at gmail.com> wrote:

I’d have to see the GFF, but in general you should organize sequence alignments as match/match_part features.

Here is an example from the GFF3 format specification:

ctg123 . cDNA_match  1200  9000  .        .  .    ID=cDNA00001
ctg123 . match_part  1200  3200  2.2e-30  +  .    ID=match00002;Parent=cDNA00001;Target=mjm1123.5 5 506;Gap=M301 D1499 M201
ctg123 . match_part  7000  9000  7.4e-32  -  .    ID=match00003;Parent=cDNA00001;Target=mjm1123.3 1 502;Gap=M101 D1499 M401
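
For readers who want to check their own files against this layout, here is a minimal Python sketch. It assumes tab-separated columns (as the GFF3 specification requires) and checks only that every match_part has a Parent that exists as a top-level feature on the same sequence and falls inside that parent's coordinates. The function name check_match_layout is hypothetical, and this is not MAKER's own validation; MAKER may apply other rules.

```python
def check_match_layout(lines):
    """Return a list of problems in match/match_part parent-child links.

    Hypothetical helper: it checks only the layout shown in the example,
    not everything MAKER may require.
    """
    parents = {}    # ID -> (seqid, start, end) for features without a Parent
    children = []   # (line_no, seqid, start, end, parent_id) for match_part lines
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(f"line {line_no}: expected 9 tab-separated columns, found {len(cols)}")
            continue
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            problems.append(f"line {line_no}: start/end are not integers")
            continue
        # Split column 9 into key=value pairs (no URL-decoding, single Parent only)
        pairs = (f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        attrs = {k.strip(): v.strip() for k, v in pairs}
        if cols[2] == "match_part":
            if "Parent" in attrs:
                children.append((line_no, cols[0], start, end, attrs["Parent"]))
            else:
                problems.append(f"line {line_no}: match_part has no Parent attribute")
        elif "ID" in attrs and "Parent" not in attrs:
            parents[attrs["ID"]] = (cols[0], start, end)
    for line_no, seqid, start, end, parent_id in children:
        parent = parents.get(parent_id)
        if parent is None:
            problems.append(f"line {line_no}: Parent {parent_id} is not a top-level feature")
        elif parent[0] != seqid or start < parent[1] or end > parent[2]:
            problems.append(f"line {line_no}: match_part lies outside parent {parent_id}")
    return problems
```

Passing an open file handle (for example the exonerate GFF3) returns an empty list when nothing is flagged by these particular checks.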

Also make sure you are not inadvertently using GFF2 or GTF. They are not backwards compatible with GFF3.
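
One quick way to spot a GTF or GFF2 file posing as GFF3 is to look at the ninth column: GFF3 uses key=value pairs separated by semicolons, while GTF and GFF2 use a key followed by whitespace and a (usually quoted) value. The Python sketch below is a heuristic only; the function name and the returned labels are made up for illustration, and it does not handle semicolons inside quoted values.

```python
import re

# GFF3 attribute: key, then "=", then a value (which may contain spaces)
GFF3_PAIR = re.compile(r'^[^\s=]+=.*$')
# GTF/GFF2 attribute: key, then whitespace, then a value such as "g1"
SPACE_PAIR = re.compile(r'^[^\s=]+\s+\S.*$')

def guess_attribute_style(column9):
    """Guess whether a ninth-column string looks like GFF3 or GTF/GFF2."""
    fields = [f.strip() for f in column9.split(";") if f.strip()]
    if not fields:
        return "unknown"
    if all(GFF3_PAIR.match(f) for f in fields):
        return "gff3"
    if all(SPACE_PAIR.match(f) for f in fields):
        return "gtf_or_gff2"
    return "unknown"
```

Running this on the ninth column of the first few feature lines of each evidence file should be enough to catch a file in the wrong format.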

—Carson


On Sep 22, 2021, at 8:06 AM, Kevin Kocot <kmkocot at ua.edu> wrote:

Thanks Carson,

I think I see the problem now. Here's what I'm getting:

-----
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore

To access files for individual sequences use the datastore index:
/home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log



--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: PGA_scaffold0
Length: 141759199
Tries: 5!!
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
prepare section files
Gathering GFF3 input into hits - chunk:0
ERROR: Non-unique top level ID for match.19561.56
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs).  MAKER will not handle these correctly.

--> rank=NA, hostname=wirenia
ERROR: Failed while prepare section files
ERROR: Chunk failed at level:12, tier_type:3
FAILED CONTIG:PGA_scaffold0

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:PGA_scaffold0

examining contents of the fasta file and run log
-----

It looks like maker doesn't like the format of the exonerate gff3 I am using. I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker?
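
The error in the log above complains about a non-unique top-level ID. A minimal Python sketch for listing such IDs is shown below. It treats any feature line that has an ID but no Parent as top level, which is an assumption about what MAKER means by "top level", and the function names are hypothetical.

```python
from collections import Counter

def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def duplicate_top_level_ids(lines):
    """Return the sorted IDs shared by more than one parentless feature line."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip anything that is not a 9-column feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" not in attrs:
            counts[attrs["ID"]] += 1
    return sorted(id_ for id_, n in counts.items() if n > 1)
```

Calling duplicate_top_level_ids on an open file handle for the evidence GFF3 returns the repeated IDs, which can then be renamed or merged before the file is passed to MAKER.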

Thanks!
Kevin
________________________________
From: Carson Holt <carsonhh at gmail.com>
Sent: Monday, September 20, 2021 8:06 AM
To: Kevin Kocot <kmkocot at ua.edu>
Cc: maker-devel at yandell-lab.org <maker-devel at yandell-lab.org>
Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure

Hi Kevin,

The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6 and send me the STDERR after it generates a new failure?

—Carson

On Sep 20, 2021, at 4:56 AM, Kevin Kocot <kmkocot at ua.edu> wrote:


Thanks Carson! I've uploaded that file here:
http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log


________________________________
From: Carson Holt
Sent: Tuesday, September 14, 2021 9:51 PM
To: Kevin Kocot
Cc: maker-devel at yandell-lab.org
Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure

What I really need is the captured STDERR from the failed run.

—Carson

On Sep 10, 2021, at 9:19 AM, Kevin Kocot <kmkocot at ua.edu> wrote:

Hi Carson and all,

I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses, but I can't figure out where the problem lies. Running ./Build status indicates all the dependencies are there (I’m not using MPI). The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently), or whether my evidence .gff3 files are not correctly formatted?

I’ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/

Any guidance on what might be the issue would be greatly appreciated.

Thanks!
Kevin


Kevin M. Kocot
he/him/his

Associate Professor & Curator of Invertebrates
Department of Biological Sciences & Alabama Museum of Natural History
The University of Alabama
307 Mary Harmon Bryant Hall
Box 870344
Tuscaloosa, AL 35487
phone 205-348-4052 | fax 205-348-4039
kmkocot at ua.edu | www.kocotlab.com
https://uasystem.zoom.us/j/3755490727


_______________________________________________
maker-devel mailing list
maker-devel at yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

