[maker-devel] Gene loss in subsequent round of maker for fungal genome annotation

Urmi urmi208 at gmail.com
Wed Mar 21 03:24:32 MDT 2018


Further to this, I ran InterProScan on all three runs, and in each run 100% of
the genes have protein domains. I am unsure which one I should consider the
best annotation. I am sorry for so many questions, but I am very new to MAKER.
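A figure like "100% of genes have domains" can be checked by comparing the protein IDs in each run's MAKER protein FASTA against column 1 of the InterProScan TSV report. This is a minimal sketch that assumes the standard tab-separated TSV output with the protein accession in the first column, and assumes the FASTA IDs match those accessions. It counts any InterProScan row as a hit, including non-domain analyses such as Coils or MobiDBLite if they were run, so it measures "has any InterProScan match" rather than strictly "has a domain".

```python
import csv
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs (first token after '>') from protein FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def interpro_hit_ids(lines: Iterable[str]) -> Set[str]:
    """Collect protein IDs from column 1 of an InterProScan TSV report."""
    ids = set()
    # QUOTE_NONE: descriptions may contain stray quote characters.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row and row[0].strip():
            ids.add(row[0].strip())
    return ids


def domain_coverage(fasta_lines: Iterable[str], tsv_lines: Iterable[str]) -> float:
    """Fraction of FASTA proteins that have at least one InterProScan row."""
    proteins = fasta_ids(fasta_lines)
    if not proteins:
        return 0.0
    hits = interpro_hit_ids(tsv_lines) & proteins
    return len(hits) / len(proteins)


if __name__ == "__main__":
    # Toy inputs: two proteins, only p1 has an InterProScan match.
    fasta = [">p1 toy\n", "MKV\n", ">p2\n", "MAA\n"]
    tsv = ["p1\tmd5\t3\tPfam\tPF00001\tdesc\t1\t3\t1e-5\tT\t21-03-2018\n"]
    print(domain_coverage(fasta, tsv))  # 0.5
```

If every run still reports 1.0, the domain check alone cannot discriminate between the three annotation sets.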

Thanks again for any help you could provide.

On Wed, Mar 21, 2018 at 9:05 AM, Urmi <urmi208 at gmail.com> wrote:

> Hello maker community,
>
> I am running MAKER 3.01.02-beta on a fungal genome. I am using the
> available EST and protein sequences from a different strain of the same
> species, supplied through the "est" and "protein" parameters in
> maker_opts.ctl. Here is the protocol I am using:
>
>    1. Run MAKER with repeat masking, providing transcript and protein
>    sequences from the related species (run A).
>    2. Train a SNAP model with CEGMA.
>    3. Train Augustus with BUSCO.
>    4. Run MAKER again (run B) with the new SNAP model (from step 2) and the
>    Augustus species (from step 3), with est2genome=0 and protein2genome=0,
>    supplying the run A alignments as GFF files
>    (altest_gff=runA_cdna2genome.gff, protein_gff=runA_protein2genome.gff3).
>    5. Train a new SNAP model from run B.
>    6. Train Augustus with the transcripts from run B and BUSCO.
>    7. Run MAKER again (run C) with the new SNAP model (from step 5) and the
>    Augustus species (from step 6), with est2genome=0 and protein2genome=0,
>    supplying the same GFF files
>    (altest_gff=runA_cdna2genome.gff, protein_gff=runA_protein2genome.gff3),
>    and keep_preds=1.
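Per the protocol, runs B and C reuse the same evidence GFFs, so apart from the retrained SNAP and Augustus models the only control-file change is keep_preds=1 in run C. A minimal sketch for listing such differences between two maker_opts.ctl files, assuming the plain `key=value #comment` line layout that MAKER control files use (a `#` inside a value would be mis-read as a comment). The SNAP model file names, Augustus species names, and the run B keep_preds value in the demo are hypothetical placeholders.

```python
from typing import Dict, Iterable, Tuple


def parse_ctl(lines: Iterable[str]) -> Dict[str, str]:
    """Parse control-file lines of the form 'key=value #comment'."""
    opts = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def diff_ctl(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k, ""), b.get(k, "")) for k in keys if a.get(k, "") != b.get(k, "")}


if __name__ == "__main__":
    # Hypothetical excerpt: .hmm and species names are placeholders.
    run_b = parse_ctl([
        "est2genome=0 #infer genes from ESTs\n",
        "protein2genome=0\n",
        "altest_gff=runA_cdna2genome.gff\n",
        "protein_gff=runA_protein2genome.gff3\n",
        "snaphmm=snap_round1.hmm\n",
        "augustus_species=my_fungus_round1\n",
        "keep_preds=0\n",
    ])
    run_c = dict(run_b, snaphmm="snap_round2.hmm",
                 augustus_species="my_fungus_round2", keep_preds="1")
    for option, (b_val, c_val) in diff_ctl(run_b, run_c).items():
        print(f"{option}: {b_val!r} -> {c_val!r}")
```

Running the same diff on the real run B and run C control files would confirm whether anything else changed between the two rounds.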
>
> This gives the following gene counts:
>
>    - run A: 12796 total genes, of which 12771 have AED < 0.5
>    - run B: 10713 total genes, of which 10701 have AED < 0.5
>    - run C: 12651 total genes, of which 12582 have AED < 0.5
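Counts like these can be regenerated from each run's merged GFF3 (for example the output of gff3_merge), which helps when comparing runs on equal terms. A minimal sketch, assuming that only rows whose source column is `maker` are final annotations, that each mRNA has a single Parent, and that a gene's AED is the lowest `_AED` among its mRNAs; that last rule is a choice made here, not necessarily how the counts above were produced.

```python
import sys
from typing import Dict, Iterable, Tuple


def parse_attrs(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string into a key -> value dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gene_aed_summary(lines: Iterable[str], threshold: float = 0.5) -> Tuple[int, int]:
    """Return (total maker genes, genes whose best mRNA has _AED < threshold)."""
    genes = set()
    best_aed: Dict[str, float] = {}
    for line in lines:
        if line.startswith("##FASTA"):
            break  # MAKER GFF3 may end with an embedded FASTA section
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != "maker":
            continue  # skip evidence rows (blastn, est2genome, snap_masked, ...)
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene":
            genes.add(attrs["ID"])
        elif cols[2] == "mRNA" and "_AED" in attrs:
            parent = attrs["Parent"]
            aed = float(attrs["_AED"])
            best_aed[parent] = min(aed, best_aed.get(parent, aed))
    # Genes with no scored mRNA default to AED 1.0 (fail the threshold).
    passing = sum(1 for g in genes if best_aed.get(g, 1.0) < threshold)
    return len(genes), passing


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            total, good = gene_aed_summary(handle)
        print(f"{total} total genes, of which {good} have AED < 0.5")
```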
>
> Looking at the GFF files in detail, I see that some gene models present in
> run A are lost in run B and regained in run C. I don't understand why genes
> are lost in run B. Here is an example:
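To list every run A gene without a counterpart in run B, instead of finding examples by eye, one approach is a same-strand coordinate overlap test between the two runs' `maker` gene features. A minimal sketch, assuming merged, tab-separated GFF3 files; an overlap of a single base counts as a match, which is deliberately loose and could be tightened (for example by requiring reciprocal overlap).

```python
import sys
from collections import defaultdict
from typing import Iterable, List, Tuple

Locus = Tuple[str, str, int, int, str]  # (seqid, strand, start, end, gene ID)


def gene_loci(lines: Iterable[str]) -> List[Locus]:
    """Extract MAKER gene features from GFF3 lines."""
    loci = []
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[1] == "maker" and cols[2] == "gene":
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            loci.append((cols[0], cols[6], int(cols[3]), int(cols[4]), attrs["ID"]))
    return loci


def missing_loci(run_a: List[Locus], run_b: List[Locus]) -> List[str]:
    """IDs of genes in run_a with no same-strand overlapping gene in run_b."""
    by_key = defaultdict(list)
    for seqid, strand, start, end, _ in run_b:
        by_key[(seqid, strand)].append((start, end))
    lost = []
    for seqid, strand, start, end, gene_id in run_a:
        # GFF coordinates are 1-based and inclusive: overlap if s <= end and start <= e.
        if not any(s <= end and start <= e for s, e in by_key[(seqid, strand)]):
            lost.append(gene_id)
    return lost


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            for gene_id in missing_loci(gene_loci(fa), gene_loci(fb)):
                print(gene_id)
```

Called with the run A and run B GFF3 paths as arguments, it prints one lost gene ID per line; swapping in run C as the second file shows which of those are regained.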
>
> Run A:
>
> contig1 maker   gene    20468   21193   .       +       .       ID=maker-contig1-exonerate_protein2genome-gene-0.34;Name=maker-contig1-exonerate_protein2genome-gene-0.34
> contig1 maker   mRNA    20468   21193   100     +       .       ID=maker-contig1-exonerate_protein2genome-gene-0.34-mRNA-1;Parent=maker-contig1-exonerate_protein2genome-gene-0.34;Name=maker-contig1-exonerate_protein2genome-gene-0.34-mRNA-1;_AED=0.30;_eAED=0.30;_QI=0|-1|0|1|-1|0|1|0|241
> contig1 maker   exon    20468   21193   .       +       .       ID=maker-contig1-exonerate_protein2genome-gene-0.34-mRNA-1:1;Parent=maker-contig1-exonerate_protein2genome-gene-0.34-mRNA-1
> contig1 maker   CDS     20468   21193   .       +       0       ID=maker-contig1-exonerate_protein2genome-gene-0.34-mRNA-1:cds;Parent=maker-contig1-exonerate_protein2genome-gene-0.34-mRNA-1
> contig1 blastn  expressed_sequence_match        20468   21193   726     +       .       ID=contig1:hit:983:3.2.0.0;Name=jgi|test_1|140804|est;target_length=726
> contig1 blastn  match_part      20468   21193   726     +       .       ID=contig1:hsp:998:3.2.0.0;Parent=contig1:hit:983:3.2.0.0;Target=jgi|test_1|140804|est 1 726 +;Gap=M726
> contig1 est2genome      expressed_sequence_match        20468   21193   3630    +       .       ID=contig1:hit:1022:3.2.0.0;Name=jgi|test_1|140804|est;target_length=726;aligned_coverage=100;aligned_identity=100
> contig1 est2genome      match_part      20468   21193   3630    +       .       ID=contig1:hsp:1110:3.2.0.0;Parent=contig1:hit:1022:3.2.0.0;Target=jgi|test_1|140804|est 1 726 +;Gap=M726
>
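The run A gene's ID contains exonerate_protein2genome, which suggests it was built directly from a protein alignment, while run B sets protein2genome=0. One way to gauge how many run A genes come from such evidence-only modes is to tally genes by the predictor name embedded in their IDs. A minimal sketch; the two ID patterns it recognizes (`maker-<seqid>-<predictor>-gene-...` and `<predictor>-<seqid>-processed-gene-...`) are inferred from the IDs quoted in this thread, not from documented MAKER behavior, and anything else is reported as "unknown".

```python
import sys
from collections import Counter
from typing import Iterable


def predictor_of(gene_id: str, seqid: str) -> str:
    """Guess the source predictor from a MAKER gene ID (naming pattern inferred)."""
    prefix = f"maker-{seqid}-"
    if gene_id.startswith(prefix):
        # e.g. maker-contig1-exonerate_protein2genome-gene-0.34
        return gene_id[len(prefix):].split("-gene-", 1)[0]
    marker = f"-{seqid}-"
    if marker in gene_id:
        # e.g. snap_masked-contig1-processed-gene-0.5
        return gene_id.split(marker, 1)[0]
    return "unknown"


def predictor_counts(lines: Iterable[str]) -> Counter:
    """Count MAKER gene features by the predictor embedded in their IDs."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[1] == "maker" and cols[2] == "gene":
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            counts[predictor_of(attrs["ID"], cols[0])] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for predictor, n in predictor_counts(handle).most_common():
                print(f"{predictor}\t{n}")
```

A large exonerate_protein2genome (or est2genome) share in run A would be consistent with the roughly 2000 genes missing from run B, though that remains a hypothesis to check against the actual files.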
> Run B:
>
> contig1 est_gff:est2genome      expressed_sequence_match        20468   21193   3630    +       .       ID=contig1:hit:1051:3.12.0.0;Name=jgi|test_1|140804|est;target_length=726;aligned_coverage=100;aligned_identity=100;aligned_coverage=100;aligned_identity=100;score=3630;target_length=726
> contig1 est_gff:est2genome      match_part      20468   21193   3630    +       .       ID=contig1:hsp:1166:3.12.0.0;Parent=contig1:hit:1051:3.12.0.0;Target=jgi|test_1|140804|est 1 726 +;Gap=M726
>
> Run C:
>
> contig1 maker   gene    20468   21193   .       +       .       ID=snap_masked-contig1-processed-gene-0.5;Name=snap_masked-contig1-processed-gene-0.5
> contig1 maker   mRNA    20468   21193   .       +       .       ID=snap_masked-contig1-processed-gene-0.5-mRNA-1;Parent=snap_masked-contig1-processed-gene-0.5;Name=snap_masked-contig1-processed-gene-0.5-mRNA-1;_AED=0.30;_eAED=0.30;_QI=0|-1|0|1|-1|1|1|0|241;_merge_warning=1
> contig1 maker   exon    20468   21193   .       +       .       ID=snap_masked-contig1-processed-gene-0.5-mRNA-1:1;Parent=snap_masked-contig1-processed-gene-0.5-mRNA-1
> contig1 maker   CDS     20468   21193   .       +       0       ID=snap_masked-contig1-processed-gene-0.5-mRNA-1:cds;Parent=snap_masked-contig1-processed-gene-0.5-mRNA-1
> contig1 snap_masked     match   20468   21193   42.956  +       .       ID=contig1:hit:5240:4.5.0.0;Name=snap_masked-contig1-abinit-gene-0.5-mRNA-1;target_length=4075195
> contig1 snap_masked     match_part      20468   21193   42.956  +       .       ID=contig1:hsp:12911:4.5.0.0;Parent=contig1:hit:5240:4.5.0.0;Target=snap_masked-contig1-abinit-gene-0.5-mRNA-1 1 726 +;Gap=M726
> contig1 est_gff:est2genome      expressed_sequence_match        20468   21193   3630    +       .       ID=contig1:hit:1051:3.12.0.0;Name=jgi|test_1|140804|est;target_length=726;aligned_coverage=100;aligned_identity=100;aligned_coverage=100;aligned_identity=100;score=3630;target_length=726
> contig1 est_gff:est2genome      match_part      20468   21193   3630    +       .       ID=contig1:hsp:1166:3.12.0.0;Parent=contig1:hit:1051:3.12.0.0;Target=jgi|test_1|140804|est 1 726 +;Gap=M726
>
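The run A and run C mRNAs carry identical AED values but slightly different `_QI` strings. A small decoder makes the difference explicit. This sketch assumes the nine-field `_QI` order described in the MAKER documentation (5' UTR length; fraction of splice sites confirmed by ESTs; fraction of exons overlapping ESTs; fraction of exons overlapping ESTs or proteins; fraction of splice sites confirmed by the ab initio predictor; fraction of exons overlapping an ab initio prediction; exon count; 3' UTR length; protein length), quoted here from memory and worth double-checking. Reading -1 as "not applicable" (no splice sites in a single-exon model) is an interpretation.

```python
from typing import Dict, Tuple

# Assumed field order for MAKER's _QI attribute (verify against the MAKER docs).
QI_FIELDS = (
    "utr5_length",
    "frac_splice_sites_confirmed_by_est",
    "frac_exons_overlapping_est",
    "frac_exons_overlapping_est_or_protein",
    "frac_splice_sites_confirmed_by_abinitio",
    "frac_exons_overlapping_abinitio",
    "exon_count",
    "utr3_length",
    "protein_length",
)


def decode_qi(qi: str) -> Dict[str, float]:
    """Map the nine pipe-separated _QI values onto named fields."""
    values = qi.split("|")
    if len(values) != len(QI_FIELDS):
        raise ValueError(f"expected {len(QI_FIELDS)} fields, got {len(values)}: {qi}")
    return {name: float(v) for name, v in zip(QI_FIELDS, values)}


def qi_diff(qi_a: str, qi_b: str) -> Dict[str, Tuple[float, float]]:
    """Fields whose values differ between two _QI strings."""
    a, b = decode_qi(qi_a), decode_qi(qi_b)
    return {k: (a[k], b[k]) for k in QI_FIELDS if a[k] != b[k]}


if __name__ == "__main__":
    run_a_qi = "0|-1|0|1|-1|0|1|0|241"  # maker-contig1-exonerate_protein2genome-gene-0.34-mRNA-1
    run_c_qi = "0|-1|0|1|-1|1|1|0|241"  # snap_masked-contig1-processed-gene-0.5-mRNA-1
    print(qi_diff(run_a_qi, run_c_qi))
    # {'frac_exons_overlapping_abinitio': (0.0, 1.0)}
```

Under that reading, both models are single-exon genes encoding a 241-residue protein (consistent with the 726 bp CDS: 726 / 3 = 242 codons including the stop codon), and the only difference is ab initio support: 0 for the protein-derived run A model, 1 for the SNAP-derived run C model.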
> Could anyone shed some light on this?
>
>
> Many thanks in advance.
>
> Urmi
>



-- 
"The only way of finding the limits of the possible is by going beyond them
into the impossible." - Arthur C. Clarke