[maker-devel] duplicate CDS in annotation

Sasha Mikheyev mikheyev at gmail.com
Wed Mar 13 21:34:52 MDT 2013


Thank you very much! Problem solved!
Sasha

On Thu, Mar 14, 2013 at 11:54 AM, Carson Holt <carsonhh at gmail.com> wrote:

> Yes.  map_forward=1 allows new models to keep the names of the models they
> replace.  It makes it so you don't have to relocate genes every time a
> model gets a slight modification during reannotation.
>
> --Carson
>
>
> From: Sasha Mikheyev <mikheyev at gmail.com>
> Date: Wednesday, 13 March, 2013 9:17 PM
>
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Barry Moore <barry.moore at genetics.utah.edu>, <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] duplicate CDS in annotation
>
> OK. Got it! I did pass through the gene model names. I guess I now see
> that a new gene model may become associated with the old name in the
> re-annotation.
>
> Sasha
>
> On Thu, Mar 14, 2013 at 6:47 AM, Carson Holt <carsonhh at gmail.com> wrote:
>
>> The output shows that the original model
>> was Alias=maker-pbar_scf7180000349951-snap-gene-1.17-mRNA-1 and the new
>> model replacing it is
>> Alias=genemark-pbar_scf7180000349951-abinit-gene-1.14-mRNA-1.
>>
>> So it is really a completely different model (one derived from SNAP and
>> one from GeneMark).  I'm guessing you have map_forward=1 set and are
>> using the GFF3 passthrough options, correct?
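A minimal Python sketch of the comparison described above, for anyone who wants to list every transcript name that now points at a different underlying model. It assumes tab-delimited GFF3 with nine columns per line and that the original predictor-derived name sits in the Alias attribute, as in the PB12301-RA example. The function names are made up for illustration, and treating "no shared Alias value" as "model was replaced" is a heuristic, not something MAKER guarantees.

```python
def mrna_aliases(gff_lines):
    """Map each mRNA ID to the set of values found in its Alias attribute."""
    aliases = {}
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # skip comments, FASTA, wrapped lines and other feature types
        attrs = dict(
            field.split("=", 1)
            for field in cols[8].strip().strip(";").split(";")
            if "=" in field
        )
        if "ID" in attrs:
            # Alias may hold several comma-separated values
            values = attrs.get("Alias", "").split(",")
            aliases[attrs["ID"]] = {v for v in values if v}
    return aliases


def replaced_models(old_lines, new_lines):
    """Return {ID: (old aliases, new aliases)} for names kept with no alias in common."""
    old = mrna_aliases(old_lines)
    new = mrna_aliases(new_lines)
    return {
        mrna_id: (sorted(old[mrna_id]), sorted(new[mrna_id]))
        for mrna_id in old.keys() & new.keys()
        if old[mrna_id] and not (old[mrna_id] & new[mrna_id])
    }
```

Passing the open file handles of the input and output GFF3 files as `old_lines` and `new_lines` should be enough, since both functions only iterate over lines.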
>>
>> Thanks,
>> Carson
>>
>>
>>
>> From: Sasha Mikheyev <mikheyev at gmail.com>
>> Date: Wednesday, 13 March, 2013 3:23 AM
>>
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Barry Moore <barry.moore at genetics.utah.edu>, <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] duplicate CDS in annotation
>>
>> Dear Carson,
>>
>> The new version does indeed fix the problem!
>>
>> However, I noticed that some of the CDS annotations were swallowed. This
>> seems to affect ~600 genes.
>>
>> e.g. input:
>>
>> pbar_scf7180000349951 maker mRNA 98033 98530 . - .
>> ID=PB12301-RA;Parent=PB12301;Name=PB12301-RA;Alias=maker-pbar_scf7180000349951-snap-gene-1.17-mRNA-1;_AED=1.00;_QI=0|0|0|0|0|0|2|0|81;
>> pbar_scf7180000349951 maker exon 98393 98530 . - .
>> ID=PB12301-RA:exon:10283;Parent=PB12301-RA;
>> pbar_scf7180000349951 maker exon 98033 98140 . - .
>> ID=PB12301-RA:exon:10284;Parent=PB12301-RA;
>> pbar_scf7180000349951 maker CDS 98033 98140 . - 0
>> ID=PB12301-RA:cds:10114;Parent=PB12301-RA;
>> pbar_scf7180000349951 maker CDS 98393 98530 . - 0
>> ID=PB12301-RA:cds:10113;Parent=PB12301-RA;
>>
>> output:
>>
>> pbar_scf7180000349951 maker mRNA 98033 98530 . - .
>> ID=PB12301-RA;Parent=PB12301;Name=PB12301-RA;_AED=0.38;_eAED=0.38;_QI=0|0|0.33|1|0.5|1|3|246|165;Alias=genemark-pbar_scf7180000349951-abinit-gene-1.14-mRNA-1,PB12301-RA
>> pbar_scf7180000349951 maker exon 98033 98530 . - .
>> ID=PB12301-RA:exon:134;Parent=PB12301-RA
>> pbar_scf7180000349951 maker exon 98033 98140 . - .
>> ID=PB12301-RA:exon:133;Parent=PB12301-RA
>> pbar_scf7180000349951 maker exon 98393 98530 . - .
>> ID=PB12301-RA:exon:132;Parent=PB12301-RA
>> pbar_scf7180000349951 maker three_prime_UTR 98393 98530 . - .
>> ID=PB12301-RA:three_prime_utr;Parent=PB12301-RA
>> pbar_scf7180000349951 maker three_prime_UTR 98033 98140 . - .
>> ID=PB12301-RA:three_prime_utr;Parent=PB12301-RA
>> pbar_scf7180000349951 maker CDS 98033 98530 . - 0
>> ID=PB12301-RA:cds;Parent=PB12301-RA
>>
>> Thank you,
>>
>> Sasha
>>
>> On Tue, Mar 12, 2013 at 10:37 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>> Yes.  Try the newer version and see if you still have the issue.
>>>
>>> Thanks,
>>> Carson
>>>
>>>
>>> From: Sasha Mikheyev <mikheyev at gmail.com>
>>> Date: Tuesday, 12 March, 2013 1:26 AM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: Barry Moore <barry.moore at genetics.utah.edu>, <maker-devel at yandell-lab.org>
>>>
>>> Subject: Re: [maker-devel] duplicate CDS in annotation
>>>
>>> Hi Carson,
>>>
>>> I have been using version 2.10. Is it worth trying with a newer version?
>>>
>>> You can find the model file here <https://dl.dropbox.com/u/5275622/all.gff.gz>.
>>> It is rather large, as it includes all of the output from the first maker
>>> run.
>>>
>>> Yours,
>>>
>>> Sasha
>>>
>>>
>>> On Mon, Mar 11, 2013 at 10:02 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>
>>>> I think the issue is that you are getting a match feature that is being
>>>> printed with the same ID as the mRNA feature. Correct?
>>>>
>>>> What version of MAKER are you using, and what does the file you are
>>>> giving to pred_gff or model_gff look like?  Could you send them?
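A rough Python sketch for checking the suspicion above across a whole file: it reports every ID that is used by lines of more than one feature type, such as a protein_match and an mRNA both carrying pbar_scf7180000350377:hit:2506. It assumes tab-delimited GFF3 with nine columns, and the function names are hypothetical. Lines that do not have nine columns (comments, FASTA, wrapped lines) are skipped rather than repaired.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def ids_shared_across_types(gff_lines):
    """Return {ID: sorted feature types} for IDs used by more than one type."""
    types_by_id = defaultdict(set)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id:
            types_by_id[feature_id].add(cols[2])
    return {
        feature_id: sorted(types)
        for feature_id, types in types_by_id.items()
        if len(types) > 1
    }
```

An ID that appears on several lines of the same type is left alone on purpose, since GFF3 permits that for discontinuous features.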
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>
>>>> From: Barry Moore <barry.moore at genetics.utah.edu>
>>>> Date: Monday, 11 March, 2013 7:32 AM
>>>> To: Sasha Mikheyev <mikheyev at gmail.com>
>>>> Cc: <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] duplicate CDS in annotation
>>>>
>>>> Hi Sasha,
>>>>
>>>> This gene model appears to be correctly formatted to me.  In GFF3
>>>> format the CDS features are allowed to span multiple lines and they share
>>>> the same ID to indicate that they are all parts of the same feature.  See the GFF3
>>>> specification on the Sequence Ontology website (
>>>> http://www.sequenceontology.org/resources/gff3.html), and in
>>>> particular the description of the ID attribute specifies:
>>>>
>>>> ID Indicates the ID of the feature. IDs for each feature must be unique
>>>> within the scope of the GFF file. In the case of discontinuous features
>>>> (i.e. a single feature that exists over multiple genomic locations) the
>>>> same ID may appear on multiple lines. All lines that share an ID
>>>> collectively represent a single feature.
>>>>
>>>>
>>>> So each of those CDS lines forms one part of the single CDS feature for
>>>> this gene.
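A small Python sketch of what this means for sequence extraction, assuming tab-delimited GFF3 with nine columns. The specification groups the lines of a discontinuous feature by shared ID; in the example quoted further down, each CDS line has its own ID but the same Parent, so this sketch groups CDS lines by Parent instead. That choice, and the function name, are assumptions made for illustration.

```python
from collections import defaultdict


def cds_segments_by_transcript(gff_lines):
    """Map each parent transcript ID to its CDS segments as sorted (start, end) tuples."""
    segments = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only well-formed CDS lines are of interest here
        attrs = dict(
            field.split("=", 1)
            for field in cols[8].strip().strip(";").split(";")
            if "=" in field
        )
        # Parent may list several transcripts separated by commas
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                segments[parent].append((int(cols[3]), int(cols[4])))
    return {parent: sorted(parts) for parent, parts in segments.items()}
```

Joining the segments of one transcript in coordinate order (reverse order, then reverse complement, for the minus strand) yields a single coding sequence per mRNA rather than one sequence per CDS line.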
>>>>
>>>> B
>>>>
>>>> On Mar 11, 2013, at 3:46 AM, Sasha Mikheyev wrote:
>>>>
>>>> Dear Yandell lab,
>>>>
>>>> I am re-annotating the harvester ant genome using protein and RNA-seq
>>>> data. However, I get many artifacts like the one below. It seems that there
>>>> are several CDS records that should tie in to the same mRNA, but they are
>>>> really hanging out separately, and produce several nucleotide sequences
>>>> with the same name when extracted from the gff. I would appreciate any
>>>> guidance about how to fix this!
>>>>
>>>> Thank you,
>>>>
>>>> Sasha
>>>>
>>>> grep "pbar_scf7180000350377:hit:2506" Pbar.2.0.gff
>>>> pbar_scf7180000350377 protein2genome protein_match 172004 172162 150 -
>>>> . ID=pbar_scf7180000350377:hit:2506;Name=Hsal|HS9704;score=150;
>>>> pbar_scf7180000350377 protein2genome match_part 172004 172162 150 - . ID=pbar_scf7180000350377:hsp:2798;Parent=pbar_scf7180000350377:hit:2506;Name=Hsal|HS9704;Target=Hsal|HS9704
>>>> 1 53 +;Gap=M159;
>>>> pbar_scf7180000350377 maker mRNA 538308 558769 . + .
>>>> ID=pbar_scf7180000350377:hit:2506;Parent=augustus_masked-pbar_scf7180000350377-abinit-gene-5.29;Name=augustus_masked-pbar_scf7180000350377-abinit-gene-5.29-mRNA-1;_AED=0.48;_eAED=0.39;_QI=0|0|0|0.5|1|1|6|0|395;score=0.01;
>>>> pbar_scf7180000350377 maker exon 538308 538334 0.01 + .
>>>> ID=pbar_scf7180000350377:hit:2506:exon:305;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker exon 538748 538968 0.01 + .
>>>> ID=pbar_scf7180000350377:hit:2506:exon:306;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker exon 539842 540242 0.01 + .
>>>> ID=pbar_scf7180000350377:hit:2506:exon:307;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker exon 542624 542798 0.01 + .
>>>> ID=pbar_scf7180000350377:hit:2506:exon:308;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker exon 555823 556025 0.01 + .
>>>> ID=pbar_scf7180000350377:hit:2506:exon:309;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker exon 558609 558769 0.01 + .
>>>> ID=pbar_scf7180000350377:hit:2506:exon:310;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker CDS 538308 538334 . + 0
>>>> ID=pbar_scf7180000350377:hit:2506:cds:305;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker CDS 538748 538968 . + 0
>>>> ID=pbar_scf7180000350377:hit:2506:cds:306;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker CDS 539842 540242 . + 1
>>>> ID=pbar_scf7180000350377:hit:2506:cds:307;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker CDS 542624 542798 . + 2
>>>> ID=pbar_scf7180000350377:hit:2506:cds:308;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker CDS 555823 556025 . + 1
>>>> ID=pbar_scf7180000350377:hit:2506:cds:309;Parent=pbar_scf7180000350377:hit:2506;
>>>> pbar_scf7180000350377 maker CDS 558609 558769 . + 2
>>>> ID=pbar_scf7180000350377:hit:2506:cds:310;Parent=pbar_scf7180000350377:hit:2506;
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>>
>>>> Barry Moore
>>>> Research Scientist
>>>> Dept. of Human Genetics
>>>> University of Utah
>>>> Salt Lake City, UT 84112
>>>> --------------------------------------------
>>>> (801) 585-3543
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>