[maker-devel] Transcript & protein fasta sequence id/name collisions
Carson Holt
carsonhh at gmail.com
Tue Jun 12 14:19:19 MDT 2018
The ID= tag in GFF3 is not meant to be a user-facing identifier (despite being called ID). Instead, it is a tag used to determine inheritance when reassembling a feature. All GMOD tools will treat it that way, while Name will be treated as the user-facing identifier. To further complicate things, 'Name' is not required to be unique (some groups like to give multi-copy genes the same name and then list multiple locations in the GFF3; you will see this all the time in NCBI-derived GFF3, for example). I would not be surprised if fgenesh does not produce "best-practice" GFF3. You may need to alter it slightly before using it.
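To see how widespread the Name collisions are in an input GFF3, something like the following could be used. This is only a minimal sketch: the function names (`parse_attributes`, `duplicate_names`) and the example lines are made up for illustration, and it assumes each mRNA feature occupies a single tab-delimited, 9-column line.

```python
from collections import Counter

def parse_attributes(column9):
    """Split a GFF3 attribute column ('ID=x;Name=y') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def duplicate_names(gff_lines, feature_type="mRNA"):
    """Return {Name: count} for Names shared by more than one feature of feature_type."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        name = parse_attributes(cols[8]).get("Name")
        if name is not None:
            counts[name] += 1
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical input lines, invented for this example.
example = [
    "##gff-version 3\n",
    "chr1\tfgenesh\tmRNA\t100\t900\t.\t+\t.\tID=chr1.mRNA_4;Parent=chr1.gene_4;Name=mRNA_4\n",
    "chr2\tfgenesh\tmRNA\t50\t700\t.\t-\t.\tID=chr2.mRNA_4;Parent=chr2.gene_4;Name=mRNA_4\n",
    "chr2\tfgenesh\tmRNA\t1500\t2700\t.\t-\t.\tID=chr2.mRNA_5;Parent=chr2.gene_5;Name=mRNA_5\n",
]
print(duplicate_names(example))  # {'mRNA_4': 2}
```

For a real file, pass an open file handle in place of `example`, e.g. `duplicate_names(open("predictions.gff"))`, where the file name is a placeholder.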
On the MAKER end, at the very least it would be nice for MAKER to complain that the names in the input file are not unique (even though non-unique names are allowed in GFF3). If you give GFF3 to pred_gff then MAKER should build its own unique names for things, but for model_gff it will keep the name you give it.
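One way to alter the input before handing it to MAKER would be to overwrite each Name with the feature's own ID, so that any name that gets carried through is unique. This is a rough sketch of that idea, not something MAKER provides: `name_from_id` is a hypothetical helper, the example line is invented, and it assumes the ID values in the file are themselves unique per feature.

```python
def name_from_id(line):
    """Return a GFF3 line with its Name attribute replaced by its ID value.

    Comments, directives, blank lines, and lines without an ID are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a feature line (e.g. sequence in a ##FASTA section)
    fields = [f for f in cols[8].split(";") if f]
    feature_id = None
    for field in fields:
        if field.startswith("ID="):
            feature_id = field[len("ID="):]
    if feature_id is None:
        return line
    # Drop any existing Name and append one that mirrors the ID.
    kept = [f for f in fields if not f.startswith("Name=")]
    kept.append("Name=" + feature_id)
    cols[8] = ";".join(kept)
    return "\t".join(cols) + "\n"

# Hypothetical feature line, invented for this example.
line = "chr1\tfgenesh\tmRNA\t100\t900\t.\t+\t.\tID=chr1.mRNA_4;Parent=chr1.gene_4;Name=mRNA_4\n"
print(name_from_id(line), end="")
```

Applying the function to every line of the file and writing the results to a new GFF3 leaves the ID and Parent relationships untouched, which matters because those are what determine feature inheritance.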
—Carson
> On Jun 12, 2018, at 12:08 PM, Stein, Joshua <steinj at cshl.edu> wrote:
>
> Dear Carson and maker-devel group,
>
> In our recent MAKER run, some of the transcript and protein IDs in the fasta files (e.g. *.all.maker.transcripts.fasta) correspond to the "Name=" field of the GFF. The problem is that these names are not unique, so for example the transcript ID 'mRNA_4' occurs 45 times, thus making it difficult to determine their corresponding gene IDs. My colleague, Kapeel, who is copied, believes this happens when fgenesh results are passed to MAKER as a GFF using the "pred_gff=" parameter.
>
> How do we prevent this from happening in the future (e.g. tell MAKER to use the "ID=" field for transcript/protein fasta IDs)?
> Do you have any tips on how to triage the current situation (i.e. figure out which fasta corresponds to which gene)? I suppose it is possible to match up based on the quality metrics in the definition line, assuming these are unique.
>
> Thanks,
> Josh
>
>
> Joshua Stein, PhD
> Manager, Sci. Informatics III
> Cold Spring Harbor Laboratory
> steinj at cshl.edu
> http://ware.cshl.org/
>
>
>