[maker-devel] How sensitive is MAKER to redundant/partial transcripts?

Wed Jul 4 06:32:05 MDT 2018

 Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild
relatives. To this end, I have been generating a diverse transcript
data set to use as input for MAKER (along with protein sequences and
the 'official' tomato annotation). My transcript set was generated by
collecting multiple available RNA-Seq data sets from SRA,
covering diverse variants, conditions and tissues, and assembling them into
transcripts using Trinity. My goal is to have a data set as diverse and
broad as possible.
Now I have ~30 FASTA files of transcripts, originating from different
studies. Of course, many of the transcripts are redundant and/or partial. I
am exploring ways to merge the multiple data sets into a non-redundant one,
while also stitching partial transcripts into longer ones based on overlaps.
However, this turns out to be nontrivial, and I am wondering whether it is
really necessary in order to get a good annotation. Maybe I can just
concatenate all my transcriptome assembly results, and MAKER will handle
redundant and partial transcripts?
Can someone clarify how MAKER treats redundant and partial transcripts, and
assess whether an annotation based on a merged data set is likely to be
better than one based on unmerged input? If someone has actual experience
with such data, that would be really helpful, but any advice would be
highly appreciated.
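
For concreteness, the "just concatenate" option could be sketched roughly as
below. This is only a minimal illustration in plain Python, not anything
MAKER itself provides: the function names (read_fasta, merge_fastas) and the
file names are hypothetical. It prefixes each transcript ID with the name of
its source file, since Trinity IDs such as TRINITY_DN1000_c0_g1_i1 can recur
across independent assemblies, and it drops only exact duplicate sequences.
It does not detect reverse-complement duplicates, collapse near-identical
transcripts, or stitch partial ones.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_fastas(paths, out_path):
    """Concatenate FASTA files into out_path.

    IDs are prefixed with the source file name (hypothetical convention),
    and sequences identical to one already written are skipped.
    Returns the number of records written.
    """
    seen = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            prefix = Path(path).stem
            for header, seq in read_fasta(path):
                key = seq.upper()  # case-insensitive exact match only
                if key in seen:
                    continue
                seen.add(key)
                fields = header.split()
                seq_id = fields[0] if fields else "unnamed"
                out.write(f">{prefix}_{seq_id}\n{seq}\n")
                kept += 1
    return kept
```

With the ~30 assemblies in one directory, this could be called as
merge_fastas(sorted(Path("assemblies").glob("*.fasta")), "all_transcripts.fasta"),
where the directory and output names are placeholders.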

Thanks a lot and best regards,
Lior