[maker-devel] advanced repeat masking library constructions & rna-seq assembly choices

Sun May 7 19:17:37 MDT 2017

Michael, can you answer the second question? (Michael wrote the protocol, so I CC’d him.)

With respect to the first question: expression level is not necessarily relevant to the annotation process (so no, MAKER does not look at read coverage). Instead, we use the transcript assemblies to identify introns via splice-aware alignment (yes, it is the introns and not the exons we care about). Trinity has a nice option called jaccard_clip, which avoids false merging of neighboring transcripts (this mostly occurs in fungi, where UTRs can overlap). Merged transcripts cause extra introns to be assigned as hints, and can also lead to overextension of UTRs during the final polishing steps. The jaccard_clip option is the main reason we recommend Trinity. If StringTie has a similar option, then it can be used as well.
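
To make the point about introns concrete, here is a toy Python sketch (not MAKER's actual code) that derives intron coordinates from the aligned exon blocks of a transcript. The function name introns_from_exons and all coordinates are made up for illustration, and the "merged" case is only a simplified picture of how a falsely merged transcript could, in principle, contribute a junction that neither gene has on its own.

```python
def introns_from_exons(exons):
    """Return (start, end) introns implied by a list of (start, end) exon blocks.

    Coordinates are 1-based and inclusive, as in GFF3. This is a toy
    helper for illustration, not part of MAKER.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # a gap between adjacent aligned blocks
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical alignments: two neighboring genes assembled separately...
gene_a = [(100, 200), (300, 400)]
gene_b = [(450, 550), (650, 750)]
# ...and the same blocks treated as one falsely merged transcript.
merged = gene_a + gene_b

separate_hints = set(introns_from_exons(gene_a)) | set(introns_from_exons(gene_b))
merged_hints = set(introns_from_exons(merged))

# Junctions that exist only because of the merge.
print(sorted(merged_hints - separate_hints))
```

In this toy case the merged transcript adds one junction spanning the gap between the two genes, which is the kind of extra hint that clipping merged transcripts is meant to avoid.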

Thanks,
Carson

> On May 4, 2017, at 12:37 AM, Salim Bougouffa <mjfi2sb3 at gmail.com> wrote:
> 
> Hi,
> 
> I am attempting to annotate a plant genome. I have a couple of questions:
> 
> 1) RNA-seq assembly
> a) I assembled my RNA-seq data using Trinity and StringTie. The two produce drastically different numbers of transcripts. When I compare the two assemblies for each sample using TransRate, StringTie produces a higher score for most of the assemblies. I see in all of the threads that you recommend Trinity, but doesn't Trinity produce way too many transcripts (even after discarding the "bad" ones using TransRate)?
> b) During hint creation, does MAKER take into account that different transcripts have different read coverage (expression levels)? I guess my question is: should I filter out transcripts that have low read coverage?
> 
> 2) Repeat Masking
> I am following the advanced repeat library construction tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). The initial steps find 15 sequences for the LTR and 159 for MITE. But when I get to the perl DIR_CRL/CRL_Step4.pl step, both output files (Inner_Seq_For_BLAST.fasta, lLTRs_Seq_For_BLAST.fasta) are empty.
> 
> a) Are these numbers normal? I was expecting a lot more than 16 for the LTR.
> b) I don't get any errors when I run CRL_Step4.pl, yet there is no output. What's going on?
> 
> Many thanks,
> /SB
