[maker-devel] repeats

Wed Sep 30 10:03:09 MDT 2015

Hi Carson - 
> On Sep 30, 2015, at 5:43 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
> MAKER’s standard repeat masking protocol is to use RepeatMasker to identify repeats, then RepeatRunner to extend masking for diverged repeats.  Complex repeats will be hard masked and simple repeats will be soft masked (anything coming from GFF3 will be hard masked).  BLAST then runs to identify evidence alignments against the masked genome assembly.  Exonerate is then allowed to polish the BLAST alignments with any applied masking removed (this is because we already have an alignment outside of the masked region, so removing masking keeps it from interfering with the polishing).
> 
> It is possible that REPET is not capturing the full repeat, which would allow partial alignments outside of masked regions that can then be polished back into masked regions, or you have mRNA-seq evidence where the repeat has been assembled into the transcript sequence (so the repeat gets polished back in). If that is the case, you may want to consider letting RepeatMasker and RepeatRunner run alongside the supplied repeat GFF3.  Alternatively, you could try hard masking the genome assembly before ever giving it to MAKER (so REPET masked regions can never be unmasked), but that might cause issues with some polishing steps.
> 
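
The pre-masking idea above (hard masking the assembly before MAKER ever sees it) could be sketched roughly as follows. This is only an illustration, not part of MAKER or REPET: the helper names `read_fasta`, `read_gff3_intervals` and `mask` are hypothetical, the tiny in-memory inputs stand in for the real assembly and REPET GFF3, and it assumes standard GFF3 coordinates (1-based, inclusive) with sequence IDs that match the first word of each FASTA header.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def read_gff3_intervals(lines):
    """Collect (start, end) pairs per sequence ID from GFF3 lines."""
    intervals = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))
    return intervals


def mask(seq, intervals, hard=True):
    """Replace intervals with N (hard mask) or lowercase them (soft mask)."""
    chars = list(seq)
    for start, end in intervals:
        # Convert 1-based inclusive GFF3 coordinates to 0-based indices.
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical toy inputs; real use would read the files from disk.
    fasta = [">chr1 demo", "ACGTACGTAC", "GTACGT"]
    gff = ["##gff-version 3", "chr1\tREPET\tmatch\t3\t6\t.\t+\t.\tID=rep1"]
    repeats = read_gff3_intervals(gff)
    for name, seq in read_fasta(fasta).items():
        print(">" + name)
        print(mask(seq, repeats.get(name, []), hard=True))
```

Passing `hard=False` lowercases the same regions instead, which mirrors the soft-masking behaviour described for simple repeats.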

Yes, I suspect our Cufflinks analysis was either run on the unmasked genome or with a different version of the repeats, so that probably explains it.

> Also if your ab initio predictors are calling genes on opposite strands, and one predictor seems to perform particularly poorly, you may want to drop it from your analysis.  I find that I have to do this with GeneMark sometimes.
> 
Yes, I had considered that, and in fact we already dropped GeneMark. A lot of the erroneous genes appear to come from SNAP, but if I drop that too I'm only left with Augustus, which was trained on a different species. We tried training Augustus but we never got results that we thought were better than the existing models. It looks like our SNAP training has issues too. For now I think we'll fix the problems manually and in the future work on our training procedures.

Thanks for your help.

> Thanks,
> Carson
> 
> 
>> On Sep 30, 2015, at 8:54 AM, Michael Thon <mike.thon at gmail.com> wrote:
>> 
>> Hi all - the other issue that we're having with MAKER is with repeats. We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. Does this indicate some problem with my input file or with MAKER?
>> 
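
To see how widespread the CDS/repeat overlap described above is, a check along these lines might help. It is a rough sketch under assumptions: `parse_gff3` and `cds_repeat_overlaps` are hypothetical helper names, the toy feature lines stand in for the MAKER output GFF3 and the REPET GFF3, gene models are assumed to use `CDS` as the feature type, and every feature in the repeat file is treated as a repeat.

```python
def parse_gff3(lines, wanted_types=None):
    """Return (seqid, start, end, attributes) for each GFF3 feature line."""
    feats = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        if wanted_types is not None and cols[2] not in wanted_types:
            continue
        feats.append((cols[0], int(cols[3]), int(cols[4]), cols[8]))
    return feats


def cds_repeat_overlaps(cds_feats, repeat_feats):
    """Return (cds_attrs, repeat_attrs, overlap_bp) for each overlapping pair."""
    by_seq = {}
    for seqid, start, end, attrs in repeat_feats:
        by_seq.setdefault(seqid, []).append((start, end, attrs))
    hits = []
    for seqid, c_start, c_end, c_attrs in cds_feats:
        for r_start, r_end, r_attrs in by_seq.get(seqid, []):
            # Coordinates are inclusive, hence the +1.
            overlap = min(c_end, r_end) - max(c_start, r_start) + 1
            if overlap > 0:
                hits.append((c_attrs, r_attrs, overlap))
    return hits


if __name__ == "__main__":
    # Hypothetical toy inputs; real use would read both GFF3 files from disk.
    genes = [
        "chr1\tmaker\tCDS\t100\t200\t.\t+\t0\tID=geneA:cds;Parent=geneA-mRNA-1",
        "chr1\tmaker\tCDS\t500\t600\t.\t+\t0\tID=geneB:cds;Parent=geneB-mRNA-1",
    ]
    repeats = [
        "chr1\tREPET\tmatch\t180\t250\t.\t+\t.\tID=rep1",
        "chr1\tREPET\tmatch\t700\t800\t.\t+\t.\tID=rep2",
    ]
    cds = parse_gff3(genes, wanted_types={"CDS"})
    for c_attrs, r_attrs, bp in cds_repeat_overlaps(cds, parse_gff3(repeats)):
        print(c_attrs, r_attrs, bp, sep="\t")
```

The pairwise comparison is quadratic per sequence, which is fine for spot checks; for a whole genome, an interval tree or a dedicated interval-intersection tool would likely scale better.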
>> Thanks
>> Mike
>> 
>> 