[maker-devel] About loss of Histone H2A, H2B, H4

Carson Holt carsonhh at gmail.com
Mon Nov 27 12:56:04 MST 2017


You should not have to retrain SNAP separately on unmasked sequence. I do believe it may help to add back genes that were rejected for lack of evidence support but contain an identifiable protein domain. These models are in the FASTA files labeled "non-overlapping" in the datastore.
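Here is a minimal sketch of that domain check. It assumes InterProScan has already been run on the non-overlapping protein FASTA with TSV output (`-f tsv`, plus `--iprlookup` if you want to restrict the check to particular InterPro accessions). The Python below keeps only the models with at least one match. The script name, file names, and helper functions are hypothetical. The column positions assume the standard InterProScan TSV layout: protein ID in column 1, InterPro accession in column 12.

```python
import sys
from typing import Iterable, Iterator, List, Optional, Set, Tuple


def ids_with_domains(tsv_lines: Iterable[str],
                     wanted_ipr: Optional[Set[str]] = None) -> Set[str]:
    """Collect protein IDs that have at least one InterProScan match.

    If wanted_ipr is given, only matches to those InterPro accessions count
    (column 12, which is filled only when InterProScan ran with --iprlookup).
    """
    hits: Set[str] = set()
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # skip blank or truncated lines
        ipr = cols[11] if len(cols) > 11 else ""  # no InterPro column without --iprlookup
        if wanted_ipr is None or ipr in wanted_ipr:
            hits.add(cols[0])
    return hits


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header: Optional[str] = None
    seq: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_fasta(fasta_lines: Iterable[str],
                 keep_ids: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only records whose ID (first word of the header) is in keep_ids."""
    for header, seq in read_fasta(fasta_lines):
        words = header.split()
        if words and words[0] in keep_ids:
            yield header, seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python keep_domain_models.py proteins.fasta interproscan.tsv > kept.fasta
    with open(sys.argv[2]) as tsv:
        keep = ids_with_domains(tsv)
    with open(sys.argv[1]) as fasta:
        for header, seq in filter_fasta(fasta, keep):
            print(f">{header}\n{seq}")
```

You can pass a set of InterPro accessions as `wanted_ipr`, such as the accessions for the histone families in question, looked up in InterPro. This narrows the output to those families. The kept models still need manual review, for example to rule out pseudogenes.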

—Carson

> On Nov 21, 2017, at 10:42 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Dear Carson:
> 
> Thank you for your comments and suggestions. SNAP was trained on repeat-masked sequence; is it necessary to retrain the predictor without repeat masking?
> BUSCO analysis of the genome gives the completeness shown below. Currently I am doing the analysis with the default Maker2 output (i.e., gene models with evidence support, the default build). For the gene-loss analysis, besides your suggestions, I am also considering using the gene models with evidence support plus those with identified protein domains (i.e., the standard build). What do you think?
> 
> 
> C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104
>   3902  Complete BUSCOs (C)
>   3806  Complete and single-copy BUSCOs (S)
>   96  Complete and duplicated BUSCOs (D)
>   92  Fragmented BUSCOs (F)
>   110  Missing BUSCOs (M) 
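
The BUSCO summary line can be cross-checked against the counts. Complete (C) is single-copy (S) plus duplicated (D), and C + F + M should equal n, the number of BUSCOs searched. Below is a minimal Python sketch of that check. The function name is hypothetical and not part of BUSCO, and it uses plain round-to-one-decimal, which may not match BUSCO's own rounding.

```python
def busco_check(s: int, d: int, f: int, m: int, n: int) -> dict:
    """Check BUSCO count identities and return percentages to one decimal."""
    c = s + d  # complete = single-copy + duplicated
    if c + f + m != n:
        raise ValueError(f"C+F+M = {c + f + m}, expected n = {n}")

    def pct(x: int) -> float:
        return round(100 * x / n, 1)

    return {"C": pct(c), "S": pct(s), "D": pct(d), "F": pct(f), "M": pct(m)}


print(busco_check(s=3806, d=96, f=92, m=110, n=4104))
```

For these counts, the identities hold (3806 + 96 = 3902 and 3902 + 92 + 110 = 4104). Plain rounding gives C = 95.1% and M = 2.7%, a tenth away from the printed 95.0% and 2.8%. The other percentages match exactly, so the differences look like rounding rather than a count error.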
> 
> Thanks
> Best
> Quanwei
>  
> 
> 2017-11-21 11:19 GMT-05:00 Carson Holt <carsonhh at gmail.com>:
> No known biases, but if you are concerned, you can collect known Histone H2A, H2B, and H4 proteins and transcripts from other species (protein= and altest= options), then run MAKER with no masking to see whether you gain any models that were overlooked because of over-masking of repeats. Make sure to evaluate whether any models you find are pseudogenes. Also run InterProScan on the results to make sure they contain known InterPro domains for that gene family. Running without repeat masking increases sensitivity, but it also increases false positives from low-homology alignments to simple repeats. That is why you need to evaluate the results with something like InterProScan.
> 
> Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly.
> 
> —Carson
> 
> 
>> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>> 
>> Hello:
>> 
>> We have annotated a new rodent genome using Maker2. Based on the annotated Maker2 gene sets, we did a gene family expansion/contraction analysis using CAFE and found that the Histone H2A, H2B, and H4 gene families are contracted. Are there known biases in predicting these gene families with Maker2? For example, could this be due to repeat masking of the genome? I used RepeatModeler to generate species-specific repeat libraries following http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic
>> 
>> Thanks
>> 
>> Best
>> Quanwei
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
