[maker-devel] genome duplication?

Xabier Vázquez Campos xvazquezc at gmail.com
Sat Jan 31 01:51:36 MST 2015


Thanks Mikael,

These are the assembly stats as reported by abyss-fac. Indeed, it isn't a
great N50, but it isn't that bad either:

n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6
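
For reference, here is a minimal Python sketch of how an N50 like the one above can be computed from a list of contig lengths, and how it can be compared against the "5-10 times the average gene length" rule of thumb quoted further down. It assumes the usual N50 definition (the length L such that contigs of length >= L hold at least half of the assembled bases) applied only to contigs of at least 500 bp, which is what the n:500 column suggests. The function names, the toy contig lengths, and the 1500 bp average gene length are illustrative assumptions, not values measured for this genome.

```python
def n50(lengths, min_len=500):
    """Return the N50 of the contigs that are at least min_len bp long."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above the length cutoff")
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        # The first contig that pushes the running sum to half of the
        # total is, by definition, the N50 contig.
        if running >= half:
            return length


def n50_to_gene_ratio(n50_bp, mean_gene_bp):
    """How many average-length genes would fit in an N50-sized contig."""
    return n50_bp / mean_gene_bp


# Toy contig lengths, made up purely to exercise the function.
toy_contigs = [400, 600, 900, 1500, 3000, 4000]
print("toy N50:", n50(toy_contigs))

# N50 taken from the abyss-fac line above; the average gene length is a
# placeholder assumption and should be replaced with a real estimate.
REPORTED_N50_BP = 10771
ASSUMED_MEAN_GENE_BP = 1500
ratio = n50_to_gene_ratio(REPORTED_N50_BP, ASSUMED_MEAN_GENE_BP)
print(f"N50 is {ratio:.1f}x the assumed average gene length")
```

With a different average gene length the ratio changes proportionally, so the comparison is only as good as that estimate.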



2015-01-31 19:42 GMT+11:00 Mikael Brandström Durling <mikael.durling at slu.se>:

>  Hi Xabier,
>
>  31 jan 2015 kl. 05:48 skrev Xabier Vázquez Campos <xvazquezc at gmail.com>:
>
>  Hi all,
>
> One of the fungal genomes I'm annotating is relatively fragmented, with
> many contigs/scaffolds, and the CEGMA analysis alone may indicate a
> potential widespread duplication of the genome:
>
>  #      Statistics of the completeness of the genome based on 248 CEGs
>  #
>               #Prots  %Completeness  -  #Total  Average  %Ortho
>
>   Complete      181       72.98      -   365     2.02     67.40
>    Partial      230       92.74      -   528     2.30     77.83
>
>
>
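
The CEGMA columns above are related by simple arithmetic, which is what makes the duplication signal visible: %Completeness is #Prots divided by the 248 CEGs, and Average is #Total divided by #Prots, i.e. the mean number of hits per detected CEG. A small Python sketch that recomputes both from the reported counts (the function and variable names are illustrative only):

```python
N_CEGS = 248  # size of the core gene set named in the report header

# (number of CEGs detected, total hits including putative orthologs)
reported = {
    "Complete": (181, 365),
    "Partial": (230, 528),
}


def summarize(prots, total, n_cegs=N_CEGS):
    """Return (%Completeness, Average hits per detected CEG)."""
    completeness = round(100 * prots / n_cegs, 2)
    average = round(total / prots, 2)
    return completeness, average


for label, (prots, total) in reported.items():
    completeness, average = summarize(prots, total)
    print(f"{label:>8}: {completeness:.2f}% complete, "
          f"{average:.2f} hits per detected CEG")
```

An average close to 2 means each detected core gene is found about twice on average, which is consistent with either a real duplication or redundant contigs in the assembly; the numbers alone do not distinguish between the two.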
>  Judging from these figures, you seem to have a very fragmented assembly?
> What N50 have you reached? According to my experience, assemblies with an
> N50 below 5-10 times the average gene length tend to give problems in
> producing good gene sets. Not to say that the gene sets are unusable, but
> for comparing e.g. gene complements to other species, it will be hard to
> draw any conclusions when a high proportion of the genes are incomplete.
>
>  The expected genome size is relatively low (~42 Mb by abyss-fac) in
> comparison with *Hortaea werneckii* (51.6Mb, 23333 genes), a related
> fungus with nearly 90% of its genes present in at least two copies.
> Paper:
> http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>
>  Now to the Maker part... So, as part of the Maker annotation, I trained
> SNAP and Augustus, and I generated a specific RepeatModeler library. I
> recorded the predicted outputs from each Maker run (AED, number of
> predicted proteins and transcripts...). Both Augustus and SNAP gave
> quite high numbers (~19000 and ~23000 respectively) in comparison with
> xxx.all.maker.proteins.fasta (about 13600). So, my first question is, how
> does maker deal with gene duplications? Or is this just a phenomenon given
> that there is no support from the protein files provided initially to
> Maker? I've used 4 different protein files for the annotation, could it be
> that they weren't the best choices? I picked them from the closest
> relatives and similar environments
>
>
>  Unless you by mistake filter out duplicated gene families as repeats
> with repeat modeler, maker should not care about duplicated genes. However,
> maker, without keep_preds=1, reports only genes with some kind of support
> (be it EST or protein homology). This is rather conservative, but if you
> enable keep_preds, you will get more genes as you have noted. Just for the
> sake of comparison, I have reannotated more than ten genomes downloaded from
> JGI, providing MAKER with similar evidence as JGI, and consistently, MAKER
> is reporting fewer gene models. I have yet to do a more thorough comparison
> to tell what genes JGI are reporting that don’t appear in the MAKER
> annotations.
>
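
To compare the number of gene models between runs (for example with and without keep_preds=1, or against the raw Augustus and SNAP predictions), it is enough to count the header lines in each protein FASTA file. A minimal Python sketch follows; the function names and the file name in the usage comment are hypothetical placeholders, not files from this thread.

```python
def count_fasta_records(lines):
    """Count FASTA records in an iterable of lines (one '>' header each)."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Count FASTA records in the file at path."""
    with open(path) as handle:
        return count_fasta_records(handle)


# Small in-memory example with two made-up records.
example = [">model_1\n", "MKT\n", ">model_2\n", "MLS\n"]
print(count_fasta_records(example))

# Usage on a real file (hypothetical name):
#   count_fasta_file("my_genome.all.maker.proteins.fasta")
```

Counting headers counts protein sequences, so genes with several annotated isoforms are counted once per isoform.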
>
>  So, in my last run I set keep_preds=1 and the proteins in
> xxx.all.maker.proteins.fasta reached to
>
>  Last question regarding the protein files. I download the annotated
> genomes from the JGI and most of them have two annotation folders
> "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been
> using the protein files found in the latter, as I expected them to have real
> evidence and a lower chance of including falsely predicted genes. Am I right?
>
>
>  Yes, I would say so. The FilteredModels have passed through their model
> selection pipeline, while all_models contains models from all predictors,
> as well as combinations of predictors and EST evidence.
>
>  Just some 2 cents of observations of mine,
> cheers,
> Mikael
>
>
>  Thank you in advance,
>
>  Xabier
>
>
> --
> Xabier Vázquez Campos
> PhD Candidate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA


-- 
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA