[maker-devel] genome duplication?
Jason Stajich
jason.stajich at gmail.com
Sat Jan 31 16:21:12 MST 2015
Xabier -
FYI - though you have probably already compared them, those stats are on par
with the Hortaea v1 assembly (we do have an improved Hortaea assembly now, and
the genome size is still in the same range, supporting the duplication hypothesis).
Hw version 1 assembly -
N50 9623; Max 71563
CEGMA for Hw1
          #Prots  %Completeness  -  #Total  Average  %Ortho
Complete     196          79.03  -     498     2.54   81.12
Partial      228          91.94  -     673     2.95   95.18
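
As a quick sanity check on how the CEGMA columns relate, here is a minimal Python sketch. It assumes %Completeness is #Prots out of the 248 CEGs and Average is #Total divided by #Prots; that reading matches the numbers in this thread, but it is my interpretation of the report rather than something taken from the CEGMA code. The function names are made up for illustration, and %Ortho is left out because it cannot be recomputed from the other columns.

```python
# Rows copied from the CEGMA reports in this thread:
# (label, #Prots, reported %Completeness, #Total, reported Average)
CEG_COUNT = 248  # size of the CEG set named in the CEGMA report header

ROWS = [
    ("Hw1 complete", 196, 79.03, 498, 2.54),
    ("Hw1 partial", 228, 91.94, 673, 2.95),
    ("Xabier complete", 181, 72.98, 365, 2.02),
    ("Xabier partial", 230, 92.74, 528, 2.30),
]


def completeness(prots, ceg_count=CEG_COUNT):
    """Percentage of the CEG set that was found (assumed definition)."""
    return 100.0 * prots / ceg_count


def average_copies(total, prots):
    """Mean number of hits per CEG that was found (assumed definition)."""
    return total / prots


for label, prots, pct, total, avg in ROWS:
    print(f"{label:16s} "
          f"completeness {completeness(prots):6.2f} (reported {pct:.2f})  "
          f"average {average_copies(total, prots):5.2f} (reported {avg:.2f})")
```

An Average above 2 for both assemblies means that, on average, each detected CEG has more than two hits, which is the signal being read as possible duplication.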
Mikael - yes - we should compare notes on the models JGI is calling which
have little support in MAKER - I am not sure if their pipeline runs with
augustus/snap using informant hints though usually they are bringing RNAseq
into the mix - I don't know if your approach for reannotation assembled the
RNAseq and used it as evidence?
We'll be trying to assess some of this when we do the comparisons of the
proportion of shared genes in the first 1KFG paper, so we may be able to say
with more certainty whether these extra predictions are shared more widely
and get a handle on singleton/false positive rates.
Jason
Jason Stajich
jason.stajich at gmail.com
On Sat, Jan 31, 2015 at 12:51 AM, Xabier Vázquez Campos <xvazquezc at gmail.com
> wrote:
> Thanks Mikael,
>
> These are the assembly stats as taken from abyss-fac, indeed it isn't a
> great N50, but it isn't that bad either
>
> n      n:500  n:N50  min  N80   N50    N20    E-size  max     sum
> 14277  7099   1185   500  4698  10771  20438  14530   154519  42.68e6
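
For anyone reading the N80/N50/N20 columns for the first time, here is a small Python sketch of the usual Nx definition. It is not abyss-fac's own code. The 500 bp cutoff mirrors the n:500 column and is an assumption about how the tool filters; the contig lengths are made up.

```python
def nx(lengths, x, min_len=500):
    """Return Nx: the largest length L such that contigs of length >= L
    together hold at least x percent of the assembled bases.

    Contigs shorter than min_len are ignored (assumed to mirror the
    n:500 filtering in the abyss-fac table)."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above min_len")
    target = sum(kept) * x / 100.0
    running = 0
    for n in kept:
        running += n
        if running >= target:
            return n
    return kept[-1]


# Made-up contig lengths, only to show the ordering N80 <= N50 <= N20.
toy = [100, 500, 1000, 2000, 3000, 4000, 5000]
print("N80", nx(toy, 80))
print("N50", nx(toy, 50))
print("N20", nx(toy, 20))
```

The same ordering shows up in the real table: N80 4698, N50 10771, N20 20438.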
>
>
>
> 2015-01-31 19:42 GMT+11:00 Mikael Brandström Durling <
> mikael.durling at slu.se>:
>
>> Hi Xabier,
>>
>> 31 jan 2015 kl. 05:48 skrev Xabier Vázquez Campos <xvazquezc at gmail.com>:
>>
>> Hi all,
>>
>> One of the fungal genomes I'm annotating is relatively shattered (?),
>> with many contigs/scaffolds, and the CEGMA analysis alone may indicate a
>> potential widespread duplication of the genome
>>
>> # Statistics of the completeness of the genome based on 248 CEGs
>>> #
>>>           #Prots  %Completeness  -  #Total  Average  %Ortho
>>> Complete     181          72.98  -     365     2.02   67.40
>>> Partial      230          92.74  -     528     2.30   77.83
>>>
>>
>>
>> Judging from these figures, you seem to have a very fragmented assembly?
>> What N50 have you reached? According to my experience, assemblies with an
>> N50 below 5-10 times the average gene length tend to give problems in
>> producing good gene sets. Not to say that the gene sets are unusable, but
>> for comparing e.g. gene complements to other species, it will be hard to
>> draw any conclusions when a high proportion of the genes are incomplete.
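
The rule of thumb above can be turned into a rough check. This is only a sketch: the thread does not give a mean gene length, so the 1500 bp used below is a placeholder, and the three labels are my own reading of the "5-10 times" range rather than anything Mikael stated.

```python
def n50_to_gene_ratio(n50, mean_gene_len):
    """How many average-length genes fit in an N50-sized contig."""
    return n50 / mean_gene_len


def classify(ratio, low=5.0, high=10.0):
    """Bucket the ratio against the 5-10x rule of thumb (labels are assumptions)."""
    if ratio < low:
        return "likely problematic"
    if ratio < high:
        return "borderline"
    return "probably fine"


N50 = 10771            # from the abyss-fac table in this thread
MEAN_GENE_LEN = 1500   # hypothetical placeholder, NOT a value from the thread

ratio = n50_to_gene_ratio(N50, MEAN_GENE_LEN)
print(f"N50 / mean gene length = {ratio:.2f} -> {classify(ratio)}")
```

Swap in the mean gene length from your own annotation before drawing any conclusion from the result.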
>>
>> The expected genome size is relatively low (~42 Mb by abyss-fac) in
>> comparison with *Hortaea werneckii* (51.6Mb, 23333 genes), a related
>> fungus with nearly 90% of its genes present in at least two copies.
>> Paper:
>> http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071328
>>
>> Now to the Maker part... So, as part of the Maker annotation, I trained
>> SNAP and Augustus, and I generated a specific RepeatModeler library. I
>> recorded the predicted outputs from each Maker run (AED, number of
>> predicted proteins and transcripts...). Both Augustus and SNAP gave
>> quite high numbers (~19000 and ~23000 respectively) in comparison with the
>> xxx.all.maker.proteins.fasta (about 13600). So, my first question is, how
>> does maker deal with gene duplications? Or is this just a phenomenon given
>> that there is no support from the protein files provided initially to
>> Maker? I've used 4 different protein files for the annotation, could it be
>> that they weren't the best choices? I picked them from the closest
>> relatives and similar environments
>>
>>
>> Unless you mistakenly filter out duplicated gene families as repeats
>> with RepeatModeler, MAKER should not care about duplicated genes. However,
>> maker, without keep_preds=1, reports only genes with some kind of support
>> (be it EST or protein homology). This is rather conservative, but if you
>> enable keep_preds, you will get more genes as you have noted. Just for the
>> sake of comparison, I have reannotated more than ten genomes downloaded from
>> JGI, providing MAKER with similar evidence as JGI, and consistently, MAKER
>> is reporting fewer gene models. I have yet to do a more thorough comparison
>> to tell what genes JGI are reporting that don’t appear in the MAKER
>> annotations.
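
To compare model counts between runs (for example with and without keep_preds=1), counting FASTA headers is usually enough. A minimal Python sketch follows; the helper names are made up for illustration, and the demo file is a tiny made-up FASTA so the code runs without any MAKER output present. The 13600, ~19000 and ~23000 figures are the rounded counts quoted earlier in this thread.

```python
import os
import tempfile


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def supported_fraction(n_maker, n_ab_initio):
    """Share of ab initio predictions that made it into the MAKER set."""
    return n_maker / n_ab_initio


# Tiny made-up FASTA standing in for a file such as
# xxx.all.maker.proteins.fasta.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">gene1\nMKT\n>gene2\nMLS\nAAV\n")
    demo_path = tmp.name
demo_count = count_fasta_records(demo_path)
os.remove(demo_path)
print("records in demo file:", demo_count)

# Rounded counts quoted in this thread.
for label, n in (("Augustus", 19000), ("SNAP", 23000)):
    print(label, round(supported_fraction(13600, n), 2))
```

This only compares totals; it does not say which predictions were dropped or whether they overlap between predictors.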
>>
>>
>> So, in my last run I set keep_preds=1 and the proteins in the
>> xxx.all.maker.proteins.fasta reached to
>>
>> Last question regarding the protein files. I downloaded the annotated
>> genomes from the JGI and most of them have two annotation folders,
>> "All_models,_Filtered_and_Not" and "Filtered_Models___best__". I've been
>> using the protein files found in the latter, as I expected them to have real
>> evidence and a lower chance of predicting false genes. Am I right?
>>
>>
>> Yes, I would say so. The FilteredModels have passed through their model
>> selection pipeline, while all_models contains models from all predictors,
>> as well as combinations of predictors and EST evidence.
>>
>> Just some 2 cents of observations of mine,
>> cheers,
>> Mikael
>>
>>
>> Thank you in advance,
>>
>> Xabier
>>
>>
>> --
>> Xabier Vázquez Campos
>> PhD Candidate
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>