[maker-devel] maker annotation with cufflinks output
dhivya arasappan
darasappan at gmail.com
Thu Jan 30 14:22:14 MST 2014
Thank you for this information. Our server is currently down, so I'm
unable to get you all the statistics you've asked for. Once the server
is back up, I'll email again with more numbers.
But I can tell you that I did run CEGMA first and got around 92%
completeness (full genes) and 98% completeness (partial genes). This
is why I'm even more puzzled by the MAKER results.
Thanks again
Dhivya
On Jan 30, 2014, at 3:14 PM, Carson Holt wrote:
> What you get back from cufflinks should not necessarily be considered a
> transcript count, and you should always expect the count given by
> cufflinks to be high relative to assembly methods like trinity (especially
> in plants). This is because cufflinks is an alignment-based method, which
> can be more sensitive but will also generate a lot of false positives:
> repetitive elements, spurious alignments, and pseudogenes will all inflate
> the count. Fortunately, the false positives will mostly be single-exon
> results and will be filtered out by maker. Also, your mRNA-seq data from
> cufflinks will contribute hints that can generate genes in the absence
> of an ab initio gene prediction, but if the gene finder doesn’t think the
> hints make sense it will ignore them. So a lot of cufflinks results that
> don’t make sense with respect to ORF, etc., will fall into the category of
> being ignored.
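
One way to see how much of a cufflinks count comes from single-exon results is to tally exons per transcript in the GTF. The following is only a rough sketch: it assumes the usual cufflinks GTF layout (tab-separated columns, `exon` features, a `transcript_id "..."` attribute), and the function names and the tiny example records are made up for illustration.

```python
import re
from collections import Counter

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def exon_counts(gtf_lines):
    """Count exon features per transcript_id in GTF-formatted lines."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are counted
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts

def summarize(counts):
    """Split transcripts into single-exon and multi-exon groups."""
    single = sum(1 for n in counts.values() if n == 1)
    return {
        "total": len(counts),
        "single_exon": single,
        "multi_exon": len(counts) - single,
    }

# Made-up records for illustration only (not real data).
example = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t100\t300\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";',
    'chr1\tCufflinks\texon\t600\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "2";',
    'chr1\tCufflinks\texon\t2000\t2400\t1000\t.\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1";',
]

summary = summarize(exon_counts(example))
print(summary)
```

For a real run, the same functions could be applied to an open file handle on the cufflinks output (the file name depends on your setup). A large single-exon share would be consistent with many of the 29,000 transcripts being the kind of result described above.
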
>
> In addition, you should try running your assembly through CEGMA
> (http://korflab.ucdavis.edu/datasets/cegma/) to identify the expected
> completeness of the genome. For example, if a genome is 70% complete,
> then you only expect to recover 70% of the genes. I believe CEGMA can also
> be run online from the iPlant discovery environment and iPlant atmosphere
> images. Also make sure you are including proteins with your MAKER run,
> as not all genes will be expressed, so mRNA-seq will only capture a portion
> of the genes, and that portion can be as low as 50%.
>
> Thanks,
> Carson
>
>
> On 1/30/14, 1:51 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>> Hi Dhivya,
>>
>> I think there are a few numbers that could be helpful to understand
>> what's happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into? Also,
>> you had 29,000 transcripts from cufflinks, but fewer from MAKER when you
>> gave it the cufflinks data. How many transcripts did MAKER identify with
>> the cufflinks data? Did you still get more than the 10,000 transcripts
>> that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be
>> affecting its performance is that it only annotates a gene where there
>> is both evidence (like your RNA-seq data) and an ab initio prediction. If
>> a prediction is unsupported by the evidence, then MAKER won't annotate a
>> gene, and if evidence aligns where there's no prediction, MAKER won't
>> annotate a gene either. What ab initio predictors are you using, and have
>> they been trained for your specific genome?
>>
>> You can force MAKER to automatically promote evidence alignments to a
>> gene model by setting the est2genome option to 1, but that will usually
>> give you many false positives.
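
If it helps, here is a small sketch of flipping that option in a control file from a script rather than by hand. It assumes the `key=value #comment` line style used in maker_opts.ctl; the helper name `set_ctl_option` and the two sample lines are illustrative only, and the comments in your own control file may differ by MAKER version.

```python
import re

def set_ctl_option(ctl_text, key, value):
    """Return ctl_text with `key=...` set to `value`, keeping any trailing #comment."""
    pattern = re.compile(
        r"^(" + re.escape(key) + r")=([^#\n]*?)(\s*#.*)?$",
        re.MULTILINE,
    )

    def repl(match):
        comment = match.group(3) or ""
        return f"{match.group(1)}={value}{comment}"

    new_text, n_changed = pattern.subn(repl, ctl_text)
    if n_changed == 0:
        raise KeyError(f"option {key!r} not found in control file text")
    return new_text

# Illustrative lines in the control-file style (not copied from a real file).
sample = (
    "est= #set of ESTs or assembled mRNA-seq in fasta format\n"
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
)

updated = set_ctl_option(sample, "est2genome", 1)
print(updated)
```

Only the est2genome line changes; the other options and all comments are left as they were, which makes it easy to diff the control files between the two test runs.
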
>>
>> Try rerunning it with either the Trinity data or the Cufflinks data and
>> with est2genome set to 1, and let us know how that affects the MAKER
>> results.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>> dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results. I did not get as
>> many transcripts as expected, only around 10,000 transcripts.
>>
>> So, I decided to try a different approach. I did a genome-assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes and 29,000 transcripts. I then ran maker using my
>> genome assembly and the cufflinks result. I get far fewer transcripts
>> as a result.
>>
>> If cufflinks found 29,000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>