[maker-devel] maker annotation with cufflinks output
dhivya arasappan
darasappan at gmail.com
Mon Feb 3 09:31:16 MST 2014
Hi Daniel,
I was able to check on some of those questions.
1. From trinity assembly: I started with 102000 contigs. I used
trinotate to annotate proteins in this.
I ran maker on this data with est2genome set to 1. The output looks
like this (most important parts on top):
6653 gene
46675 exon
280534 protein_match
59934 CDS
969 contig
105388 expressed_sequence_match
12584 five_prime_UTR
78565 match
1401369 match_part
10180 mRNA
11545 three_prime_UTR
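
For anyone reproducing a tally like the one above: the counts correspond to the feature type in column 3 of the GFF3 output. The following is only a minimal sketch, assuming the merged MAKER GFF3 is tab-delimited with nine columns and may end with an embedded `##FASTA` section. The function name and the toy input are made up for illustration and are not part of MAKER.

```python
import io
from collections import Counter

def count_feature_types(lines):
    """Tally column 3 (feature type) over GFF3 feature rows."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts

# Toy input (hypothetical records, not taken from the run described here).
EXAMPLE_GFF3 = (
    "##gff-version 3\n"
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "scaffold_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n"
    "scaffold_1\tmaker\texon\t100\t400\t.\t+\t.\tID=e1;Parent=g1-mRNA-1\n"
    "scaffold_1\tmaker\texon\t600\t900\t.\t+\t.\tID=e2;Parent=g1-mRNA-1\n"
    "scaffold_1\test2genome\texpressed_sequence_match\t100\t900\t.\t+\t.\tID=m1\n"
    "##FASTA\n"
    ">scaffold_1\n"
    "ACGT\n"
)

counts = count_feature_types(io.StringIO(EXAMPLE_GFF3))
for feature, n in counts.most_common():
    print(f"{n:>8} {feature}")
```

On a real file, pass an open file handle instead of the `io.StringIO` wrapper.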
2. From cufflinks assembly: I started with 133380 entries (out of
which there are 29,000 transcripts). I used the protein sequences
from trinity assembly.
I ran maker on this data with est2genome set to 1. The output looks
like this:
29 gene
75 exon
573659 protein_match
67 CDS
1099 contig
269298 expressed_sequence_match
23 five_prime_UTR
173844 match
2221846 match_part
29 mRNA
23 three_prime_UTR
The number of genes annotated using the Trinity assembly is lower than
expected, so I went the cufflinks route. I don't understand why even fewer
genes are being found when using the cufflinks transcripts.
3. Training SNAP: I used the maker results from run 1 to train SNAP.
I then used the resulting HMM to rerun maker with these settings:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0
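
A quick way to double-check which of these settings a run actually sees is to parse the control file. This is a rough sketch, assuming maker_opts.ctl uses `key=value` lines with `#` starting a trailing comment. The option names are the usual maker_opts.ctl keys, while the helper names and the example text are hypothetical.

```python
def parse_ctl(text):
    """Parse key=value lines, dropping anything after '#'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts

def can_produce_gene_models(opts):
    """True if some option that can yield gene models is switched on.

    Assumption: models come either from evidence promotion
    (est2genome/protein2genome set to 1) or from an ab initio
    predictor that has been given a parameter file or species.
    """
    promote = any(opts.get(k) == "1" for k in ("est2genome", "protein2genome"))
    predictor = any(opts.get(k) for k in ("snaphmm", "augustus_species", "gmhmm"))
    return promote or predictor

# Hypothetical control-file fragment with no predictor configured.
EXAMPLE_CTL = (
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
    "protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no\n"
    "snaphmm= #SNAP HMM file\n"
    "augustus_species= #Augustus gene prediction species model\n"
    "gmhmm= #GeneMark HMM file\n"
)

opts = parse_ctl(EXAMPLE_CTL)
print(can_produce_gene_models(opts))
```

A positive result only means an option is set. It does not show that the HMM file exists at that path or was trained successfully, so the file itself is still worth checking.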
And again I got results with no entries for gene, exon, CDS etc.
957 contig
46555 expressed_sequence_match
43651 match
553633 match_part
113738 protein_match
As I mentioned in another email, cegma results indicated that the
genome was more than 90% complete. Any suggestions would be helpful.
Thank you
Dhivya
On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
> Hi Dhivya,
>
> I think there are a few numbers that could be helpful to understand
> what's happening here.
>
> How many transcripts did Trinity assemble the RNA-seq data into?
> Also, you had 29,000 transcripts from cufflinks, but fewer from
> MAKER when you gave it the cufflinks data. How many transcripts did
> MAKER identify with the cufflinks data? Did you still get more than
> the 10,000 transcripts that you found with just the Trinity data?
>
> A key part of MAKER's approach to genome annotation that might be
> affecting its performance is that it only annotates a gene where
> there is both evidence (like your RNA-seq data) and an ab-initio
> prediction. If a prediction is unsupported by the evidence, then
> MAKER won't annotate a gene and if evidence aligns where there's no
> prediction, MAKER won't annotate a gene either. What ab-initio
> predictors are you using and have they been trained for your specific genome?
>
> You can force MAKER to automatically promote evidence alignments to
> a gene model by setting the est2genome option to 1, but that will
> usually give you many false positives.
>
> Try rerunning it with either the Trinity data or the Cufflinks data
> and with est2genome set to 1, and let us know how that affects the
> MAKER results.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> dhivya arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> I am trying to annotate a 200 Mb plant genome for which I have a very
> good assembly.
>
> I tried to de novo assemble RNA-seq data using trinity and ran maker
> using my genome assembly and the trinity results. I did not get as
> many transcripts as expected, around 10,000 transcripts.
>
> So, I decided to try a different approach. I did a genome-assisted
> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
> generated 21,000 genes, 29,000 transcripts. I then ran maker using my
> genome assembly and the cufflinks result. I get far fewer
> transcripts as a result.
>
> If cufflinks found 29000 transcripts by mapping to the genome, I'm
> confused as to why maker is not finding the same.
>
> Any suggestions would be appreciated.
>
> Thanks
> Dhivya
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org