[maker-devel] Fwd: maker annotation with cufflinks output
dhivya arasappan
darasappan at gmail.com
Tue Feb 4 15:43:19 MST 2014
Resending this since it didn't make it to the mailing list before.
>
> I was able to check on some of those questions.
>
> 1. From trinity assembly: I started with 102000 contigs. I used
> trinotate to annotate proteins in this.
>
> I ran maker on this data with est2genome set to 1. The output looks
> like this (most important parts on top):
>
> 6653 gene
> 46675 exon
> 280534 protein_match
> 59934 CDS
> 969 contig
> 105388 expressed_sequence_match
> 12584 five_prime_UTR
> 78565 match
> 1401369 match_part
> 10180 mRNA
> 11545 three_prime_UTR
>
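A tally like the one above can be produced by counting the values in
column 3 (the feature type) of a GFF3 file. The following is a minimal
sketch in Python, assuming a standard tab-separated GFF3 file; the
function name and the sample records are illustrative only and are not
part of MAKER.

```python
from collections import Counter

def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and FASTA."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if not line or line.startswith("#"):
            continue  # blank line, directive, or comment
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts

# Hypothetical records, for illustration only.
sample = [
    "##gff-version 3",
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "scaffold_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1",
    "scaffold_1\tmaker\texon\t100\t400\t.\t+\t.\tParent=m1",
    "scaffold_1\tmaker\texon\t600\t900\t.\t+\t.\tParent=m1",
    "scaffold_1\test2genome\tmatch_part\t100\t400\t.\t+\t.\tID=e1",
    "##FASTA",
    ">scaffold_1",
]

for feature, n in sorted(count_feature_types(sample).items()):
    print(n, feature)
```

Because the function accepts any iterable of lines, an open file handle
for the merged GFF3 output can be passed in place of the sample list.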
> 2. From cufflinks assembly: I started with 133380 entries (out of
> which there are 29,000 transcripts). I used the protein sequences
> from trinity assembly.
>
> I ran maker on this data with est2genome set to 1. The output looks
> like this:
> 29 gene
> 75 exon
> 573659 protein_match
> 67 CDS
> 1099 contig
> 269298 expressed_sequence_match
> 23 five_prime_UTR
> 173844 match
> 2221846 match_part
> 29 mRNA
> 23 three_prime_UTR
>
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why even
> fewer genes are found when using the cufflinks transcripts.
>
> 3. Training SNAP: I used the results of maker from 1 to train
> SNAP. I then used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/
> maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
> 46555 expressed_sequence_match
> 43651 match
> 553633 match_part
> 113738 protein_match
>
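When a rerun produces evidence alignments but no gene models, one quick
check is which predictor settings the run actually read. The sketch
below assumes the key=value layout with '#' comments used by
maker_opts.ctl; the function name is hypothetical and the path is a
placeholder, not the one from this thread.

```python
def read_ctl_options(text):
    """Parse key=value lines, dropping '#' comments and blank lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip trailing comment
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options

# Hypothetical excerpt of a control file, for illustration only.
example = """\
#-----Gene Prediction
snaphmm=/path/to/species.hmm #placeholder path
augustus_species= #left empty in this example
est2genome=0 #evidence is not promoted to gene models
"""

opts = read_ctl_options(example)
for key in ("snaphmm", "augustus_species", "est2genome"):
    print(key, "=", opts.get(key) or "(not set)")
```

Reading the options this way only confirms what was configured; it does
not verify that the HMM file exists or that the training succeeded.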
> As I mentioned in another email, cegma results indicated that the
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>
>
>
>
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> I think there are a few numbers that could be helpful to understand
>> what's happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into?
>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>> MAKER when you gave it the cufflinks data. How many transcripts did
>> MAKER identify with the cufflinks data? Did you still get more than
>> the 10,000 transcripts that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be
>> affecting its performance is that it only annotates a gene where
>> there is both evidence (like your RNA-seq data) and an ab-initio
>> prediction. If a prediction is unsupported by the evidence, then
>> MAKER won't annotate a gene and if evidence aligns where there's no
>> prediction, MAKER won't annotate a gene either. What ab-initio
>> predictors are you using, and have they been trained on your genome?
>>
>> You can force MAKER to automatically promote evidence alignments to
>> a gene model by setting the est2genome option to 1, but that will
>> usually give you many false positives.
>>
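The rule described above can be illustrated with a toy model. This is a
deliberately simplified sketch of the behaviour as stated in this
message, not MAKER's actual algorithm: genes are reduced to (start, end)
intervals on one sequence, and the function names and coordinates are
made up for illustration.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def annotate(predictions, evidence, est2genome=False):
    """Toy rule: keep predictions that overlap evidence; with est2genome,
    also promote evidence that overlaps no prediction."""
    genes = [p for p in predictions if any(overlaps(p, e) for e in evidence)]
    if est2genome:
        genes += [e for e in evidence
                  if not any(overlaps(e, p) for p in predictions)]
    return sorted(genes)

# Hypothetical coordinates, for illustration only.
predictions = [(100, 500), (2000, 2600)]  # ab-initio predictions
evidence = [(150, 480), (5000, 5400)]     # aligned transcripts

print(annotate(predictions, evidence))                   # supported only
print(annotate(predictions, evidence, est2genome=True))  # plus promoted
print(annotate([], evidence))                            # no predictor
```

Under this toy rule, a run with no usable predictions and est2genome
off returns no genes at all, however much evidence aligns.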
>> Try rerunning it with either the Trinity data or the Cufflinks data
>> and with est2genome set to 1, and let us know how that affects the
>> MAKER results.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf
>> of dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results. I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach. I did a genome assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts. I then ran maker using
>> my genome assembly and the cufflinks result. I get far fewer
>> transcripts as a result.
>>
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>
>