[maker-devel] Fwd: maker annotation with cufflinks output

Tue Feb 4 15:43:19 MST 2014

Resending this since it didn't make it to the mailing list before.

>
> I was able to check on some of those questions.
>
> 1. From the Trinity assembly: I started with 102000 contigs. I used
> Trinotate to annotate the proteins in it.
>
> I ran MAKER on this data with est2genome set to 1. The output looks
> like this (most important parts on top):
>
>    6653 gene
>   46675 exon
>  280534 protein_match
>   59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
>
> 2. From the Cufflinks assembly: I started with 133380 entries (of
> which 29,000 are transcripts). I used the protein sequences from the
> Trinity assembly.
>
> I ran MAKER on this data with est2genome set to 1. The output looks
> like this:
>      29 gene
>      75 exon
>  573659 protein_match
>      67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
>
> The number of genes annotated using the Trinity assembly is lower
> than expected, so I went the Cufflinks route. I don't understand why
> even fewer genes are found when using the Cufflinks transcripts.
>
> 3. Training SNAP: I used the MAKER results from run 1 to train
> SNAP. I then used the trained HMM to rerun MAKER:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
>     957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
>
> As I mentioned in another email, CEGMA results indicated that the
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>
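The per-feature counts listed above are tallies of the feature type column (column 3) of a GFF3 file. As a minimal sketch of how such a tally can be reproduced, assuming the MAKER output has been merged into a single GFF3 file (the function name and sample records are illustrative, not taken from the thread):

```python
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (feature type) of GFF3 records.

    Comment/directive lines are skipped, and counting stops at the
    ##FASTA directive, after which only sequence data follows.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section: no more feature records
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GFF3 record
        counts[fields[2]] += 1
    return counts


# Illustrative records only; real input would be the lines of a GFF3 file.
sample = [
    "##gff-version 3\n",
    "scf1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "scf1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n",
    "scf1\tmaker\texon\t1\t50\t.\t+\t.\tID=e1;Parent=m1\n",
    "scf1\tmaker\texon\t60\t100\t.\t+\t.\tID=e2;Parent=m1\n",
    "##FASTA\n",
    ">scf1\n",
    "ACGT\n",
]

for feature, n in sorted(count_feature_types(sample).items()):
    print(f"{n:8d} {feature}")
```

Comparing the counts for gene, mRNA, exon and CDS between runs, separately from the evidence features (match, match_part, protein_match, expressed_sequence_match), makes it easier to see whether evidence is aligning but not being promoted to gene models.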
>
>
>
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> I think there are a few numbers that could be helpful to understand
>> what's happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into?
>> Also, you had 29,000 transcripts from cufflinks, but fewer from  
>> MAKER when you gave it the cufflinks data. How many transcripts did  
>> MAKER identify with the cufflinks data? Did you still get more than  
>> the 10,000 transcripts that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be
>> affecting its performance is that it only annotates a gene where
>> there is both evidence (like your RNA-seq data) and an ab-initio
>> prediction. If a prediction is unsupported by the evidence, then
>> MAKER won't annotate a gene, and if evidence aligns where there's no
>> prediction, MAKER won't annotate a gene either. What ab-initio
>> predictors are you using, and have they been trained for your
>> specific genome?
>>
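The rule described above (a gene is annotated only where a prediction and evidence coincide) can be illustrated with a simple interval-overlap check. This is a simplified sketch under that assumption, not MAKER's actual algorithm; the `Interval` type, the function names and the coordinates are all hypothetical:

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same sequence and share at least one base."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def supported_predictions(predictions: List[Interval],
                          evidence: List[Interval]) -> List[Interval]:
    """Predictions that overlap at least one evidence alignment."""
    return [p for p in predictions if any(overlaps(p, e) for e in evidence)]


def evidence_without_prediction(predictions: List[Interval],
                                evidence: List[Interval]) -> List[Interval]:
    """Evidence alignments that no prediction overlaps."""
    return [e for e in evidence if not any(overlaps(e, p) for p in predictions)]


# Hypothetical coordinates for illustration only.
predictions = [
    Interval("scf1", 100, 500),
    Interval("scf1", 1000, 1500),
    Interval("scf2", 100, 500),
]
evidence = [
    Interval("scf1", 450, 700),
    Interval("scf3", 10, 90),
]

print(supported_predictions(predictions, evidence))
print(evidence_without_prediction(predictions, evidence))
```

Under this simplified view, a run with plenty of evidence alignments but no usable predictions would yield no supported predictions at all, which is one way to read a result that has match features but no gene features.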
>> You can force MAKER to automatically promote evidence alignments to  
>> a gene model by setting the est2genome option to 1, but that will  
>> usually give you many false positives.
>>
>> Try rerunning it with either the Trinity data or the Cufflinks data  
>> and with est2genome set to 1, and let us know how that affects the  
>> MAKER results.
>>
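Options such as est2genome are set as `key=value` lines in the MAKER control file (maker_opts.ctl), optionally followed by a `#` comment. As a rough sketch of switching such an option between runs without hand-editing, assuming that line layout (the helper name `set_ctl_option` and the sample text are illustrative, not part of MAKER):

```python
import re


def set_ctl_option(text, key, value):
    """Return control-file text with `key` set to `value`.

    Any trailing '#' comment on the line is preserved. Raises KeyError
    if no line starting with `key=` is found.
    """
    pattern = re.compile(
        rf"^({re.escape(key)}=)[^#\n]*?([ \t]*(?:#.*)?)$", re.MULTILINE
    )
    new_text, n = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", text
    )
    if n == 0:
        raise KeyError(key)
    return new_text


# Illustrative excerpt; a real control file has many more options.
ctl = (
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
    "snaphmm= #SNAP HMM file\n"
)

print(set_ctl_option(ctl, "est2genome", "1"))
```

Keeping a separate copy of the control file for each run (for example one with est2genome=1 and one with snaphmm set and est2genome=0) makes it easier to compare the resulting feature counts afterwards.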
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>> of dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using Trinity and ran MAKER
>> using my genome assembly and the Trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome-assisted
>> assembly of the RNA-seq data using TopHat/Cufflinks. This pipeline
>> generated 21,000 genes and 29,000 transcripts.  I then ran MAKER using
>> my genome assembly and the Cufflinks result.  I got far fewer
>> transcripts as a result.
>>
>> If Cufflinks found 29,000 transcripts by mapping to the genome, I'm
>> confused as to why MAKER is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>
>
