[maker-devel] maker annotation with cufflinks output

Mon Feb 3 09:31:16 MST 2014

Hi Daniel,

I was able to check on some of those questions.

1. From the Trinity assembly: I started with 102,000 contigs. I used
Trinotate to annotate the proteins in them.

I ran maker on this data with est2genome set to 1. The output looks  
like this (most important parts on top):

     6653 gene
    46675 exon
   280534 protein_match
    59934 CDS
      969 contig
   105388 expressed_sequence_match
    12584 five_prime_UTR
    78565 match
  1401369 match_part
    10180 mRNA
    11545 three_prime_UTR
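
Per-type counts like the ones above can be obtained by tallying the third column (feature type) of a GFF3 file. The following is a minimal Python sketch, assuming tab-separated GFF3 that may end in a `##FASTA` section; the example records and the function name `count_feature_types` are made up for illustration and are not MAKER output.

```python
from collections import Counter
import io

# Made-up GFF3 records for illustration only (not real MAKER output).
EXAMPLE_GFF3 = (
    "##gff-version 3\n"
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "scaffold_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1\n"
    "scaffold_1\tmaker\texon\t100\t400\t.\t+\t.\tParent=m1\n"
    "scaffold_1\tmaker\texon\t600\t900\t.\t+\t.\tParent=m1\n"
    "scaffold_1\tmaker\tCDS\t150\t400\t.\t+\t0\tParent=m1\n"
    "##FASTA\n"
    ">scaffold_1\n"
    "ACGT\n"
)

def count_feature_types(lines):
    """Tally column 3 of GFF3 lines, skipping comments and any FASTA section."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts

if __name__ == "__main__":
    # For a real file, pass an open file handle instead of the StringIO object.
    for feature_type, n in count_feature_types(io.StringIO(EXAMPLE_GFF3)).most_common():
        print(f"{n:>9} {feature_type}")
```

The function accepts any iterable of lines, so the same code works on an open file handle for a merged GFF3.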

2. From the Cufflinks assembly: I started with 133,380 entries (of
which 29,000 are transcripts). I used the protein sequences
from the Trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks  
like this:
       29 gene
       75 exon
   573659 protein_match
       67 CDS
     1099 contig
   269298 expressed_sequence_match
       23 five_prime_UTR
   173844 match
  2221846 match_part
       29 mRNA
       23 three_prime_UTR

The number of genes annotated using the Trinity assembly is lower than
expected, so I went the Cufflinks route. I don't understand why even
fewer genes are found when using the Cufflinks transcripts.

3. Training SNAP: I used the MAKER results from step 1 to train SNAP.
I then used the resulting HMM to rerun MAKER with these settings:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
      957 contig
    46555 expressed_sequence_match
    43651 match
   553633 match_part
   113738 protein_match
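
To make the difference between the three runs explicit, here is a small Python sketch that checks which gene-model feature types are absent from each set of counts. The numbers are copied from the tables in this message; the run labels and the helper name `missing_gene_model_types` are only illustrative.

```python
# Feature types that make up an annotated gene model in the GFF3 output.
GENE_MODEL_TYPES = ("gene", "mRNA", "exon", "CDS")

# Counts copied from the three runs reported in this message.
RUNS = {
    "trinity, est2genome=1": {
        "gene": 6653, "mRNA": 10180, "exon": 46675, "CDS": 59934,
    },
    "cufflinks, est2genome=1": {
        "gene": 29, "mRNA": 29, "exon": 75, "CDS": 67,
    },
    "snap hmm, est2genome=0": {
        "contig": 957, "expressed_sequence_match": 46555, "match": 43651,
        "match_part": 553633, "protein_match": 113738,
    },
}

def missing_gene_model_types(counts):
    """Return the gene-model feature types that have no entries in counts."""
    return [t for t in GENE_MODEL_TYPES if counts.get(t, 0) == 0]

if __name__ == "__main__":
    for label, counts in RUNS.items():
        missing = missing_gene_model_types(counts)
        status = "none missing" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

A run that reports only evidence alignments (match, protein_match and so on) with every gene-model type missing, like the third one here, produced no annotations at all rather than just fewer of them.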

As I mentioned in another email, cegma results indicated that the  
genome was more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya

On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

> Hi Dhivya,
>
> I think there are a few numbers that could be helpful to understand
> what's happening here.
>
> How many transcripts did Trinity assemble the RNA-seq data into?
> Also, you had 29,000 transcripts from cufflinks, but fewer from  
> MAKER when you gave it the cufflinks data. How many transcripts did  
> MAKER identify with the cufflinks data? Did you still get more than  
> the 10,000 transcripts that you found with just the Trinity data?
>
> A key part of MAKER's approach to genome annotation that might be
> affecting its performance is that it only annotates a gene where
> there is both evidence (like your RNA-seq data) and an ab-initio
> prediction. If a prediction is unsupported by the evidence, then
> MAKER won't annotate a gene, and if evidence aligns where there's no
> prediction, MAKER won't annotate a gene either. What ab-initio
> predictors are you using, and have they been trained for your
> specific genome?
>
> You can force MAKER to automatically promote evidence alignments to  
> a gene model by setting the est2genome option to 1, but that will  
> usually give you many false positives.
>
> Try rerunning it with either the Trinity data or the Cufflinks data  
> and with est2genome set to 1, and let us know how that affects the  
> MAKER results.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of  
> dhivya arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> I am trying to annotate a 200 Mb plant genome for which I have a very
> good assembly.
>
> I tried to de novo assemble RNA-seq data using trinity and ran maker
> using my genome assembly and the trinity results.  I did not get as
> many transcripts as expected, only around 10,000.
>
> So, I decided to try a different approach.  I did a genome assisted
> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
> genome assembly and the cufflinks result.  I get far fewer
> transcripts as a result.
>
> If cufflinks found 29000 transcripts by mapping to the genome, I'm
> confused as to why maker is not finding the same.
>
> Any suggestions would be appreciated.
>
> Thanks
> Dhivya