[maker-devel] maker annotation with cufflinks output
Daniel Ence
dence at genetics.utah.edu
Wed Feb 5 12:28:48 MST 2014
Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.
Thanks,
Daniel
Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Carson Holt [carsonhh at gmail.com]
Sent: Wednesday, February 05, 2014 11:38 AM
To: dhivya arasappan; Daniel Ence
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output
Do you have any features of type snap in your results from step 3? We’ve had a couple of recent posts where, after training, snap was giving no results, and as a result maker couldn’t produce any genes. One cause of something like that may be your step 2. Make sure the ZFF you used to train with wasn’t empty. The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering then you are training with an empty ZFF.
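One way to run the empty-ZFF check is a short script like the one below. It is only a sketch and assumes the usual ZFF layout: a ">" header line per sequence, followed by whitespace-separated feature lines whose first field is the exon label and whose fourth field is the gene model name. The function name summarize_zff is made up for illustration and is not part of MAKER or SNAP.

```python
from collections import Counter


def summarize_zff(lines):
    """Count sequences, gene models and exon features in ZFF-formatted lines.

    Assumed layout (not verified against any particular SNAP version):
      >sequence_name
      Einit 201 325 MODEL1
    """
    sequences = 0
    models = set()
    exon_labels = Counter()
    current_seq = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start of a new sequence.
            sequences += 1
            parts = line[1:].split()
            current_seq = parts[0] if parts else ""
            continue
        fields = line.split()
        if len(fields) < 4:
            # Not a feature line in the assumed layout; skip it.
            continue
        exon_labels[fields[0]] += 1
        # A model is identified by its sequence plus its model name.
        models.add((current_seq, fields[3]))
    return {
        "sequences": sequences,
        "models": len(models),
        "exons": sum(exon_labels.values()),
    }
```

Passing an open file handle for the annotation file written by maker2zff (commonly genome.ann, though that name is an assumption here) gives the counts directly. If "models" comes back as 0, the filters rejected every gene and SNAP was trained on nothing.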
Also, you should use proteins from a related species as your protein file. I see that your protein matches are varying wildly from run to run, and so is your contig count. Was the subset of contigs you have results for long enough to contain genes?
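For the contig-length question, a rough sketch like the following can report how many contigs clear a given size. The helper names and any threshold you pass in are hypothetical; no particular cutoff is implied by MAKER.

```python
def contig_lengths(lines):
    """Return {contig_name: length} from FASTA-formatted lines."""
    lengths = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def count_at_least(lengths, min_len):
    """Count contigs whose length is at least min_len bases."""
    return sum(1 for n in lengths.values() if n >= min_len)
```

Comparing count_at_least for the contigs that appear in the MAKER output against the whole assembly shows whether the annotated subset is mostly short fragments.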
—Carson
From: dhivya arasappan <darasappan at gmail.com>
Date: Monday, February 3, 2014 at 9:31 AM
To: Daniel Ence <dence at genetics.utah.edu>
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output
Hi Daniel,
I was able to check on some of those questions.
1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this.
I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top):
6653 gene
46675 exon
280534 protein_match
59934 CDS
969 contig
105388 expressed_sequence_match
12584 five_prime_UTR
78565 match
1401369 match_part
10180 mRNA
11545 three_prime_UTR
2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts). I used the protein sequences from trinity assembly.
I ran maker on this data with est2genome set to 1. The output looks like this:
29 gene
75 exon
573659 protein_match
67 CDS
1099 contig
269298 expressed_sequence_match
23 five_prime_UTR
173844 match
2221846 match_part
29 mRNA
23 three_prime_UTR
The number of genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I don't understand why even fewer genes are found when using the cufflinks transcripts.
3. Training SNAP: I used the results of maker from 1 to train SNAP. I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0
And again I got results with no entries for gene, exon, CDS etc.
957 contig
46555 expressed_sequence_match
43651 match
553633 match_part
113738 protein_match
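Tallies like the ones above can be reproduced from the MAKER GFF3 output, and splitting them by source shows whether SNAP produced any features at all. The sketch below assumes standard nine-column, tab-separated GFF3 and stops at a ##FASTA section if one is present. The function names are illustrative, and the idea that SNAP features carry a source beginning with "snap" (for example snap_masked) is an assumption to check against your own file.

```python
from collections import Counter


def tally_gff3(lines):
    """Return (count by feature type, count by (source, type)) for GFF3 lines."""
    by_type = Counter()
    by_source_type = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            # Everything after this directive is sequence, not features.
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue
        source, feature_type = cols[1], cols[2]
        by_type[feature_type] += 1
        by_source_type[(source, feature_type)] += 1
    return by_type, by_source_type


def has_source(by_source_type, prefix):
    """True if any feature's source column starts with the given prefix."""
    return any(source.startswith(prefix) for source, _ in by_source_type)
```

If has_source(by_source_type, "snap") is False for the step 3 output, the predictor returned nothing, which would explain the missing gene, exon and CDS entries.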
As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful.
Thank you
Dhivya
On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
Hi Dhivya,
I think there are a few numbers that could help us understand what's happening here.
How many transcripts did Trinity assemble the RNA-seq data into? Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data?
A key part of MAKER's approach to genome annotation that might be affecting its performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab-initio prediction. If a prediction is unsupported by the evidence, then MAKER won't annotate a gene, and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab-initio predictors are you using, and have they been trained for this specific genome?
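As a toy illustration of that rule (not MAKER's actual algorithm, which works on exon structure and scoring rather than plain overlap), treat predictions and evidence alignments as inclusive (start, end) intervals on a single contig and keep only predictions that overlap some evidence. Both function names are made up for this sketch.

```python
def overlaps(a, b):
    """True if inclusive intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def supported_predictions(predictions, evidence):
    """Keep predictions that overlap at least one evidence interval."""
    return [p for p in predictions if any(overlaps(p, e) for e in evidence)]
```

With an empty prediction list the result is empty no matter how much evidence aligns, which is the situation you end up in when the ab-initio predictor returns nothing.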
You can force MAKER to automatically promote evidence alignments to a gene model by setting the est2genome option to 1, but that will usually give you many false positives.
Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results.
Thanks,
Daniel
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya arasappan [darasappan at gmail.com]
Sent: Thursday, January 30, 2014 11:18 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] maker annotation with cufflinks output
Hello,
I am trying to annotate a 200 mb plant genome for which I have a very
good assembly.
I tried to denovo assemble RNA-seq data using trinity and ran maker
using my genome assembly and the trinity results. I did not get as
many transcripts as expected, around 10,000 transcripts.
So, I decided to try a different approach. I did a genome assisted
assembly of the RNA-seq data using tophat/cufflinks. This pipeline
generated 21,000 genes, 29,000 transcripts. I then ran maker using my
genome assembly and the cufflinks result. I get far fewer
transcripts as a result.
If cufflinks found 29000 transcripts by mapping to the genome, I'm
confused as to why maker is not finding the same.
Any suggestions would be appreciated.
Thanks
Dhivya
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org