[maker-devel] maker annotation with cufflinks output

Carson Holt carsonhh at gmail.com
Thu Feb 6 10:04:25 MST 2014


Could you send me the file without using 'head' to trim it? It's cutting the
file off before it reaches the part I’m interested in.

—Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 10:01 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Oh yes, I did. I took just the non-sequence entries in the GFF file and used
that as my input.  I will rerun SNAP with the GFF file containing the
sequences as well.
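
A quick way to confirm whether a GFF3 file still carries its sequence is to look for the `##FASTA` directive and count what follows it. The sketch below is only illustrative: the function name `fasta_section_summary` and the demo records are made up for this example, and it assumes the sequence section is introduced by a `##FASTA` line.

```python
import io

def fasta_section_summary(handle):
    """Return (feature_lines, fasta_records, bases) for a GFF3 stream.

    Assumes the embedded sequence, if present, follows a '##FASTA' line.
    """
    in_fasta = False
    features = records = bases = 0
    for raw in handle:
        line = raw.strip()
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True
            elif line and not line.startswith("#"):
                features += 1          # ordinary nine-column feature line
        elif line.startswith(">"):
            records += 1               # FASTA header
        else:
            bases += len(line)         # sequence residues
    return features, records, bases

# Hypothetical two-line GFF3 with an embedded sequence, for illustration only.
demo = io.StringIO(
    "##gff-version 3\n"
    "contig1\tmaker\tgene\t1\t9\t.\t+\t.\tID=g1\n"
    "##FASTA\n"
    ">contig1\n"
    "ACGTACGTA\n"
)
print(fasta_section_summary(demo))
```

If the last two numbers come back as zero, the file has features but no sequence, which is the situation described above.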

I'm attaching a snippet of the gff file that I used as input to maker2zff.

Thanks for your help
Dhivya




On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
> sequence from the GFF3 you are using as input to maker2zff?  There should be
> fasta sequence at the end of that file.  Also, can I see the GFF3 file you are
> using as input to maker2zff?
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 7:47 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Hello,
> 
> It does appear that my genome.ann file from the maker2zff script has data in
> it. However, the SNAP steps after that have created empty files.  The
> following are all empty:
> 
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
> 
> When I tried to get gene stats or validate genome.ann, I get errors like this
> for all of them:
> 
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
> exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
> exon-21:out_of_bounds
> 
> I'm not sure why the annotations I'm seeing in genome.ann are all showing up as
> errors. I realize this may be an issue with SNAP, but are you familiar with
> anything like this? My genome.ann file is attached for reference.
> 
> Thanks
> Dhivya
> 
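One plausible reading of the out_of_bounds errors above, consistent with the later reply about genome.dna having no sequence, is that every exon coordinate is being compared against a zero-length sequence. The sketch below is a rough illustration, not fathom's actual logic: the function names are hypothetical, and it assumes the four-column ZFF layout (label, begin, end, group) under `>sequence` headers.

```python
def fasta_lengths(lines):
    """Map each FASTA record name to its sequence length."""
    lengths = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def out_of_bounds(zff_lines, lengths):
    """List (sequence, group, begin, end) for features outside the sequence.

    Assumed layout: '>seq' header lines, then 'label begin end ... group'.
    A sequence missing from `lengths` is treated as having length 0.
    """
    bad = []
    seq = None
    for raw in zff_lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            seq = fields[0][1:]
            continue
        begin, end, group = int(fields[1]), int(fields[2]), fields[-1]
        seq_len = lengths.get(seq, 0)
        # Minus-strand features may list begin > end, so compare both ends.
        if min(begin, end) < 1 or max(begin, end) > seq_len:
            bad.append((seq, group, begin, end))
    return bad

# Made-up records for illustration only.
dna = [">contig1", "ACGTACGTAC"]
ann = [">contig1", "Einit 2 5 MODEL1", "Eterm 8 12 MODEL1"]
print(out_of_bounds(ann, fasta_lengths(dna)))   # only the second exon is flagged
print(out_of_bounds(ann, {}))                   # with no sequence, both are flagged
```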
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
> 
>> Do you have any features of type snap in your results from step 3?  We’ve had
>> a couple of recent posts where, after training, snap was giving no results, and
>> as a result maker couldn’t give any genes.  One cause of something like that
>> may be your step 2.  Make sure the ZFF you used to train with wasn’t empty.
>> The maker2zff script uses filters to put only the best genes in the ZFF file,
>> and if all your genes fail the filtering then you are training with an empty
>> ZFF.
>> 
>> Also, you should use proteins from a related species as your protein file.  I
>> see that your protein matches are varying wildly from run to run, and so is
>> your contig count.  Was the subset of contigs you have results for long
>> enough to contain genes?
>> 
>> —Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, February 3, 2014 at 9:31 AM
>> To:  Daniel Ence <dence at genetics.utah.edu>
>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Hi Daniel,
>> 
>> I was able to check on some of those questions.
>> 
>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
>> annotate proteins in this.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like this
>> (most important parts on top):
>> 
>>     6653 gene
>>    46675 exon
>>   280534 protein_match
>>    59934 CDS
>>      969 contig
>>   105388 expressed_sequence_match
>>    12584 five_prime_UTR
>>    78565 match
>>  1401369 match_part
>>    10180 mRNA
>>    11545 three_prime_UTR
>> 
>> 2. From cufflinks assembly: I started with 133380 entries (out of which there
>> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like
>> this:
>>       29 gene
>>       75 exon
>>   573659 protein_match
>>       67 CDS
>>     1099 contig
>>   269298 expressed_sequence_match
>>       23 five_prime_UTR
>>   173844 match
>>  2221846 match_part
>>       29 mRNA
>>       23 three_prime_UTR
>> 
>> The number of genes annotated using the trinity assembly is lower than
>> expected, so I went the cufflinks route. I don't understand why even fewer
>> genes are being found when using the cufflinks transcripts.
>> 
>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
>> used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>> 
>> And again I got results with no entries for gene, exon, CDS, etc.:
>>      957 contig
>>    46555 expressed_sequence_match
>>    43651 match
>>   553633 match_part
>>   113738 protein_match
>> 
>> As I mentioned in another email, cegma results indicated that the genome was
>> more than 90% complete. Any suggestions would be helpful.
>> 
>> Thank you
>> Dhivya
>> 
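Per-type counts like the ones listed in the message above can be reproduced by tallying the third column of the GFF3 output. This is a minimal sketch under stated assumptions: the function name `tally_feature_types` is hypothetical, the demo lines are made up, and it stops at a `##FASTA` line so that sequence is not counted.

```python
from collections import Counter

def tally_feature_types(lines):
    """Count GFF3 feature types (column 3), ignoring comments and sequence."""
    counts = Counter()
    for raw in lines:
        if raw.startswith("##FASTA"):
            break                       # everything after this is sequence
        if not raw.strip() or raw.startswith("#"):
            continue                    # blank line, comment, or directive
        cols = raw.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts

# Made-up feature lines for illustration only.
demo = [
    "##gff-version 3\n",
    "contig1\tmaker\tgene\t1\t90\t.\t+\t.\tID=g1\n",
    "contig1\tmaker\texon\t1\t40\t.\t+\t.\tParent=g1\n",
    "contig1\tmaker\texon\t60\t90\t.\t+\t.\tParent=g1\n",
    "##FASTA\n",
    ">contig1\n",
    "ACGT\n",
]
for feature_type, n in sorted(tally_feature_types(demo).items()):
    print(f"{n:8d} {feature_type}")
```

A run with no gene, exon, or CDS rows in this tally points back at the predictor or its training input rather than at the evidence alignments.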
>> 
>> 
>> 
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>> 
>>> Hi Dhivya, 
>>> 
>>> I think there are a few numbers that could be helpful to understand what's
>>> happening here.
>>> 
>>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>>> the cufflinks data. How many transcripts did MAKER identify with the
>>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>>> found with just the Trinity data?
>>> 
>>> A key part of MAKER's approach to genome annotation that might be affecting
>>> its performance is that it only annotates a gene where there is both
>>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>>> gene either. What ab-initio predictors are you using, and have they been
>>> trained for your specific genome?
>>> 
>>> You can force MAKER to automatically promote evidence alignments to a gene
>>> model by setting the est2genome option to 1, but that will usually give you
>>> many false positives.
>>> 
>>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>>> est2genome set to 1, and let us know how that affects the MAKER results.
>>> 
>>> Thanks,
>>> Daniel
>>> 
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>>> arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> I am trying to annotate a 200 mb plant genome for which I have a very
>>> good assembly.
>>> 
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>> 
>>> So, I decided to try a different approach.  I did a genome assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>> genome assembly and the cufflinks result.  I get far fewer
>>> transcripts as a result.
>>> 
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>> 
>>> Any suggestions would be appreciated.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> 
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
> 


