[maker-devel] maker annotation with cufflinks output

dhivya arasappan darasappan at gmail.com
Thu Feb 6 10:01:44 MST 2014


Oh yes, I did. I took just the non-sequence entries in the GFF file and
used that as my input. I will rerun SNAP with the GFF file containing
the sequences as well.
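
For anyone hitting the same problem, here is a minimal Python sketch of a sanity check to run before maker2zff. It assumes the GFF3 embeds the genome after a `##FASTA` directive, as the GFF3 format allows; the function name `fasta_records_in_gff3` and the example data are hypothetical and not part of MAKER or SNAP.

```python
import io


def fasta_records_in_gff3(handle):
    """Return {sequence_id: length} for FASTA records after the ##FASTA line.

    An empty result means the file carries no embedded sequence.
    """
    lengths = {}
    in_fasta = False
    current = None
    for raw in handle:
        line = raw.strip()
        if not in_fasta:
            # Everything before the ##FASTA directive is annotation, not sequence.
            if line == "##FASTA":
                in_fasta = True
            continue
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Hypothetical two-line annotation followed by an embedded sequence.
example = io.StringIO(
    "##gff-version 3\n"
    "contig1\tmaker\tgene\t1\t9\t.\t+\t.\tID=g1\n"
    "##FASTA\n"
    ">contig1\n"
    "ACGTACGTA\n"
    "CCG\n"
)
print(fasta_records_in_gff3(example))
```

If the result is empty, or every length is zero, the input to maker2zff has no sequence, which would be consistent with an empty genome.dna.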

I'm attaching a snippet of the gff file that I used as input to  
maker2zff.

Thanks for your help
Dhivya




On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence? Did you by any chance strip
> the FASTA sequence from the GFF3 you are using as input to
> maker2zff? There should be FASTA sequence at the end of that file.
> Also, can I see the GFF3 file you are using as input to maker2zff?
>
> Thanks,
> Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 7:47 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> It does appear that my genome.ann file from the maker2zff script has
> data in it. However, the SNAP steps after that have created empty
> files. The following are all empty:
>
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>
> When I tried to get gene stats or validate genome.ann, I get errors  
> like this for all of them:
>
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
> exon-2:out_of_bounds exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
> exon-20:out_of_bounds exon-21:out_of_bounds
>
> I'm not sure why the annotations I'm seeing in genome.ann are all
> showing up as errors. I realize this may be an issue with SNAP, but
> are you familiar with anything like this? My genome.ann file is
> attached for reference.
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>
>> Do you have any features of type snap in your results from step 3?
>> We’ve had a couple of recent posts where, after training, SNAP was
>> giving no results, and as a result MAKER couldn’t give any genes.
>> One cause of something like that may be your step 2. Make sure the
>> ZFF you used to train with wasn’t empty. The maker2zff script uses
>> filters to put only the best genes in the ZFF file, and if all your
>> genes fail the filtering then you are training with an empty ZFF.
>>
>> Also, you should use proteins from a related species as your protein
>> file. I see that your protein matches are varying wildly from run
>> to run, and so is your contig count. Was the subset of contigs you
>> have results for long enough to contain genes?
>>
>> —Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence <dence at genetics.utah.edu>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From trinity assembly: I started with 102000 contigs. I used  
>> trinotate to annotate proteins in this.
>>
>> I ran maker on this data with est2genome set to 1. The output looks
>> like this (most important parts on top):
>>
>>     6653 gene
>>    46675 exon
>>   280534 protein_match
>>    59934 CDS
>>      969 contig
>>   105388 expressed_sequence_match
>>    12584 five_prime_UTR
>>    78565 match
>>  1401369 match_part
>>    10180 mRNA
>>    11545 three_prime_UTR
>>
>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>> which there are 29,000 transcripts).  I used the protein sequences  
>> from trinity assembly.
>>
>> I ran maker on this data with est2genome set to 1. The output looks
>> like this:
>>       29 gene
>>       75 exon
>>   573659 protein_match
>>       67 CDS
>>     1099 contig
>>   269298 expressed_sequence_match
>>       23 five_prime_UTR
>>   173844 match
>>  2221846 match_part
>>       29 mRNA
>>       23 three_prime_UTR
>>
>> The number of genes annotated using the Trinity assembly is lower
>> than expected, so I went the Cufflinks route. I don't understand why
>> even fewer genes are found when using the Cufflinks transcripts.
>>
>> 3. Training SNAP:  I used the results of maker from 1 to train  
>> SNAP.  I then used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS, etc.:
>>      957 contig
>>    46555 expressed_sequence_match
>>    43651 match
>>   553633 match_part
>>   113738 protein_match
>>
>> As I mentioned in another email, cegma results indicated that the  
>> genome was more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there are a few numbers that could be helpful to understand
>>> what's happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>> Also, you had 29,000 transcripts from cufflinks, but fewer from  
>>> MAKER when you gave it the cufflinks data. How many transcripts  
>>> did MAKER identify with the cufflinks data? Did you still get more  
>>> than the 10,000 transcripts that you found with just the Trinity  
>>> data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be
>>> affecting its performance is that it only annotates a gene where
>>> there is both evidence (like your RNA-seq data) and an ab initio
>>> prediction. If a prediction is unsupported by the evidence, then
>>> MAKER won't annotate a gene, and if evidence aligns where there's
>>> no prediction, MAKER won't annotate a gene either. What ab initio
>>> predictors are you using, and have they been trained for your
>>> specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments  
>>> to a gene model by setting the est2genome option to 1, but that  
>>> will usually give you many false positives.
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>> data and with est2genome set to 1, and let us know how that  
>>> affects the MAKER results.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>>> of dhivya arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>> very good assembly.
>>>
>>> I tried to de novo assemble RNA-seq data using Trinity and ran MAKER
>>> using my genome assembly and the Trinity results. I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach. I did a genome-assisted
>>> assembly of the RNA-seq data using TopHat/Cufflinks. This pipeline
>>> generated 21,000 genes and 29,000 transcripts. I then ran MAKER
>>> using my genome assembly and the Cufflinks result. I got far fewer
>>> transcripts as a result.
>>>
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>>
>>> Any suggestions would be appreciated.
>>>
>>> Thanks
>>> Dhivya
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
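
On Carson's earlier point about making sure the ZFF used for training is not empty: a minimal Python sketch that counts gene models per sequence in a ZFF file such as genome.ann. It assumes the short ZFF layout (a `>name` header per sequence, then whitespace-separated lines of label, begin, end, model name); the function name `zff_model_counts` and the example records are hypothetical.

```python
import io


def zff_model_counts(handle):
    """Return {sequence_name: number of distinct gene models} for ZFF text."""
    models = {}
    seq = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq = parts[0] if parts else ""
            # Register the sequence even if no features follow it.
            models.setdefault(seq, set())
            continue
        fields = line.split()
        # Assumed feature layout: label, begin, end, model name.
        if seq is not None and len(fields) >= 4:
            models[seq].add(fields[3])
    return {name: len(names) for name, names in models.items()}


# Hypothetical ZFF with two models on contig1 and none on contig2.
example = io.StringIO(
    ">contig1\n"
    "Einit\t201\t325\tMODEL1\n"
    "Eterm\t401\t500\tMODEL1\n"
    "Esngl\t900\t700\tMODEL2\n"
    ">contig2\n"
)
counts = zff_model_counts(example)
print(counts)
print("total models:", sum(counts.values()))
```

A total of zero would mean every gene failed the maker2zff filters and SNAP was trained on nothing. A non-zero total alongside out_of_bounds errors points at the sequence file instead.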

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.cat.formatted.gff
Type: application/octet-stream
Size: 19905 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a662c5a7/attachment-0002.obj>