[maker-devel] maker annotation with cufflinks output
Carson Holt
carsonhh at gmail.com
Thu Feb 6 10:04:25 MST 2014
Could you give me the file without using 'head' to trim it? It's cutting it
off before it reaches the part I'm interested in.
—Carson
From: dhivya arasappan <darasappan at gmail.com>
Date: Thursday, February 6, 2014 at 10:01 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] maker annotation with cufflinks output
Oh yes, I did. I took just the non-sequence entries in the GFF file and used
that as my input. I will rerun SNAP with the GFF file that also contains the
sequences.
I'm attaching a snippet of the gff file that I used as input to maker2zff.
Thanks for your help
Dhivya
On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
> Your genome.dna file has no sequence? Did you by any chance strip the fasta
> sequence from the GFF3 you are using as input to maker2zff? There should be
> fasta sequence at the end of that file. Also, can I see the GFF3 file you are
> using as input to maker2zff?
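
One way to check for the problem described above is to look for the ##FASTA directive, which the GFF3 format uses to separate feature lines from embedded sequence. The sketch below is only an illustration in Python, not part of MAKER or SNAP; the function name summarize_gff3 and the sample records are made up for the example.

```python
def summarize_gff3(lines):
    """Count feature lines and embedded FASTA content in GFF3 text.

    Returns (n_features, n_fasta_records, n_bases).
    """
    n_features = n_records = n_bases = 0
    in_fasta = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if in_fasta:
            if line.startswith(">"):
                n_records += 1          # FASTA header line
            else:
                n_bases += len(line)    # sequence line
        elif line.startswith("##FASTA"):
            in_fasta = True             # everything after this is sequence
        elif not line.startswith("#"):
            n_features += 1             # ordinary nine-column feature line
    return n_features, n_records, n_bases


# Hypothetical two-feature file, without and with its sequence section.
features = [
    "##gff-version 3",
    "contig1\tmaker\tgene\t1\t9\t.\t+\t.\tID=g1",
    "contig1\tmaker\tmRNA\t1\t9\t.\t+\t.\tID=m1;Parent=g1",
]
with_fasta = features + ["##FASTA", ">contig1", "ATGAAATAA"]

print(summarize_gff3(features))    # (2, 0, 0): features but no sequence
print(summarize_gff3(with_fasta))  # (2, 1, 9)
```

A result with zero FASTA records would suggest the sequence was stripped (or cut off) before the file reached maker2zff. An open file handle can be passed in place of the list.
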
>
> Thanks,
> Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 7:47 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> It does appear that my genome.ann file from the maker2zff script has data in it.
> However, the SNAP steps after that have created empty files. The following
> are all empty:
>
> alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna
> alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann
>
> When I tried to get gene stats or validate genome.ann, I got errors like this
> for all of them:
>
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
> exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
> exon-21:out_of_bounds
>
> I'm not sure why the annotations I'm seeing in genome.ann are all showing up
> as errors. I realize this may be an issue with SNAP, but are you familiar with
> anything like this? My genome.ann file is attached for reference.
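
A rough way to see why an empty genome.dna could produce this: if an exon is flagged whenever its coordinates fall outside the sequence, then a sequence of length 0 flags every exon of every model. The Python sketch below only illustrates that reasoning. It is an assumption about what out_of_bounds means, not fathom's actual code, and the coordinates are invented.

```python
def out_of_bounds(exons, seq_len):
    """Return the 1-based indices of exons that do not fit in the sequence.

    exons is a list of (start, end) pairs in 1-based inclusive coordinates;
    start may exceed end for minus-strand features.
    """
    bad = []
    for i, (start, end) in enumerate(exons, start=1):
        lo, hi = min(start, end), max(start, end)
        if lo < 1 or hi > seq_len:
            bad.append(i)
    return bad


exons = [(201, 325), (410, 520), (600, 750)]  # invented coordinates

print(out_of_bounds(exons, seq_len=1000))  # []        sequence present
print(out_of_bounds(exons, seq_len=0))     # [1, 2, 3] empty sequence
```
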
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>
>> Do you have any features of type snap in your results from step 3? We've had
>> a couple of recent posts where, after training, SNAP was giving no results, and
>> as a result MAKER couldn't give any genes. One cause of something like that
>> may be your step 2. Make sure the ZFF you used to train with wasn't empty.
>> The maker2zff script uses filters to put only the best genes in the ZFF file,
>> and if all your genes fail the filtering, then you are training with an empty
>> ZFF.
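
A quick sanity check on the training file is to count the sequences and gene models it contains. This Python sketch assumes the short ZFF layout (a '>' header per sequence, then one exon per line with the model name in the fourth column); the function name count_zff and the sample lines are hypothetical.

```python
def count_zff(lines):
    """Return (n_sequences, n_models, n_exons) for ZFF-formatted lines."""
    n_seqs = n_exons = 0
    models = set()
    current = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            current = fields[0][1:]           # new sequence header
            n_seqs += 1
        elif len(fields) >= 4:
            models.add((current, fields[3]))  # model name is the 4th column
            n_exons += 1
    return n_seqs, len(models), n_exons


# Invented example content.
sample = [
    ">contig1",
    "Einit 201 325 MODEL1",
    "Eterm 410 520 MODEL1",
    "Esngl 900 700 MODEL2",
    ">contig2",
    "Esngl 50 400 MODEL3",
]
print(count_zff(sample))  # (2, 3, 4)
print(count_zff([]))      # (0, 0, 0): nothing to train on
```

With a real file, an open handle can be passed directly, for example count_zff(open("genome.ann")). A model count of zero would mean every gene failed the filtering.
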
>>
>> Also, you should use proteins from a related species as your protein file. I
>> see that your protein matches are varying wildly from run to run, and so is
>> your contig count. Was the subset of contigs you have results for long enough
>> to contain genes?
>>
>> —Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence <dence at genetics.utah.edu>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
>> annotate proteins in this.
>>
>> I ran maker on this data with est2genome set to 1. The output looks like this
>> (most important parts on top):
>>
>> 6653 gene
>> 46675 exon
>> 280534 protein_match
>> 59934 CDS
>> 969 contig
>> 105388 expressed_sequence_match
>> 12584 five_prime_UTR
>> 78565 match
>> 1401369 match_part
>> 10180 mRNA
>> 11545 three_prime_UTR
>>
>> 2. From cufflinks assembly: I started with 133380 entries (out of which there
>> are 29,000 transcripts). I used the protein sequences from trinity assembly.
>>
>> I ran maker on this data with est2genome set to 1. The output looks like
>> this:
>> 29 gene
>> 75 exon
>> 573659 protein_match
>> 67 CDS
>> 1099 contig
>> 269298 expressed_sequence_match
>> 23 five_prime_UTR
>> 173844 match
>> 2221846 match_part
>> 29 mRNA
>> 23 three_prime_UTR
>>
>> The number of genes annotated using the trinity assembly is lower than
>> expected, so I went the cufflinks route. I don't understand why even fewer
>> genes are found when using the cufflinks transcripts.
>>
>> 3. Training SNAP: I used the results of maker from 1 to train SNAP. I then
>> used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>> 46555 expressed_sequence_match
>> 43651 match
>> 553633 match_part
>> 113738 protein_match
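
The per-type counts in the three summaries above can be reproduced from any GFF3 file, since the feature type is the third tab-separated column. A minimal Python sketch follows, with a made-up function name and toy input; it stops at ##FASTA so embedded sequence lines are not counted.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (feature type) of GFF3 feature lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break                      # sequence section, no more features
        if line.startswith("#") or not line.strip():
            continue                   # comments, directives, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Toy input, not taken from the real run.
toy = [
    "##gff-version 3",
    "contig1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "contig1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1",
    "contig1\tmaker\texon\t1\t400\t.\t+\t.\tParent=m1",
    "contig1\tmaker\texon\t600\t900\t.\t+\t.\tParent=m1",
    "##FASTA",
    ">contig1",
    "ATGC",
]
counts = count_feature_types(toy)
for ftype, n in sorted(counts.items()):
    print(n, ftype)

# A Counter returns 0 for missing keys, so absent gene models are easy to spot.
print("gene features:", counts["gene"])
```
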
>>
>> As I mentioned in another email, cegma results indicated that the genome was
>> more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there are a few numbers that could be helpful to understand what's
>>> happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>>> the cufflinks data. How many transcripts did MAKER identify with the
>>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>>> found with just the Trinity data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be affecting
>>> its performance is that it only annotates a gene where there is both
>>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>>> gene either. What ab-initio predictors are you using, and have they been
>>> trained on your specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments to a gene
>>> model by setting the est2genome option to 1, but that will usually give you
>>> many false positives.
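
The rule described above can be written down as a toy interval check. This is a deliberately simplified Python sketch of the explanation, not MAKER's actual algorithm; the function names, the handling of the est2genome flag, and the coordinates are all illustrative assumptions.

```python
def overlaps(a, b):
    """True if two (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def annotate(predictions, evidence, est2genome=False):
    """Return the intervals that would become gene models under the toy rule."""
    # Keep a prediction only if some evidence alignment overlaps it.
    genes = [p for p in predictions
             if any(overlaps(p, e) for e in evidence)]
    if est2genome:
        # Promote evidence that no kept prediction already covers.
        genes += [e for e in evidence
                  if not any(overlaps(e, g) for g in genes)]
    return sorted(genes)


predictions = [(100, 500), (2000, 2600)]   # invented ab-initio predictions
evidence = [(150, 450), (5000, 5400)]      # invented transcript alignments

print(annotate(predictions, evidence))                   # [(100, 500)]
print(annotate(predictions, evidence, est2genome=True))  # [(100, 500), (5000, 5400)]
print(annotate([], evidence))                            # []
```

The last call mirrors the case where no ab-initio predictor is producing output: under this rule, evidence alone yields no genes unless est2genome is switched on.
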
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>>> est2genome set to 1, and let us know how that affects the MAKER results.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>>> arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> I am trying to annotate a 200 Mb plant genome for which I have a very
>>> good assembly.
>>>
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results. I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach. I did a genome assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts. I then ran maker using my
>>> genome assembly and the cufflinks result. I get far fewer transcripts
>>> as a result.
>>>
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>>
>>> Any suggestions would be appreciated.
>>>
>>> Thanks
>>> Dhivya
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org