[maker-devel] Maker consensus

Carson Holt carsonhh at gmail.com
Fri May 10 11:51:48 MDT 2013


Ok.  You only ran the evidence alignments and didn't provide a gene
predictor.  You need to provide an HMM file for SNAP or a species for
Augustus, or for rough annotations you can set protein2genome=1 and
est2genome=1.  This will try to generate models directly from the alignments.
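
To make the point above concrete, here is a minimal sketch that checks
whether a maker_opts.ctl enables anything that can produce gene models.
The helper names (parse_ctl, model_sources) are made up for illustration,
and the example control-file text is only an approximation of the real
layout ("key=value #comment" lines), so treat it as an assumption and
check it against your own file.

```python
# Hypothetical helpers (not part of MAKER): report which options in a
# maker_opts.ctl would let MAKER emit gene models.

def parse_ctl(text):
    """Parse 'key=value #comment' lines into a dict of stripped values."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def model_sources(opts):
    """Return the option names that are set to produce gene models."""
    sources = []
    # Predictors: any non-empty value counts as "provided".
    for key in ("snaphmm", "augustus_species"):
        if opts.get(key):
            sources.append(key)
    # Direct-from-alignment modes: must be switched on with 1.
    for key in ("est2genome", "protein2genome"):
        if opts.get(key) == "1":
            sources.append(key)
    return sources


# Illustrative control-file text (assumed layout, abbreviated comments).
example_ctl = """\
snaphmm= #SNAP HMM file
augustus_species= #Augustus species model
est2genome=0 #infer models directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer models from protein homology, 1 = yes, 0 = no
"""

print(model_sources(parse_ctl(example_ctl)))  # prints []
```

An empty list corresponds to the situation described here: evidence is
aligned, but nothing is asked to turn it into gene models.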

If you provide a gene predictor, then MAKER can talk to it about the
evidence alignments so it can make a best gene call for the region.  Then
there will be gene/mRNA/exon models in the GFF3 file and entries in the
proteins.fasta and transcripts.fasta.  If you need to train a predictor, you
can train SNAP using the maker2zff script and the SNAP documentation or the
MAKER GMOD tutorial.  If you want to train Augustus, Jason Stajich wrote an
excellent explanation as well as tools in a previous list message.

list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
Script is in this github repo -
https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
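
A quick way to see whether a merged GFF3 file contains any gene models is
to count the values in the feature-type column (column 3).  The sketch
below is only an illustration: count_feature_types is a made-up name, and
the example lines are shortened, invented records rather than real MAKER
output.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count column-3 feature types in an iterable of GFF3 lines."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Invented, shortened records for illustration only.
example = [
    "##gff-version 3\n",
    "scaffold1\tblastx\tprotein_match\t100\t900\t174\t+\t.\tID=hit1\n",
    "scaffold1\tblastx\tmatch_part\t100\t300\t174\t+\t.\tID=hsp1;Parent=hit1\n",
]

counts = count_feature_types(example)
print(counts["gene"], counts["protein_match"])  # prints 0 1
```

For a real file you would pass an open file handle instead of the list.
A count of zero for "gene" means only evidence alignments were written,
which is consistent with empty proteins.fasta and transcripts.fasta files.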

Thanks,
Carson



From:  Diana LeDuc <diana_leduc at eva.mpg.de>
Reply-To:  Diana LeDuc <diana_leduc at eva.mpg.de>
Date:  Friday, 10 May, 2013 1:41 PM
To:  <maker-devel at yandell-lab.org>, Carson Holt <carsonhh at gmail.com>
Cc:  Torsten Schoeneberg <torsten.schoeneberg at medizin.uni-leipzig.de>,
Gabriel Renaud <gabriel_renaud at eva.mpg.de>, Janet Kelso <kelso at eva.mpg.de>
Subject:  Re: [maker-devel] Maker consensus

  
   
 Hi Carson, 
  
   
  
 Thank you for the quick answer.
  
 I ran gff3_merge to merge all the gff files and this resulted in a gff
file, which has these types of fields:
  
 scaffold32239   blastx  protein_match   22905   34500   174     +       .       ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;
 scaffold32239   blastx  match_part      22905   23045   174     +       .       ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47;
  
 In comparison to the dpp_contig test file, I am missing est2genome
evidence, most probably because my EST data set is pretty poor. I do have
blastx and protein2genome evidence though.
  
   
  
 My goal is to extract the genes that could be annotated on the scaffolds.
In the gff files the hits overlap most of the time. I can visualize this
properly in Apollo: for example, one scaffold hits the DSCAML gene in both
zebrafinch and chicken, but extracting the coordinates over which this
scaffold matches the annotated gene is difficult from the gff. Manually
curating the genes is also not an option, since I am trying to do this for a
1.7Gb genome. 
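
 One rough way to get such coordinates straight from the gff is to group
the protein_match features by scaffold, strand and gene symbol, and keep
the outermost start and end of each group. The sketch below is only an
illustration under stated assumptions: the function names are made up, the
gene symbol is assumed to be the last "|"-separated part of the Name
attribute with a trailing "-number" removed, and the second example record
is invented to stand in for a hit from another species.

```python
import re


def gene_symbol(name):
    """Assumed Name layout: GENE_ID|TRANSCRIPT_ID|SYMBOL-NUMBER."""
    last = name.split("|")[-1]
    return re.sub(r"-\d+$", "", last)


def hit_spans(gff3_lines):
    """Map (scaffold, strand, symbol) to the (min start, max end) of its hits."""
    spans = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "protein_match":
            continue  # only whole hits, not the match_part pieces
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        if "Name" not in attrs:
            continue
        key = (cols[0], cols[6], gene_symbol(attrs["Name"]))
        start, end = int(cols[3]), int(cols[4])
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


example = [
    # Record taken from the message above (tab-separated).
    "scaffold32239\tblastx\tprotein_match\t22905\t34500\t174\t+\t.\t"
    "ID=scaffold32239:hit:976144;"
    "Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;\n",
    # Invented record standing in for a hit from a second species.
    "scaffold32239\tblastx\tprotein_match\t23000\t35000\t150\t+\t.\t"
    "ID=hypothetical_hit;Name=GENE_ID|TRANSCRIPT_ID|DSCAML1-1;\n",
]

print(hit_spans(example))
# prints {('scaffold32239', '+', 'DSCAML1'): (22905, 35000)}
```

 Taking the outermost coordinates is a deliberate simplification: two
distant hits to the same symbol on one scaffold (for example paralogs)
would be joined into one span, so the result is a starting list to
inspect, not a set of gene models.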
  
   
  
 I hope this explains better what we are after.
  
   
  
 Thank you once again.
  
   
  
 Best regards, 
  
   
  
 Diana 
On May 10, 2013 at 6:13 PM Carson Holt <carsonhh at gmail.com> wrote:
  
  
>   
>  I'm sorry, I don't understand question 1.  You are missing the resulting
> fasta files, correct?  Did your resulting GFF3 file have any features of type
> "gene"?  Did you run fasta_merge after running gff3_merge?
>   
>    
>   
>  Could you give me more details on what you are trying to do, so I can take a
> stab at question 2 as well.
>   
>    
>   
>  Thanks, 
>   
>  Carson 
>   
>    
>   
>    
>   
>    
>   
>  From:  Diana LeDuc < diana_leduc at eva.mpg.de>
>  Reply-To:  Diana LeDuc < diana_leduc at eva.mpg.de>
>  Date:  Friday, 10 May, 2013 10:44 AM
>  To:  < maker-devel at yandell-lab.org>
>  Cc:  Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso <
> kelso at eva.mpg.de>, Torsten Schoeneberg <
> torsten.schoeneberg at medizin.uni-leipzig.de>
>  Subject:  [maker-devel] Maker consensus
>   
>    
>   
>   
>   
>   
> 
> Dear maker developers,
>   
> 
> I am a PhD student working on de novo assembly and annotation of a bird
> genome. I used Maker as the annotation pipeline, which ran very well, and I
> obtained different annotations with evidence from the Augustus gene predictor,
> a small EST dataset from my organism and protein sequences from chicken, turkey
> and zebrafinch. I could combine the different gff files from different
> scaffolds into one gff file with annotations for the entire genome.
>   
> 
> I now have two questions:
>   
> 
> 1. What could be the reason that I haven't gotten the protein.fasta and
> transcript.fasta files?
>   
> 
> 2. How can I obtain a consensus gene list of different evidences from maker?
> What I would actually need is the scaffold, coordinates and annotation (gene
> name) according to the 3 other bird species.
>  Thank you in advance.
>   
>    
>   
>  Best regards, 
>   
>    
>   
>  Diana Le Duc 
>   
>    
>   
>  --  
>   
> Max Planck Institute for Evolutionary Anthropology
> Department of Evolutionary Genetics
> Deutscher Platz 6
> D-04103 Leipzig  
>   
> Phone +49 (0)341-3550-554
>   www.eva.mpg.de <http://www.eva.mpg.de>
>   
>   
  
  
 

