[maker-devel] Help with updating an annotation

Wed Jun 18 10:42:17 MDT 2014

Thanks Daniel. I think it's more clear to me now.

So if I understand correctly now: I have to specify an ab initio gene model for any locus that I wish to annotate using evidence alignment (i.e. there must be a preexisting model)? These ab initio gene models can be produced internally in Maker by SNAP, trained using my cufflinks output as EST evidence. Alternatively, I can provide other ab initio predictions (for regions not present in my Ensembl reference passed to model_gff) for regions overlapping my cufflinks output via the pred_gff option?

Since I'm interested in unannotated regions, I'm also passing in reference proteomes of closely related species as protein homology evidence.

As such, I should be able to keep only evidence-supported predictions from my pred_gff (for regions not present in my model_gff, and/or better-supported models for regions that are present) and merge them with the Ensembl annotations from the model_gff?
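
To make that selection concrete, here is a rough Python sketch of the filter described above: keep predictions that overlap some transcript evidence but no existing model. It is only an illustration for inspecting inputs, not how Maker decides internally. The Feature record, the function names and all coordinates are made up for the example, and a real script would read the features from the GFF3 files.

```python
from collections import namedtuple

# Minimal interval record (hypothetical); a real script would parse GFF3 lines.
Feature = namedtuple("Feature", "seqid start end strand name")


def overlaps(a, b):
    """True if two features share at least one base on the same sequence and strand."""
    return (a.seqid == b.seqid and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def novel_supported(predictions, evidence, existing_models):
    """Return predictions that overlap some evidence but no existing model."""
    kept = []
    for pred in predictions:
        supported = any(overlaps(pred, ev) for ev in evidence)
        known = any(overlaps(pred, m) for m in existing_models)
        if supported and not known:
            kept.append(pred)
    return kept


# Toy, made-up coordinates (1-based, inclusive, as in GFF3).
models = [Feature("chr1", 1000, 5000, "+", "ensembl_gene_A")]
transcripts = [Feature("chr1", 1200, 4800, "+", "cuff_1"),
               Feature("chr1", 9000, 12000, "+", "cuff_2")]
preds = [Feature("chr1", 1100, 4900, "+", "pred_1"),    # at a known locus
         Feature("chr1", 9100, 11900, "+", "pred_2"),   # novel, with support
         Feature("chr1", 20000, 22000, "+", "pred_3")]  # novel, no support

for p in novel_supported(preds, transcripts, models):
    print(p.name)
```

With the toy data this prints only pred_2. Requiring matching strands is a simplification: unstranded cufflinks transcripts (strand ".") would need looser handling.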

Let me know if I'm still missing something here.

Thanks in advance.

best,
Saad

On 18 Jun 2014, at 17:21, Daniel Ence wrote:

> Hi Saad, 
> 
> Maker doesn't view EST or protein evidence as gene models in themselves. There's a good reason for this: aligners like BLAST don't guarantee complete gene models with accurate start and stop codons and splice sites. With its default settings, Maker won't make a gene model unless there's evidence that overlaps an ab initio prediction (or something from the pred_gff option).
> 
> You can use est2genome to promote everything from the est_gff option to a gene model, but this will probably give you many spurious results. What you're saying with est2genome is, "Everything that this tool found is a complete gene model." I don't think that's true even for cufflinks output. 
> 
> One of the gene predictors that can run internally is snap. It's really easy to train; here's a link to a tutorial for training it: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
> 
> Let me know if that helps, or if you have more questions.
> 
> 
> ~Daniel
> 
> Daniel Ence
> Graduate Student
> dence at genetics.utah.edu
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
> On Jun 18, 2014, at 5:09 AM, Saad Arif <saad.arif at tuebingen.mpg.de>
>  wrote:
> 
>> Thank you for the response. I still have one question though, with these options:
>> 
>> est_gff=cufflinksout.GFF
>> 
>> model_gff=ensembl reference.GFF
>> 
>> What happens to cufflinks-assembled transcripts that are not confined to current gene loci (i.e. novel genes in the cufflinks output)? Would I have to prepare ab initio gene predictions for each of these predicted 'new' genes?
>> Is there a simple way to combine adding new genes with improving an existing annotation?
>> 
>> Any feedback on this would be greatly appreciated.
>> 
>> saad
>> 
>> On 13 Jun 2014, at 17:59, Carson Holt wrote:
>> 
>>> Use the cufflinks features instead of the tophat ones (tophat tends to be
>>> really noisy).  Give the existing models to model_gff (they will then
>>> always be kept unless something better is found).  There is no option to
>>> keep models and then just add isoforms.  The model_gff input will either
>>> be kept as is (unchanged), or replaced with an updated model suggested by
>>> the evidence (the updated model may contain multiple isoforms though), and
>>> map_forward=1 can be used to pull names forward from the old model onto
>>> the new models.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> 
>>> On 6/13/14, 5:03 AM, "Saad Arif" <saad.arif at tuebingen.mpg.de> wrote:
>>> 
>>>> Dear All,
>>>> 
>>>> I would like to use the Maker pipeline to expand a current annotation (new
>>>> isoforms and novel genes with respect to the current annotation) and was
>>>> wondering if anyone had experience with this and/or suggestions regarding my
>>>> questions.
>>>> 
>>>> Briefly:
>>>> 
>>>> I have tophat splice junctions from RNAseq data or, alternatively,
>>>> cufflinks-generated transcript models (FASTA format) that I want to use
>>>> as my new data (est_gff or est).
>>>> 
>>>> I want to provide the current Ensembl annotation for gene prediction but
>>>> I want this annotation to remain unchanged. Hence, I’m not sure if I
>>>> should provide this annotation as pred_gff or model_gff. Can the
>>>> model_gff be used for gene prediction or is this just a subset of
>>>> pred_gff that remains unaltered? Can we provide the same annotation for
>>>> both options (pred_gff and model_gff)?
>>>> 
>>>> 
>>>> 
>>>> Importantly, my main goal is to use the new RNAseq data to add more
>>>> isoforms and (any) novel genes to the existing Ensembl annotation. Any
>>>> thoughts or suggestions on how to go about this would be sincerely
>>>> appreciated.
>>>> 
>>>> 
>>>> Thanks in advance,
>>>> saad
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>> 
>>> 
>> 
>> 
> 
