[maker-devel] Help with updating an annotation
Saad Arif
saad.arif at tuebingen.mpg.de
Wed Jun 18 10:42:17 MDT 2014
Thanks Daniel. I think it's more clear to me now.
So, if I understand correctly: I have to specify an ab initio gene model for any locus that I wish to annotate using evidence alignment (i.e. there must be a preexisting model)? These ab initio gene models can be produced internally in Maker by SNAP, trained using my cufflinks output as EST evidence. Alternatively, I can provide other ab initio predictions (for regions not present in my Ensembl reference passed to model_gff) for regions overlapping my cufflinks output via the pred_gff option?
Since I'm interested in unannotated regions, I'm also passing in reference proteomes of closely related species as protein homology evidence.
As such, I should be able to keep only evidence-supported predictions from my pred_gff (for regions not present in my model_gff, and/or better-supported models for regions that are present) and merge them with the Ensembl annotations from the model_gff?
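As a rough sketch, the setup described above would correspond to maker_opts.ctl entries along these lines. All file names are hypothetical placeholders, and the SNAP model is assumed to have been trained beforehand:

```text
# maker_opts.ctl (excerpt); every file name below is a hypothetical placeholder
genome=genome.fasta                 # assembly to annotate
est_gff=cufflinks_transcripts.gff   # cufflinks transcripts as EST evidence
protein=related_proteomes.fasta     # proteomes of closely related species
model_gff=ensembl_reference.gff     # existing Ensembl annotation
pred_gff=external_predictions.gff   # optional: ab initio predictions made outside Maker
snaphmm=my_species.hmm              # optional: trained SNAP model, run inside Maker
```

Either pred_gff or snaphmm (or both) would supply the ab initio predictions that the evidence has to overlap.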
Let me know if I'm still missing something here.
Thanks in advance.
best,
Saad
On 18 Jun 2014, at 17:21, Daniel Ence wrote:
> Hi Saad,
>
> Maker doesn't view EST or protein evidence as gene models in themselves. There's a good reason for this: aligners like blast don't guarantee complete gene models with accurate start and stop codons and splice sites. With its default settings, maker won't make a gene model unless there's evidence that overlaps an ab initio prediction (or something from the pred_gff option).
>
> You can use est2genome to promote everything from the est_gff option to a gene model, but this will probably give you many spurious results. What you're saying with est2genome is, "Everything that this tool found is a complete gene model." I don't think that's true even for cufflinks output.
>
> One of the gene predictors that can run internally is snap. It's really easy to train; here's a link to a tutorial for training it: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
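>
> As a sketch, these two choices correspond to the following maker_opts.ctl lines (the .hmm file name is a placeholder for whatever your training run produces):
>
> ```text
> est2genome=0             # default: EST alignments are not promoted to gene models
> snaphmm=my_species.hmm   # hypothetical file name: trained snap parameters
> ```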
>
> Let me know if that helps, or if you have more questions.
>
>
> ~Daniel
>
> Daniel Ence
> Graduate Student
> dence at genetics.utah.edu
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> On Jun 18, 2014, at 5:09 AM, Saad Arif <saad.arif at tuebingen.mpg.de>
> wrote:
>
>> Thank you for the response. I still have one question though, with these options:
>>
>> est_gff=cufflinksout.GFF
>>
>> model_gff=ensembl reference.GFF
>>
>> What happens to cufflinks-assembled transcripts that are not confined to current gene loci (i.e. novel genes in the cufflinks output)? Would I have to prepare ab initio gene predictions for each of these predicted 'new' genes?
>> Is there a simple way to combine adding new genes with improving an existing annotation?
>>
>> Any feedback on this would be greatly appreciated.
>>
>> saad
>>
>> On 13 Jun 2014, at 17:59, Carson Holt wrote:
>>
>>> Use the cufflinks instead of the tophat features (tophat tends to be
>>> really noisy). Give the existing models to model_gff (they will then
>>> always be kept unless something better is found). There is no option to
>>> keep models and then just add isoforms. The model_gff input will either
>>> be kept as is (unchanged), or replaced with an updated model suggested by
>>> the evidence (the updated model may contain multiple isoforms though), and
>>> map_forward=1 can be used to pull names forward from the old model onto
>>> the new models.
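>>>
>>> A sketch of the corresponding maker_opts.ctl lines (file names are
>>> placeholders):
>>>
>>> ```text
>>> est_gff=cufflinks_transcripts.gff   # cufflinks features, not tophat junctions
>>> model_gff=ensembl_reference.gff     # existing models: kept or replaced
>>> map_forward=1                       # carry old names onto the new models
>>> ```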
>>>
>>> Thanks,
>>> Carson
>>>
>>>
>>> On 6/13/14, 5:03 AM, "Saad Arif" <saad.arif at tuebingen.mpg.de> wrote:
>>>
>>>> Dear All,
>>>>
>>>> I would like to use the Maker pipeline to expand a current annotation
>>>> (new isoforms and novel genes with respect to the current annotation)
>>>> and was wondering if anyone had experience with this and/or
>>>> suggestions regarding my questions.
>>>>
>>>> Briefly:
>>>>
>>>> I have tophat splice junctions from RNAseq data or, alternatively,
>>>> cufflinks-generated transcript models (fasta format) that I want to use
>>>> as my new data (est_gff or est).
>>>>
>>>> I want to provide the current Ensembl annotation for gene prediction but
>>>> I want this annotation to remain unchanged. Hence, I'm not sure if I
>>>> should provide this annotation as pred_gff
>>>> or model_gff. Can the model_gff be used for gene prediction or is this
>>>> just a subset of pred_gff that remains unaltered? Can we provide the same
>>>> annotation for both options (pred_gff and model_gff)?
>>>>
>>>>
>>>>
>>>> Importantly, my main goal is to use the new RNAseq data to add more
>>>> isoforms and (any) novel genes to the existing Ensembl annotation. Any
>>>> thoughts or suggestions on how to go about this would be sincerely
>>>> appreciated.
>>>>
>>>>
>>>> Thanks in advance,
>>>> saad
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>>
>>
>>
>