[maker-devel] Consensus gene model

Carson Holt carsonhh at gmail.com
Mon Oct 15 14:10:00 MDT 2012


One thing you seem to be missing is protein evidence.

Is this a sea urchin (I looked up some of the ESTs)?  If so, I would
recommend adding all proteins from the Strongylocentrotus purpuratus
genome, then throwing in another deuterostome of your choice. Perhaps you
should also add a couple of outgroup organisms, such as Nematostella
vectensis (a cnidarian) and a protostome of your choice.  Be careful not
to add too many protostome outgroups (e.g. C. elegans and Drosophila),
because gene loss is a big part of their evolution (so distant cnidarians
often match deuterostomes better than most protostomes do).
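A minimal sketch of assembling that protein evidence, assuming each proteome has already been downloaded as a local FASTA file (the file names and species tags below are hypothetical): it concatenates the files into one FASTA for maker's protein= option and prefixes every header with a species tag so that identical IDs from different genomes cannot collide.

```python
"""Combine protein FASTA files from several species into one evidence file."""
from pathlib import Path
from typing import Iterable, Tuple


def combine_proteomes(sources: Iterable[Tuple[str, Path]], out_path: Path) -> int:
    """Write every record from each (tag, fasta) pair into out_path.

    A header '>ID rest' becomes '>tag|ID rest'. Blank lines are dropped and
    every line is newline-terminated, so files that lack a trailing newline
    do not run into the next file. Returns the number of records written.
    """
    n_records = 0
    with out_path.open("w") as out:
        for tag, fasta in sources:
            with fasta.open() as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue
                    if line.startswith(">"):
                        out.write(f">{tag}|{line[1:]}\n")
                        n_records += 1
                    else:
                        out.write(line + "\n")
    return n_records


if __name__ == "__main__":
    # Hypothetical local files: an in-group proteome plus an outgroup.
    sources = [
        ("Spur", Path("spur_proteins.fa")),  # S. purpuratus (hypothetical path)
        ("Nvec", Path("nvec_proteins.fa")),  # N. vectensis (hypothetical path)
    ]
    print(combine_proteomes(sources, Path("protein_evidence.fa")))
```

The combined file would then be given to maker as protein=protein_evidence.fa in maker_opts.ctl.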

You could then take the maker results generated with the protein data
included and use them to retrain SNAP.
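One way to script that retraining round is sketched below. It follows the commonly documented SNAP workflow (maker2zff, fathom, forge, hmm-assembler.pl); the GFF3 file name, HMM name and working directory are hypothetical, and it assumes the merged maker GFF3 still contains its ##FASTA section, which maker2zff needs in order to write genome.dna. Check the flags against your installed SNAP/MAKER versions before relying on it.

```python
"""Sketch: one SNAP retraining round driven from a MAKER GFF3 file."""
import subprocess
from pathlib import Path
from typing import List


def snap_retrain_commands(maker_gff: str, hmm_name: str,
                          flank: int = 1000) -> List[List[str]]:
    """Return the command lines for one retraining round, in order."""
    return [
        # MAKER gene models -> SNAP ZFF annotation (genome.ann) + sequence (genome.dna)
        ["maker2zff", maker_gff],
        # Categorize genes, cutting out each with `flank` bp of flanking sequence
        ["fathom", "-categorize", str(flank), "genome.ann", "genome.dna"],
        # Export the 'uni' category (single, problem-free genes) on the plus strand
        ["fathom", "-export", str(flank), "-plus", "uni.ann", "uni.dna"],
        # Estimate SNAP parameter files from the exported training set
        ["forge", "export.ann", "export.dna"],
        # Assemble the parameter files in '.' into an HMM written to stdout
        ["hmm-assembler.pl", hmm_name, "."],
    ]


def run_retraining(maker_gff: Path, hmm_name: str, workdir: Path) -> Path:
    """Run every command inside workdir and return the path of the new HMM."""
    workdir.mkdir(parents=True, exist_ok=True)
    cmds = snap_retrain_commands(str(maker_gff.resolve()), hmm_name)
    for cmd in cmds[:-1]:
        subprocess.run(cmd, cwd=workdir, check=True)
    hmm_path = workdir / f"{hmm_name}.hmm"
    with hmm_path.open("w") as out:
        subprocess.run(cmds[-1], cwd=workdir, check=True, stdout=out)
    return hmm_path
```

For example, run_retraining(Path("round1.all.gff"), "urchin_r2", Path("snap_r2")) would produce snap_r2/urchin_r2.hmm, which could then be set as snaphmm= for the next maker run. The commands are kept in a separate function so they can be inspected (or printed as a dry run) before anything is executed.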

Even a 22 kb contig is still really short.  Is this genome made up
primarily of short contigs like this?  I would recommend running CEGMA
once on this genome to get an appropriate estimate of how recoverable the
genes are going to be (http://korflab.ucdavis.edu/datasets/cegma/).  CEGMA
will give you an estimate of genome completeness, as well as estimates of
what percentage of genes will be found in their entirety and what
percentage will be partial genes.  This is important to do if your genome
is fragmented, as it will give you a realistic idea of how much you can
expect to recover (short contigs don't annotate very well; you tend to
lose a lot).
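Before running CEGMA, a quick length summary of the assembly can answer whether it is mostly short contigs. The sketch below is an assumption-laden helper, not part of maker or CEGMA: it reads an assembly FASTA (the file name is hypothetical) and reports the contig count, N50, the fraction of bases in contigs of at least 10 kb (the "mostly useless below 10 kb" cutoff mentioned earlier in this thread), and the number of contigs of at least 100 kb.

```python
"""Summarize contig lengths of an assembly FASTA."""
from typing import Dict, Iterable, List, Optional


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each FASTA record, in file order."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: List[int], min_useful: int = 10_000,
              long_contig: int = 100_000) -> Dict[str, float]:
    """Basic fragmentation statistics for a list of contig lengths."""
    total = sum(lengths)
    useful = sum(n for n in lengths if n >= min_useful)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50(lengths),
        "frac_bp_ge_min_useful": useful / total if total else 0.0,
        "n_ge_long_contig": sum(1 for n in lengths if n >= long_contig),
    }


if __name__ == "__main__":
    with open("assembly.fa") as fh:  # hypothetical assembly file
        print(summarize(contig_lengths(fh)))
```

If only a small fraction of bases sits in contigs above the cutoff, CEGMA will most likely also report many partial genes, and the larger contigs are the ones worth using for training.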

Thanks,
Carson


On 12-10-15 3:45 PM, "Parul Kudtarkar" <parulk at caltech.edu> wrote:

>Hi Carson,
>
>Thanks. I have attached another contig, which is 22 kb and has EST
>alignments spanning as many as 3 exons. Could you please recommend
>additional training? We are currently running maker on the entire contig
>set and will eventually merge all the per-contig gff3 predictions. Using
>the suggested parameters/methods, we would like to get a consensus gene
>set with minimal false positives/negatives.
>
>Thanks,
>Parul
>
>> The contig in question is really too small to get much out of it (only 5
>> kb).  There was only one single-exon EST alignment and a couple of
>> predictions with no evidence support.  Anything smaller than 10 kb is
>> mostly useless for annotation purposes.  You would really need contigs a
>> few 100 kb in length or longer to glean enough information for
>> optimizing your parameters.
>>
>> The general suggestion for any maker run is to use proteins from a
>> closely related organism (or a couple of closely related organisms) for
>> the protein= option in maker.  Also leave single_exon set to 0, except
>> for certain eukaryotes that have a bias for single-exon transcripts
>> (e.g. some fungi and oomycetes).  And leave keep_preds set to 0, because
>> ab initio predictors tend to over-predict by a wide margin (lots of
>> false positives).
>>
>> Additional training would really depend on what your other contigs look
>> like.  Do you have any large contigs?  I could look at one of those and
>> give suggestions, but the provided contig is just too short to glean
>> much.
>>
>> Thanks,
>> Carson
>>
>> On 12-10-15 1:41 PM, "Parul Kudtarkar" <parulk at caltech.edu> wrote:
>>
>>>Hello,
>>>Could you please advise on the query below?
>>>Thanks,
>>>Parul Kudtarkar
>>>---------------------------- Original Message
>>> ----------------------------
>>>Subject: [maker-devel] Consensus gene model
>>>From:    "Parul Kudtarkar" <parulk at caltech.edu>
>>>Date:    Fri, October 12, 2012 2:46 pm
>>>To:      maker-devel at yandell-lab.org
>>>--------------------------------------------------------------------------
>>>Hi,
>>>We are using SNAP (HMM training file generated from the EST, protein and
>>>contig files), Augustus, and genemarkE (we ran it outside maker and
>>>provide its gff3 file as input). The output that we get is a combination
>>>of various gene predictors and evidence. I have attached a sample result
>>>file. What would you recommend to get a consensus result set?
>>>Bootstrapping the resulting gff3 file (rerunning maker)?
>>>Thanks,
>>>Parul Kudtarkar
>>>--
>>>Scientific Programmer
>>>Center for Computational Regulatory Genomics
>>>Beckman Institute,
>>>California Institute of Technology
>>>http://www.spbase.org
>>>_______________________________________________
>>>maker-devel mailing list
>>>maker-devel at box290.bluehost.com
>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>
>
>
>--
>Scientific Programmer
>Center for Computational Regulatory Genomics
>Beckman Institute,
>California Institute of Technology
>http://www.spbase.org
>
