[maker-devel] Using GeneMark-ET with RNAseq intron hints

Ray Cui rcui at age.mpg.de
Thu Mar 16 10:02:08 MDT 2017


Dear Carson,

         thank you for the explanation! Now I see why sometimes it seems
that EVM doesn't produce any model for a particular cluster.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for
Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496
rcui at age.mpg.de
www.age.mpg.de



On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Final results with source maker will be of type gene/mRNA/exon/CDS. They
> have been further processed beyond the raw results, and may include
> extensions such as the addition of UTR for example (or hint based
> recomputation in the case of SNAP and Augustus). The gene ID of the maker
> model will let you know the source before additional processing was
> applied.  Raw results will also be in the file as type match/match_part and
> source evm/snap/augustus, but are only there for reference purposes (there
> will also be a raw fasta from each source, but only for reference
> purposes). All models compete against each other, and the one best matching
> the evidence is kept. So if SNAP or Augustus scores better than EVM, then
> that model will be kept for that locus. You can find more detail in the
> MAKER wiki and the MAKER2 paper for how models compete.
>
> So the final result is not a superset, rather a merged subset from each
> potential source.
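A minimal Python sketch of the distinction Carson describes. It keeps only final models (column 2 `maker`, type `gene`), skips the reference-only match/match_part features, and tallies the predictor named at the start of each gene ID. The ID pattern (the predictor is the leading alphanumeric run, e.g. `snap` in `snap_masked-...`) and the example IDs are assumptions for illustration, not a documented MAKER format.

```python
import re
from collections import Counter

# Hypothetical: the predictor is the leading alphanumeric run of the gene ID,
# e.g. "snap" in "snap_masked-scaffold1-processed-gene-0.1".
ID_PREFIX = re.compile(r"(?:^|;)ID=([A-Za-z0-9]+)")

def final_model_sources(gff3_lines):
    """Count final MAKER gene models per predictor prefix in the ID."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. FASTA sequence at file end)
        source, ftype, attrs = cols[1], cols[2], cols[8]
        # Final models: source "maker", type gene. Raw match/match_part
        # lines from evm/snap/augustus are reference-only and skipped.
        if source != "maker" or ftype != "gene":
            continue
        m = ID_PREFIX.search(attrs)
        if m:
            counts[m.group(1)] += 1
    return counts

example = [
    "##gff-version 3",
    "scaffold1\tmaker\tgene\t100\t900\t.\t+\t.\tID=snap_masked-scaffold1-processed-gene-0.1",
    "scaffold1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=snap_masked-scaffold1-processed-gene-0.1-mRNA-1",
    "scaffold1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=scaffold1:hit:1",
    "scaffold1\tmaker\tgene\t2000\t3000\t.\t-\t.\tID=augustus_masked-scaffold1-processed-gene-0.2",
]
print(final_model_sources(example))
```

The tally shows which predictor "won" at each locus, which is the merged subset described above rather than a superset.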
>
> EVM is not used to obtain a consensus gene model. Its results compete just
> like all other algorithms. This is because when EVM works it produces
> beautiful models that score really well, but when it doesn’t work it
> produces either no model or partial models.
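A toy illustration of that competition, assuming each candidate at a locus already carries some evidence-support score (higher is better here, a hypothetical convention). A predictor that produced no model simply drops out, so an EVM failure falls back to whichever other model fits the evidence best. This does not reproduce MAKER's real scoring, which is described in the MAKER2 paper.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    source: str                       # e.g. "evm", "snap", "augustus"
    support: Optional[float] = None   # hypothetical evidence-support score; None = no model

def pick_model(candidates: List[Candidate]) -> Optional[Candidate]:
    """Keep the candidate best supported by evidence at one locus (toy rule)."""
    produced = [c for c in candidates if c.support is not None]
    if not produced:
        return None  # nothing to keep at this locus
    return max(produced, key=lambda c: c.support)

# EVM failed here (no model), so the best-supported other model is kept.
locus = [Candidate("evm"), Candidate("snap", 0.6), Candidate("augustus", 0.8)]
print(pick_model(locus).source)
```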
>
> —Carson
>
>
> On Mar 16, 2017, at 3:07 AM, Ray Cui <rcui at age.mpg.de> wrote:
>
> Dear Carson,
>
>         thank you so much! I am now peeking into the results for the
> finished scaffolds. In the gff file, the gene id confuses me a bit. In this
> file, column 2 is always "maker", but the "ID" attribute in the annotation
> is prefixed with "snap", "maker", "evm" , "augustus" etc. Does that mean
> the final annotation is a superset of all gene predictors? If EVM was used
> to obtain a consensus gene model, why would the other models still show up
> in the final result set?
>
> Best Regards,
> Ray
>
>
>
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>> Maybe. I haven’t tested this, but it should work. Maker supports labels
>> for input by placing a ‘:’ and a label after each file name.
>>
>> Example—>
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead
>> of est2genome in column 2, you will get est2genome:label_1 in column 2.
>>
>> As a result, you should be able to add that label to the EVM settings
>> like so and it will match column 2 of the GFF3—>
>> evmtrans:est2genome:label_1=10
>>
>> I don’t know if the label will force any raw analysis to rerun, but
>> it shouldn’t.
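Applied to the partitioned protein sets, a minimal sketch that writes both the labeled input line and the matching weight lines from one table, so the label strings cannot drift apart. File names, labels and weights are hypothetical, and the `evmprot:protein2genome` key is assumed by analogy with the `evmtrans:est2genome` example; like the labeling itself, this is untested.

```python
def labeled_settings(option, files_and_labels, evm_key, weights):
    """Build a labeled MAKER input line and matching EVM weight lines.

    option:           maker_opts.ctl option, e.g. "est" or "protein"
    files_and_labels: list of (fasta_path, label) pairs
    evm_key:          EVM key plus aligner, e.g. "evmtrans:est2genome"
    weights:          dict mapping each label to its weight
    """
    # Labels are attached with ':' after each file name, comma-separated.
    input_line = option + "=" + ",".join(f"{path}:{label}" for path, label in files_and_labels)
    # The label becomes part of GFF3 column 2 (e.g. est2genome:label_1),
    # so the EVM key must use the identical label string.
    weight_lines = [f"{evm_key}:{label}={weights[label]}" for _, label in files_and_labels]
    return input_line, weight_lines

line, evm_lines = labeled_settings(
    "protein",
    [("close_taxa.fasta", "close"), ("distant_taxa.fasta", "distant")],
    "evmprot:protein2genome",   # assumed key for protein2genome weights
    {"close": 10, "distant": 3},
)
print(line)
print("\n".join(evm_lines))
```

The first printed line goes into maker_opts.ctl and the rest into maker_evm.ctl.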
>>
>>
>> —Carson
>>
>>
>>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>
>> Hi Carson,
>>
>>        currently I am partitioning the protein evidence into several
>> datasets based on phylogenetic relationship, supplied as a comma-delimited
>> list. Is it then possible to give protein2genome models from more closely
>> related species a higher weight than those from more distantly related taxa?
>>
>> Ray
>>
>>
>>
>>
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>
>>> Dear Carson,
>>>
>>>        thank you for the pointers! Before running the first round of
>>> Maker, I mapped conspecific Trinity-assembled proteins (long, "full-length"
>>> subset) to an earlier version of the genome assembly using my own pipeline
>>> and trained Augustus and SNAP that way. I also trained Genemark-ET using
>>> TopHat alignments per their instructions. I'm wondering if it will be worth
>>> doing a second round, but I guess I will see.
>>>
>>>        It is good to know that MAKER will reuse the old results.
>>>
>>> Best Regards,
>>> Ray
>>>
>>>
>>>
>>>
>>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>
>>>> You can find lots of info in the devel archives on training. Example —>
>>>> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>>>
>>>> Also example of training SNAP on the wiki —>
>>>> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>>>
>>>> MAKER will reuse old raw results if you rerun in the same directory
>>>> (only deleting what would be different given altered settings between
>>>> runs). It will see the existing alignments archived in the datastore as raw
>>>> reports and just reuse them. The exception to this are the exonerate
>>>> alignments. They are generated relatively quickly compared to the BLAST
>>>> runs, so rerunning them is not too much overhead. Also they are not
>>>> archived because doing so created IO issues (exonerate is not running in
>>>> bulk batches like BLAST, rather as multiple small separate runs for each
>>>> polished read, and archiving a lot of small raw reports can occur so fast
>>>> when using MPI that it crashes storage servers). So we decided to just not
>>>> archive exonerate rather than develop a database-like bundling/compression
>>>> mechanism to get around the IO issues.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>
>>>> On Mar 14, 2017, at 10:44 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>
>>>> Hi Carson,
>>>>           Thanks for your prompt response!
>>>>
>>>>           I have a somewhat unrelated question. After the first run of
>>>> Maker, I want to train Augustus, SNAP and Genemark-ET using the most
>>>> reliable gene models produced in the first round. What would be a good way
>>>> to select these gene models?
>>>>           After retraining the ab initio predictors, I also wonder if
>>>> it's necessary to redo all the alignments (blastx, est2genome,
>>>> protein2genome etc) in the second iteration, since they are exactly the
>>>> same as the first run. Perhaps maker can take in the alignment results from
>>>> the previous run?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>
>>>>> I see. If my evm config looks like this:
>>>>> evmab=5 #default weight for source unspecified ab initio predictions
>>>>> evmab:snap=5 #weight for snap sourced predictions
>>>>> evmab:augustus=10 #weight for augustus sourced predictions
>>>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>>>
>>>>> and Column 2 in the genemark.gff is "GeneMark.hmm" , then the value
>>>>> from "evmab" (=5) will be used, is that correct?
>>>>>
>>>>> Best Regards,
>>>>> Ray
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt <carsonhh at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Column 2 in the GFF3 file is the source column. It is used to specify
>>>>>> the source of the data. That column will also be used by EVM to bin
>>>>>> features by their source and apply weights based on source.
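A minimal sketch of that binning step: feature rows are grouped by the exact string in column 2, which is what a source-keyed weight then has to match. This only illustrates the idea; it is not EVM's code, and the example rows are hypothetical.

```python
from collections import defaultdict

def bin_by_source(gff3_lines):
    """Group GFF3 feature rows by column 2 (the source column)."""
    bins = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            bins[cols[1]].append(cols)
    return dict(bins)

rows = [
    "scaffold1\tGeneMark.hmm\tgene\t1\t900\t.\t+\t.\tID=g1",
    "scaffold1\tGeneMark.hmm\tCDS\t1\t900\t.\t+\t0\tParent=g1.t1",
    "scaffold1\tsnap\tmatch\t50\t880\t.\t+\t.\tID=s1",
]
for source, features in bin_by_source(rows).items():
    print(source, len(features))
```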
>>>>>>
>>>>>> —Carson
>>>>>>
>>>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>
>>>>>> Thanks! I didn't know you can also name the gff, but I think using
>>>>>> the default is fine, that's what I'm doing now.
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Ray
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt <carsonhh at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> These are set in the maker_evm.ctl file.
>>>>>>>
>>>>>>> Use whatever you used in the source column of the input GFF3. For
>>>>>>> example if column 2 is set as GENEMARK, then do this —>
>>>>>>> evmab:GENEMARK=7
>>>>>>>
>>>>>>> This also works —>
>>>>>>> evmab:pred_gff:GENEMARK=7
>>>>>>>
>>>>>>> Or just set the default —>
>>>>>>> evmab=7
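A minimal sketch of how such a lookup could resolve, using Ray's settings plus the `evmab:GENEMARK` line as illustrative input. The precedence order (`evmab:pred_gff:SOURCE`, then `evmab:SOURCE`, then the `evmab` default) and exact, case-sensitive matching are assumptions for illustration, not confirmed MAKER behavior.

```python
def parse_ctl(text):
    """Parse key=value lines from a MAKER-style .ctl file, ignoring '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value:
            settings[key] = float(value)
    return settings

def resolve_ab_weight(source, settings, from_pred_gff=False):
    """Weight for an ab initio GFF3 source (column 2), falling back to evmab.

    Assumed precedence: evmab:pred_gff:SOURCE (pred_gff input only),
    then evmab:SOURCE, then the plain evmab default. Matching here is exact
    and case-sensitive, so "GeneMark.hmm" does not match "genemark".
    """
    keys = [f"evmab:{source}", "evmab"]
    if from_pred_gff:
        keys.insert(0, f"evmab:pred_gff:{source}")
    for key in keys:
        if key in settings:
            return settings[key]
    return None

ctl = """
evmab=5 #default weight for source unspecified ab initio predictions
evmab:snap=5 #weight for snap sourced predictions
evmab:augustus=10 #weight for augustus sourced predictions
evmab:genemark=5 #weight for genemark sourced predictions
evmab:GENEMARK=7
"""
w = parse_ctl(ctl)
print(resolve_ab_weight("GeneMark.hmm", w, from_pred_gff=True))  # no exact key: evmab default
print(resolve_ab_weight("GENEMARK", w, from_pred_gff=True))      # evmab:GENEMARK
```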
>>>>>>>
>>>>>>> —Carson
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>
>>>>>>> Dear Carson,
>>>>>>>
>>>>>>>        I think it may be most straightforward to input the GFF3
>>>>>>> instead.
>>>>>>>
>>>>>>>        What is the correct way of setting a weight for the EVM step
>>>>>>> for these GFF3 models passed through the pred_gff option?
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt <carsonhh at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It may work as is as long as you don’t need any of the additional
>>>>>>>> options that have been added. If not, you can also just run it outside of
>>>>>>>> MAKER then provide the result in GFF3 format to pred_gff.
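If the externally generated predictions are already GFF3, a small preprocessing sketch can pin column 2 to a fixed name before the file is passed to pred_gff, so that a weight key such as `evmab:GENEMARK` matches it. The source name `GENEMARK` is an arbitrary, hypothetical choice, and this does not convert other formats to GFF3.

```python
def set_source(gff3_lines, new_source):
    """Return GFF3 lines with column 2 (source) set to new_source.

    Comments, directives and non-feature lines (e.g. FASTA sequence)
    are passed through unchanged. Trailing newlines are stripped.
    """
    out = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)
            continue
        cols[1] = new_source  # the value EVM weights will be keyed on
        out.append("\t".join(cols))
    return out

gff = [
    "##gff-version 3",
    "scaffold1\tGeneMark.hmm\tgene\t1\t900\t.\t+\t.\tID=g1",
]
print("\n".join(set_source(gff, "GENEMARK")))
```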
>>>>>>>>
>>>>>>>> —Carson
>>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>
>>>>>>>> I see. Are there any recent plans to incorporate it into Maker?
>>>>>>>>
>>>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>>>>
>>>>>>>> Ray
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt <carsonhh at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes. This is a recent update. It’s an attempt to merge GeneMark-ET
>>>>>>>>> and GeneMark-EP into GeneMark-ES scripts.
>>>>>>>>>
>>>>>>>>> —Carson
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>>
>>>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>>>
>>>>>>>>> I think there must have been a big update between different
>>>>>>>>> Genemark versions. It seems that they now also support evidence being fed
>>>>>>>>> into the prediction stage.
>>>>>>>>>
>>>>>>>>> The name of the latest version of the genemark script has been
>>>>>>>>> changed to "gmes_petap.pl", with the following command-line
>>>>>>>>> options:
>>>>>>>>>
>>>>>>>>> Usage:  /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl  [options]
>>>>>>>>>  --sequence [filename]
>>>>>>>>>
>>>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>>>>    includes transcript (GeneMark-ET) and protein (GeneMark-EP)
>>>>>>>>> based training and prediction
>>>>>>>>>
>>>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>>>
>>>>>>>>> Algorithm options
>>>>>>>>>   --ES           to run self-training
>>>>>>>>>   --fungus       to run algorithm with branch point model (most
>>>>>>>>> useful for fungal genomes)
>>>>>>>>>   --ET           [filename]; to run training with introns
>>>>>>>>> coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>>>   --et_score     [number]; 4 (default) minimum score of intron in
>>>>>>>>> initiation of the ET algorithm
>>>>>>>>>   --evidence     [filename]; to use in prediction external
>>>>>>>>> evidence (RNA or protein) mapped to genome
>>>>>>>>>   --training_only     to run only training step
>>>>>>>>>   --prediction_only   to run only prediction step
>>>>>>>>>   --predict_with [filename]; predict genes using this file species
>>>>>>>>> specific parameters (bypass regular training and prediction steps)
>>>>>>>>>
>>>>>>>>> Sequence pre-processing options
>>>>>>>>>   --max_contig   [number]; 5000000 (default) will split input
>>>>>>>>> genomic sequence into contigs shorter then max_contig
>>>>>>>>>   --min_contig   [number]; 50000 (default); will ignore contigs
>>>>>>>>> shorter then min_contig in training
>>>>>>>>>   --max_gap      [number]; 5000 (default); will split sequence at
>>>>>>>>> gaps longer than max_gap
>>>>>>>>>                  Letters 'n' and 'N' are interpreted as standing
>>>>>>>>> within gaps
>>>>>>>>>   --max_mask     [number]; 5000 (default); will split sequence at
>>>>>>>>> repeats longer then max_mask
>>>>>>>>>                  Letters 'x' and 'X' are interpreted as results of
>>>>>>>>> hard masking of repeats
>>>>>>>>>   --soft_mask    [number] to indicate that lowercase letters stand
>>>>>>>>> for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>>>
>>>>>>>>> Run options
>>>>>>>>>   --cores        [number]; 1 (default) to run program with
>>>>>>>>> multiple threads
>>>>>>>>>   --pbs          to run on cluster with PBS support
>>>>>>>>>   --v            verbose
>>>>>>>>>
>>>>>>>>> Customizing parameters:
>>>>>>>>>   --max_intron          [number]; default 10000 (3000 fungi),
>>>>>>>>> maximum length of intron
>>>>>>>>>   --max_intergenic      [number]; default 10000, maximum length of
>>>>>>>>> intergenic regions
>>>>>>>>>   --min_gene_prediction [number]; default 300 (120 fungi) minimum
>>>>>>>>> allowed gene length in prediction step
>>>>>>>>>
>>>>>>>>> Developer options:
>>>>>>>>>   --usr_cfg      [filename]; to customize configuration file
>>>>>>>>>   --ini_mod      [filename]; use this file with parameters for
>>>>>>>>> algorithm initiation
>>>>>>>>>   --test_set     [filename]; to evaluate prediction accuracy on
>>>>>>>>> the given test set
>>>>>>>>>   --key_bin
>>>>>>>>>   --debug
>>>>>>>>> # -------------------
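A minimal sketch that assembles a gmes_petap.pl command from flags listed in the usage text above (`--sequence`, `--ET`, `--ES`, `--predict_with`, `--cores`). The file names are hypothetical, whether flags such as `--soft_mask` or `--max_intron` are also needed depends on the genome, and this is not the command MAKER itself constructs.

```python
import shlex

def gmes_command(script, genome, introns_gff=None, model=None, cores=1):
    """Build an argv list for gmes_petap.pl from flags in its usage text.

    model given       -> --predict_with MODEL (reuse trained parameters)
    introns_gff given -> --ET INTRONS (GeneMark-ET training with RNA-Seq introns)
    neither           -> --ES (self-training)
    """
    argv = [script, "--sequence", genome, "--cores", str(cores)]
    if model is not None:
        argv += ["--predict_with", model]
    elif introns_gff is not None:
        argv += ["--ET", introns_gff]
    else:
        argv.append("--ES")
    return argv

if __name__ == "__main__":
    # Hypothetical file names; print the command rather than running it.
    print(shlex.join(gmes_command("gmes_petap.pl", "genome.fasta",
                                  introns_gff="introns.gff", cores=8)))
```

Returning an argv list (rather than a shell string) lets it be passed directly to `subprocess.run` without quoting issues.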
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt <carsonhh at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Also note that the gmhmme3 executable distributed with different
>>>>>>>>>> flavors of genemark has had the same name but has been quite different in
>>>>>>>>>> both command line structure and output between flavors.
>>>>>>>>>>
>>>>>>>>>> —Carson
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters
>>>>>>>>>> automatically set by Maker when calling Genemark?
>>>>>>>>>> If you can point me to the part of the maker source code that
>>>>>>>>>> constructs the final genemark command line, I can also take a look.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt <carsonhh at gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file.
>>>>>>>>>>> It depends on whether the formatting or any flags have changed between versions.
>>>>>>>>>>>
>>>>>>>>>>> —Carson
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear Carson,
>>>>>>>>>>>
>>>>>>>>>>>         I have now run GeneMark-ET, and it produces a trained
>>>>>>>>>>> .mod file. I think it can then be passed to Maker. Do you know what the
>>>>>>>>>>> final command line constructed by Maker to call genemark is? Genemark-ET
>>>>>>>>>>> and -ES use the same perl script, so one probably only needs to use the
>>>>>>>>>>> --prediction_only and --predict_with xxx.mod options to predict genes using
>>>>>>>>>>> the species-specific parameters (bypassing the regular training and
>>>>>>>>>>> prediction steps).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt <carsonhh at gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may
>>>>>>>>>>>> not work with GeneMark-ET, so any MAKER-related archive posts etc. will
>>>>>>>>>>>> relate to the former.
>>>>>>>>>>>>
>>>>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let
>>>>>>>>>>>> it run. It would then produce several files and output directories. The
>>>>>>>>>>>> es.mod file was the one you provided to MAKER. I don’t know how this
>>>>>>>>>>>> compares to GeneMark-ET.
>>>>>>>>>>>>
>>>>>>>>>>>> —Carson
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>>         thanks! It seems that Genemark-ET has a "--training"
>>>>>>>>>>>> flag, is that the flag I should use when training or should I just let
>>>>>>>>>>>> Genemark also perform the prediction?
>>>>>>>>>>>>
>>>>>>>>>>>> Ray
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel <d.ence at ufl.edu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ray,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you’re on the right track with training Genemark with
>>>>>>>>>>>>> RNAseq data. It should only change the training steps, which are external
>>>>>>>>>>>>> to MAKER, but not how MAKER runs Genemark. You’ll still give MAKER the path
>>>>>>>>>>>>> to the “es.mod” file made by Genemark.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the 2nd question, in MAKER beta 3, MAKER creates a
>>>>>>>>>>>>> control file for EVM, in which you set your weights for the various inputs,
>>>>>>>>>>>>> and then MAKER runs EVM alongside all the other gene predictors and chooses
>>>>>>>>>>>>> the model that is best supported by the evidence.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Daniel
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>>          I have successfully installed Maker beta 3, working
>>>>>>>>>>>>> with both Augustus and SNAP. I also want to try adding GeneMark-ES to the
>>>>>>>>>>>>> ab initio predictors.
>>>>>>>>>>>>>          When I read the GeneMark-ES manual, it says that one
>>>>>>>>>>>>> can use RNAseq data to aid training. I'm wondering what would be the best
>>>>>>>>>>>>> way to integrate Genemark-ET predictions into Maker. Should I run
>>>>>>>>>>>>> Genemark-ET independent of Maker, then integrate the GFF at some point
>>>>>>>>>>>>> during the maker process? If so, how should I edit the configuration file?
>>>>>>>>>>>>> Currently maker has an option called "gmhmm". Should I then train GeneMark
>>>>>>>>>>>>> by myself with RNAseq data, then feed the hmm to maker?
>>>>>>>>>>>>>
>>>>>>>>>>>>>           And perhaps an unrelated question is that now Maker
>>>>>>>>>>>>> beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step,
>>>>>>>>>>>>> what does it do), and how does it differ from what Maker is designed for
>>>>>>>>>>>>> (both reconcile different gene models).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org