[maker-devel] Using GeneMark-ET with RNAseq intron hints

Carson Holt carsonhh at gmail.com
Thu Mar 16 09:19:02 MDT 2017


Final results with source maker (column 2) are of type gene/mRNA/exon/CDS. They have been processed beyond the raw results and may include extensions such as added UTRs (or hint-based recomputation in the case of SNAP and Augustus). The gene ID of the maker model tells you the source before additional processing was applied. Raw results are also in the file as type match/match_part with source evm/snap/augustus, but they are there for reference only (there is also a raw FASTA from each source, again for reference only). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, that model is kept for that locus. You can find more detail on how models compete in the MAKER wiki and the MAKER2 paper.

So the final result is not a superset; rather, it is a merged subset drawn from each potential source.
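
To check this on your own output, a minimal Python sketch along these lines can tally the final gene models by the predictor named at the start of each ID. It assumes that final gene features have source maker in column 2 and that the ID attribute starts with the predictor name (snap, augustus, evm, maker, and so on). The prefix list and the helper name count_models_by_origin are illustrative and not part of MAKER, so adjust them to what your file actually contains.

```python
from collections import Counter

# Assumed ID prefixes; extend to match the predictors used in your own run.
KNOWN_PREFIXES = ("augustus", "genemark", "snap", "evm", "maker")


def count_models_by_origin(gff_lines, prefixes=KNOWN_PREFIXES):
    """Count final gene features (source 'maker') by the prefix of their ID."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        # Only final models: column 2 is 'maker' and column 3 is 'gene'.
        if len(cols) != 9 or cols[1] != "maker" or cols[2] != "gene":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID", "").lower()
        # First matching prefix wins; anything unrecognized is 'other'.
        origin = next((p for p in prefixes if gene_id.startswith(p)), "other")
        counts[origin] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle on the merged GFF3 can be passed in directly. Raw match/match_part features are skipped because their source column is not maker.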

EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn’t work it produces either no model or partial models.
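
As a toy illustration of that competition (not MAKER's actual code), the sketch below treats EVM as one candidate among several at a locus and keeps whichever candidate has the best evidence-agreement score. The scores, the "higher is better" convention, and the function name pick_best_model are assumptions made for the example. A source that produced no model is represented as None and simply drops out.

```python
from typing import Dict, Optional, Tuple


def pick_best_model(
    candidates: Dict[str, Optional[float]],
) -> Optional[Tuple[str, float]]:
    """Return (source, score) of the best-scoring candidate, or None if none exist.

    `candidates` maps a source name to a hypothetical evidence-agreement
    score (higher is better); None means that source produced no model.
    """
    scored = [(src, s) for src, s in candidates.items() if s is not None]
    if not scored:
        return None
    return max(scored, key=lambda item: item[1])


# Locus where EVM worked well: its model competes and wins.
print(pick_best_model({"evm": 0.95, "snap": 0.80, "augustus": 0.88}))
# Locus where EVM produced nothing: the best remaining model is kept.
print(pick_best_model({"evm": None, "snap": 0.80, "augustus": 0.88}))
```

Because EVM is only one entry in the candidate set, a locus where it fails still gets a model from another predictor.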

—Carson


> On Mar 16, 2017, at 3:07 AM, Ray Cui <rcui at age.mpg.de> wrote:
> 
> Dear Carson,
> 
>         Thank you so much! I am now looking into the results for the finished scaffolds. In the GFF file, the gene ID confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus", etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
> 
> Best Regards,
> Ray
> 
> Dr. Rongfeng (Ray) Cui
> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
> Wissenschaftlicher MA / Postdoctoral researcher
> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
> Tel.: +49 (0)221 496
> Mobile: +49 0221 37970 496
> rcui at age.mpg.de
> www.age.mpg.de
> 
> 
> 
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Maybe. I haven’t tested this, but it should work. Maker supports labels for input by placing a ‘:’ and a label after each file name.
> 
> Example—>
> est=file1.fasta:label_1,file2.fasta:label_2
> 
> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
> 
> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3—>
> evmtrans:est2genome:label_1=10
> 
> I don’t know if the label will force any raw analysis to rerun, but it shouldn’t.
> 
> 
> —Carson
> 
> 
> 
>> On Mar 15, 2017, at 5:13 AM, Ray Cui <rcui at age.mpg.de> wrote:
>> 
>> Hi Carson,
>> 
>>        Currently I am partitioning the protein evidence by phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it then possible to specify a higher weight for protein2genome models from more closely related species than for those from more distantly related taxa?
>> 
>> Ray 
>> 
>> 
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui <rcui at age.mpg.de> wrote:
>> Dear Carson,
>> 
>>        thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.
>> 
>>        It is good to know that MAKER will reuse the old results. 
>> 
>> Best Regards,
>> Ray
>> 
>> 
>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt <carsonhh at gmail.com> wrote:
>> You can find lots of info on training in the devel archives. Example: https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>> 
>> There is also an example of training SNAP on the wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>> 
>> MAKER will reuse old raw results if you rerun in the same directory (deleting only what would differ given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not much overhead. They are also not archived because doing so created IO issues: exonerate does not run in bulk batches like BLAST, but as many small separate runs, one for each polished read, and when using MPI, archiving a lot of small raw reports can happen so fast that it crashes storage servers. So we decided simply not to archive exonerate results rather than develop a database-like bundling/compression mechanism to get around the IO issues.
>> 
>> Thanks,
>> Carson
>> 
>> 
>>> On Mar 14, 2017, at 10:44 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>> 
>>> Hi Carson,
>>>           Thanks for your prompt response!
>>> 
>>>           I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models? 
>>>           After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run? 
>>> 
>>> Best Regards,
>>> Ray
>>> 
>>> 
>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui <rcui at age.mpg.de> wrote:
>>> I see. If my evm config looks like this:
>>> evmab=5 #default weight for source unspecified ab initio predictions
>>> evmab:snap=5 #weight for snap sourced predictions
>>> evmab:augustus=10 #weight for augustus sourced predictions
>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>> evmab:genemark=5 #weight for genemark sourced predictions
>>> 
>>> and column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>> 
>>> Best Regards,
>>> Ray
>>> 
>>> 
>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. EVM will also use that column to bin features by their source and apply weights based on source.
>>> 
>>> —Carson
>>> 
>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>> 
>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.
>>>> 
>>>> 
>>>> Best Regards,
>>>> Ray
>>>> 
>>>> 
>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>> 
>>>> These are set in the maker_evm.ctl file.
>>>> 
>>>> Use whatever you used in the source column of the input GFF3. For example if column 2 is set as GENEMARK, then do this —>
>>>> evmab:GENEMARK=7
>>>> 
>>>> This also works —>
>>>> evmab:pred_gff:GENEMARK=7
>>>> 
>>>> Or just set the default —>
>>>> evmab=7
>>>> 
>>>> —Carson
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>> 
>>>>> Dear Carson,
>>>>> 
>>>>>        I think it may be most straightforward to input the GFF3 instead.
>>>>> 
>>>>>        What is the correct way to set a weight in the EVM step for the GFF3 models passed in through the pred_gff option?
>>>>> 
>>>>> Ray
>>>>> 
>>>>> 
>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>> It may work as is as long as you don’t need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff.
>>>>> 
>>>>> —Carson
>>>>> 
>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>> 
>>>>>> I see. Are there any plans to incorporate it into Maker soon?
>>>>>> 
>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>> 
>>>>>> Ray
>>>>>> 
>>>>>> 
>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>>> Yes. This is a recent update. It’s an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.
>>>>>> 
>>>>>> —Carson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>> 
>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>> 
>>>>>>> I think there must have been a big update between Genemark versions. It seems that it now also supports evidence being fed into the prediction stage.
>>>>>>> 
>>>>>>> The latest version of the genemark script has been renamed to "gmes_petap.pl", with the following command-line options:
>>>>>>> 
>>>>>>> Usage:  /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl  [options]  --sequence [filename]
>>>>>>> 
>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>>    includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>> 
>>>>>>> Input sequence/s should be in FASTA format
>>>>>>> 
>>>>>>> Algorithm options
>>>>>>>   --ES           to run self-training
>>>>>>>   --fungus       to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>   --ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>   --et_score     [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>   --evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>   --training_only     to run only training step
>>>>>>>   --prediction_only   to run only prediction step
>>>>>>>   --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>> 
>>>>>>> Sequence pre-processing options
>>>>>>>   --max_contig   [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>   --min_contig   [number]; 50000 (default); will ignore contigs shorter then min_contig in training 
>>>>>>>   --max_gap      [number]; 5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>                  Letters 'n' and 'N' are interpreted as standing within gaps 
>>>>>>>   --max_mask     [number]; 5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>                  Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>   --soft_mask    [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>> 
>>>>>>> Run options
>>>>>>>   --cores        [number]; 1 (default) to run program with multiple threads 
>>>>>>>   --pbs          to run on cluster with PBS support
>>>>>>>   --v            verbose
>>>>>>> 
>>>>>>> Customizing parameters:
>>>>>>>   --max_intron          [number]; default 10000 (3000 fungi), maximum length of intron
>>>>>>>   --max_intergenic      [number]; default 10000, maximum length of intergenic regions
>>>>>>>   --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>> 
>>>>>>> Developer options:
>>>>>>>   --usr_cfg      [filename]; to customize configuration file
>>>>>>>   --ini_mod      [filename]; use this file with parameters for algorithm initiation
>>>>>>>   --test_set     [filename]; to evaluate prediction accuracy on the given test set
>>>>>>>   --key_bin
>>>>>>>   --debug
>>>>>>> # -------------------
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>> 
>>>>>>> —Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>> 
>>>>>>>> Thanks. 
>>>>>>>> 
>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
>>>>>>>> If you can point me to the part of the maker source code that constructs the final genemark command line, I can also take a look.
>>>>>>>> 
>>>>>>>> Best Regards,
>>>>>>>> Ray
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>>>>> The names of the scripts used are listed in the maker_exe.ctl file. Whether it works depends on whether the formatting or any flags have changed between versions.
>>>>>>>> 
>>>>>>>> —Carson
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>> 
>>>>>>>>> Dear Carson,
>>>>>>>>> 
>>>>>>>>>         I have now run GeneMark-ET, and it produces a trained .mod file. I think it can be then passed to Maker. Do you know what is the final constructed command line in Maker that calls genemark? Genemark-et and es use the same perl script so one probably only needs to use the  --prediction  and --predict_with xxx.mod options to predict genes using the species specific parameters (bypassing regular training and prediction steps)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Ray
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER-related archive posts etc. will relate to GeneMark-ES.
>>>>>>>>> 
>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don’t know how this compares to GeneMark-ET.
>>>>>>>>> 
>>>>>>>>> —Carson
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Daniel,
>>>>>>>>>> 
>>>>>>>>>>         thanks! It seems that Genemark-ET has a "--training" flag, is that the flag I should use when training or should I just let Genemark also perform the prediction? 
>>>>>>>>>> 
>>>>>>>>>> Ray
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel <d.ence at ufl.edu> wrote:
>>>>>>>>>> Hi Ray, 
>>>>>>>>>> 
>>>>>>>>>> I think you’re on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, not how MAKER runs Genemark. You’ll still give MAKER the path to the "es.mod" file made by Genemark.
>>>>>>>>>> 
>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence. 
>>>>>>>>>> 
>>>>>>>>>> ~Daniel
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui <rcui at age.mpg.de> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hello,
>>>>>>>>>>> 
>>>>>>>>>>>          I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
>>>>>>>>>>>          When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independently of Maker, then integrate the GFF at some point during the maker process? If so, how should I edit the configuration file? Currently maker has an option called "gmhmm". Should I then train GeneMark myself with RNAseq data, then feed the hmm to maker?
>>>>>>>>>>>           
>>>>>>>>>>>           A perhaps unrelated question: Maker beta 3 now supports EVM. I'm wondering how EVM is used by Maker (at which step, and what it does), and how it differs from what Maker itself is designed for (both reconcile different gene models).
>>>>>>>>>>> 
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ray
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
