[maker-devel] Advice on my pipeline
Patrick Tran Van
patrick.tranvan at unil.ch
Sat Jul 1 05:21:37 MDT 2017
So I have assembled my transcriptome with Trinity using the jaccard_clip option, and I have run MAKER with and without correct_est_fusion.
I then used SNAP to train/filter it with:
maker2zff specie.all.gff
Here are my results:
Number of genes after MAKER -> number of genes after maker2zff
- Without correct_est_fusion: 21621 -> 13875
- With correct_est_fusion: 16850 -> 9098
1) If I understand correctly how correct_est_fusion works, it prevents gene merging, so shouldn't it be the other way around?
Normally I should find more genes with correct_est_fusion, right?
2) I think I should find something like 13000-14000 genes for my species. Should I go with the "without correct_est_fusion" run for the 2nd iteration of MAKER?
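For reference, the counts reported above can be turned into the fraction of MAKER genes that survive maker2zff in each run. This is only a small arithmetic sketch; the labels and the function name are made up for illustration, and the numbers are the ones quoted in this message.

```python
# Gene counts quoted in this message: (after MAKER, after maker2zff).
# The dictionary labels are illustrative, not MAKER terminology.
counts = {
    "fusion correction off": (21621, 13875),
    "fusion correction on": (16850, 9098),
}


def retention(before, after):
    """Fraction of MAKER gene models kept by maker2zff."""
    return after / before


for label, (before, after) in counts.items():
    kept = retention(before, after)
    print(f"{label}: {after}/{before} = {kept:.1%} kept")
```

By this measure, the run without the option keeps about 64% of its models and the run with the option about 54%.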
Thanks for your help
Patrick Tran Van
Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206
________________________________
From: Carson Holt <carsonhh at gmail.com>
Sent: Monday, June 26, 2017 11:38 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline
Sorry, the option is correct_est_fusion.
It is in the maker_opts.ctl file.
I would use both SNAP and Augustus on a few large contigs, then review the results manually. If one of them is not behaving well, then drop it. If both behave well (i.e. correlate well with evidence alignments), then keep them both.
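As a rough first look before the manual review, one could tally how many features each source contributed on a given contig. The sketch below only counts (source, type) pairs from columns 2 and 3 of a GFF3 file; it is no substitute for looking at the models next to the evidence in a genome browser. The function name, contig name and source labels in the sample are assumptions for illustration, so check them against your own GFF3.

```python
from collections import Counter


def count_features(gff3_lines, seqid=None):
    """Count (source, type) pairs in GFF3 lines, optionally for one contig."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        if seqid is not None and cols[0] != seqid:
            continue
        counts[(cols[1], cols[2])] += 1
    return counts


# Made-up sample lines; the source names are assumptions, not real output.
sample = [
    "##gff-version 3",
    "contig1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "contig1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=s1",
    "contig1\taugustus_masked\tmatch\t120\t880\t.\t+\t.\tID=a1",
    "contig2\tsnap_masked\tmatch\t5\t300\t.\t-\t.\tID=s2",
]

for (source, ftype), n in sorted(count_features(sample, "contig1").items()):
    print(source, ftype, n)
```

With a real file, pass an open file handle instead of the sample list, for example `count_features(open("contig1.gff"), "contig1")`.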
—Carson
On Jun 26, 2017, at 3:48 AM, Patrick Tran Van <Patrick.TranVan at unil.ch> wrote:
Thanks for your answer.
1) Do you think that adding Augustus training in addition to SNAP at steps 3 and 5 will add more confidence (instead of adding Augustus only for the final round)?
I ask because I am using autoAug for this and it takes a while to compute.
2) I don't see this option: 'avoid_est_fusion=1'. I tried to add it but I got this error:
WARNING: Invalid option 'avoid_est_fusion' in control file maker_opts.ctl
(I am using v 2.31.8 )
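A warning like the one above suggests the key is not among the option names MAKER expects in maker_opts.ctl. A minimal sketch for checking which names a control file defines, and for changing one of them, is given below. It assumes the usual `key=value #comment` layout of the control files. The helper names and the sample text are made up for illustration and are not part of MAKER.

```python
def parse_ctl(text):
    """Return {option: value} from control-file text (key=value #comment)."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def set_option(text, key, value):
    """Return control-file text with `key` set; KeyError if `key` is absent."""
    out, found = [], False
    for line in text.splitlines():
        body, sep, comment = line.partition("#")
        name, eq, _ = body.partition("=")
        if eq and name.strip() == key:
            # Rebuild the line, keeping any trailing comment.
            line = f"{key}={value} {sep}{comment}".rstrip()
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"{key!r} is not an option in this control file")
    return "\n".join(out)


# Illustrative excerpt only; comments in a real maker_opts.ctl may differ.
SAMPLE = """#-----Gene Prediction
est2genome=0 #infer gene predictions directly from ESTs
correct_est_fusion=0 #limits use of ESTs to avoid fusion genes
"""

print("avoid_est_fusion" in parse_ctl(SAMPLE))
print("correct_est_fusion" in parse_ctl(SAMPLE))
print(set_option(SAMPLE, "correct_est_fusion", 1))
```

Refusing to add an unknown key, rather than appending it, is deliberate here: it catches a misspelled option name before MAKER is launched.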
Patrick Tran Van
________________________________
From: Carson Holt <carsonhh at gmail.com>
Sent: Monday, June 5, 2017 8:29 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline
Your plan sounds good. A couple of related notes.
Insect genomes tend to have high gene density, so gene merging will be the primary difficulty. You can avoid merging of mRNA-seq evidence by using options like jaccard_clip in Trinity. Then use avoid_est_fusion=1 inside of MAKER.
Also it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input. MAKER will automatically recycle previous results archived in the run directory when you do this. Using the maker_gff option is really more for getting data into the run from jobs performed a long time ago (so they can’t be run in the same directory).
—Carson
On Jun 2, 2017, at 3:56 AM, Patrick Tran Van <Patrick.TranVan at unil.ch> wrote:
Hello,
This is my first time running MAKER for an insect genome annotation.
I have found various resources and tried to build a consensus from them. I would like your thoughts and advice on my pipeline: whether I can improve something or am doing anything useless.
What I have:
- RNA evidence: transcriptome
- Protein evidence: Swiss-Prot/UniProt + BUSCO insect protein set
- CEGMA and BUSCO results for my genome
1) Train SNAP with CEGMA.
2) Run MAKER (run A) with repeat masking, using the transcript and protein evidence, the new SNAP file (from step 1) and the Augustus file (from BUSCO).
3) Create a SNAP model from run A.
4) Run MAKER (run B) with the new SNAP model (from step 3), with est2genome=0 and protein2genome=0, providing the GFF file (maker_gff=run_A.gff), turning off repeat masking (rm_pass=1), and using the previous mapping results (altest_pass=1 and protein_pass=1).
5) Create a SNAP model from run B.
6) Run MAKER (run C) with the new SNAP model (from step 5), with est2genome=0 and protein2genome=0, providing the GFF file (maker_gff=run_B.gff), turning off repeat masking (rm_pass=1), and using the previous mapping results (altest_pass=1 and protein_pass=1).
7) Create a SNAP model from run C AND create an Augustus gene model from run C.
8) Run MAKER (run D) with the new SNAP model (from step 7) + the Augustus file (from step 7), with est2genome=0 and protein2genome=0, providing the GFF file (maker_gff=run_C.gff), turning off repeat masking (rm_pass=1), and using the previous mapping results (altest_pass=1 and protein_pass=1), plus keep_preds=1.
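To make the differences between runs B, C and D easier to see, the option changes listed in the steps above can be written down as data. This is only a sketch of the plan as described, not a recommended configuration. The option values come from the steps; snaphmm and augustus_species are the MAKER options for the SNAP and Augustus models, and the model file and species names (run_A.hmm, run_C_species, ...) are hypothetical placeholders.

```python
# Settings shared by the re-runs in steps 4, 6 and 8 (values from the plan).
COMMON_RERUN = {
    "est2genome": 0,
    "protein2genome": 0,
    "rm_pass": 1,
    "altest_pass": 1,
    "protein_pass": 1,
}

# Per-run overrides. The .hmm names and the Augustus species name are
# hypothetical placeholders, not files produced by any real run.
RUNS = {
    "B": {**COMMON_RERUN, "snaphmm": "run_A.hmm", "maker_gff": "run_A.gff"},
    "C": {**COMMON_RERUN, "snaphmm": "run_B.hmm", "maker_gff": "run_B.gff"},
    "D": {
        **COMMON_RERUN,
        "snaphmm": "run_C.hmm",
        "augustus_species": "run_C_species",
        "maker_gff": "run_C.gff",
        "keep_preds": 1,
    },
}


def ctl_lines(overrides):
    """Format overrides as key=value lines in control-file style."""
    return [f"{key}={value}" for key, value in overrides.items()]


for run, overrides in RUNS.items():
    print(f"# run {run}")
    print("\n".join(ctl_lines(overrides)))
```

Written this way, the only things that change from one re-run to the next are the model files and the input GFF, plus keep_preds=1 and the Augustus model in the last run.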
Does it seem coherent?
Cheers,
Patrick Tran Van