[maker-devel] Advice on my pipeline

Sat Jul 1 05:21:37 MDT 2017

I have assembled my transcriptome with Trinity using the jaccard_clip option, and I have run MAKER both with and without correct_est_fusion.

I then used maker2zff to filter the gene models for SNAP training:

maker2zff specie.all.gff

Here are my results:

Number of genes after MAKER -> number of genes kept by maker2zff

- Without correct_est_fusion: 21621 -> 13875

- With correct_est_fusion: 16850 -> 9098
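To compare the two runs on an equal footing, it can help to express the maker2zff counts as a share of the MAKER output. A minimal Python sketch using only the numbers above (the `retention` helper is a hypothetical name, not part of MAKER or SNAP):

```python
# Gene counts reported above: genes after MAKER -> genes kept by maker2zff.
RUNS = {
    "without correct_est_fusion": (21621, 13875),
    "with correct_est_fusion": (16850, 9098),
}


def retention(before: int, after: int) -> float:
    """Percentage of MAKER gene models that maker2zff kept for SNAP training."""
    return 100.0 * after / before


if __name__ == "__main__":
    for name, (before, after) in RUNS.items():
        print(f"{name}: {after}/{before} kept = {retention(before, after):.1f}%")
```

This prints 64.2% for the run without correct_est_fusion and 54.0% for the run with it, so the second run not only produced fewer models but also had a smaller share of them pass the maker2zff filters.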

1) If I understand correctly how correct_est_fusion works, it prevents gene merging, so shouldn't the result be the other way around?

Normally I should find more genes with correct_est_fusion, right?

2) I expect something like 13,000-14,000 genes for my species. Should I go with the "without correct_est_fusion" run for the second iteration of MAKER?

Thanks for your help.

Patrick Tran Van

Groups Chapuisat, Robinson-Rechavi & Schwander
Department of Ecology and Evolution
University of Lausanne
Le Biophore
CH-1015 Lausanne
Switzerland
Office 3206

________________________________
From: Carson Holt <carsonhh at gmail.com>
Sent: Monday, June 26, 2017 11:38 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Sorry, the option is correct_est_fusion.

It is in the maker_opts.ctl file.

I would run both SNAP and Augustus on a few large contigs, then review the results manually. If one of them is not behaving well, drop it. If both behave well (i.e. they correlate well with the evidence alignments), keep them both.
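One way to put a number on "correlates well with evidence alignments" is the Annotation Edit Distance (AED), the agreement score MAKER uses: AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the prediction and SP the fraction of predicted bases supported by evidence. The Python sketch below is a simplified illustration, not MAKER's implementation: it assumes exons are given as half-open (start, end) intervals on the same contig and strand, and the function names are hypothetical.

```python
def covered_bases(intervals):
    """Expand half-open (start, end) intervals into the set of covered positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def aed(prediction, evidence):
    """Simplified AED: 0.0 = perfect agreement, 1.0 = no overlap at all."""
    pred = covered_bases(prediction)
    evid = covered_bases(evidence)
    if not pred or not evid:
        return 1.0
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)  # evidence bases explained by the prediction
    specificity = overlap / len(pred)  # predicted bases supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Two predicted exons; the evidence supports only half of the second one.
    print(aed([(100, 200), (300, 400)], [(100, 200), (300, 350)]))  # 0.125
```

Comparing the distribution of such scores for SNAP and Augustus models on the same contigs is one way to back up the manual review with a number.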

—Carson

On Jun 26, 2017, at 3:48 AM, Patrick Tran Van <Patrick.TranVan at unil.ch> wrote:

Thanks for your answer.

1) Do you think that training Augustus in addition to SNAP at steps 3 and 5 would add more confidence (instead of adding Augustus only for the final round)?
I ask because I am using autoAug for this, and it takes a while to compute.

2) I don't see the option 'avoid_est_fusion=1'. I tried adding it, but I got this error:

WARNING: Invalid option 'avoid_est_fusion' in control file maker_opts.ctl

(I am using version 2.31.8.)

Patrick Tran Van

________________________________
From: Carson Holt <carsonhh at gmail.com>
Sent: Monday, June 5, 2017 8:29 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Advice on my pipeline

Your plan sounds good. A couple of related notes.

Insect genomes tend to have high gene density, so gene merging will be the primary difficulty. You can avoid merging of mRNA-seq evidence by using options like jaccard_clip in Trinity. Then use avoid_est_fusion=1 inside of MAKER.

Also, it is more convenient to do each run in the same directory rather than supplying the previous run as GFF3 input. MAKER will automatically recycle previous results archived in the run directory when you do this. The maker_gff option is really meant for bringing in data from jobs performed a long time ago (which therefore can't be rerun in the same directory).

—Carson

On Jun 2, 2017, at 3:56 AM, Patrick Tran Van <Patrick.TranVan at unil.ch> wrote:

Hello,

This is my first time running MAKER to annotate an insect genome.

I have gathered various resources and tried to build a consensus pipeline from them. I would appreciate your thoughts on it, in particular whether something could be improved or whether any step is unnecessary.

What I have:
- RNA evidence: transcriptome
- Protein evidence: SwissProt/UniProt + the BUSCO insect protein set
- CEGMA and BUSCO results for my genome

1) Train SNAP with CEGMA

2) Run MAKER (run A) with repeat masking, using the transcript and protein evidence, the new SNAP model (from step 1) and the Augustus model (from BUSCO).

3) Create a SNAP model from run A.

4) Run MAKER (run B) with the new SNAP model (from step 3), with est2genome=0 and protein2genome=0, supplying the run A GFF (maker_gff=run_A.gff), reusing its repeat masking instead of re-running it (rm_pass=1), and reusing the previous evidence alignments (altest_pass=1 and protein_pass=1).

5) Create a SNAP model from run B.

6) Run MAKER (run C) with the new SNAP model (from step 5), with est2genome=0 and protein2genome=0, supplying the run B GFF (maker_gff=run_B.gff), reusing its repeat masking (rm_pass=1), and reusing the previous evidence alignments (altest_pass=1 and protein_pass=1).

7) Create a SNAP model and an Augustus gene model from run C.

8) Run MAKER (run D) with the new SNAP and Augustus models (from step 7), with est2genome=0 and protein2genome=0, supplying the run C GFF (maker_gff=run_C.gff), reusing its repeat masking (rm_pass=1) and the previous evidence alignments (altest_pass=1 and protein_pass=1), and setting keep_preds=1.
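Steps 4, 6 and 8 differ only in a handful of maker_opts.ctl values, so switching between rounds can be scripted rather than edited by hand. A minimal Python sketch, assuming the control file uses one `key=value #comment` line per option; the `set_ctl_options` helper, the `run_A.hmm` filename and the comments in the excerpt are hypothetical placeholders, and `snaphmm` is assumed to be the control-file key for the SNAP model:

```python
import re

# Matches "key=value #comment" lines; the value may be empty and the
# trailing comment is optional. Pure comment lines (starting with "#") never match.
OPTION_LINE = re.compile(r"^(?P<key>\w+)=(?P<value>[^#]*?)(?P<comment>\s*#.*)?$")


def set_ctl_options(ctl_text: str, options: dict) -> str:
    """Return ctl_text with the given keys set, keeping each line's comment.

    Raises KeyError for keys not already present in the file, so a misspelled
    option fails here instead of only triggering a warning when MAKER starts.
    """
    found = set()
    lines = []
    for line in ctl_text.splitlines():
        match = OPTION_LINE.match(line)
        if match and match.group("key") in options:
            key = match.group("key")
            found.add(key)
            line = f"{key}={options[key]}{match.group('comment') or ''}"
        lines.append(line)
    missing = set(options) - found
    if missing:
        raise KeyError(f"not in control file: {sorted(missing)}")
    return "\n".join(lines) + "\n"


# Hypothetical, abbreviated excerpt; a real maker_opts.ctl has many more lines.
EXCERPT = """\
#-----Illustrative excerpt, not the full file
maker_gff= #previous MAKER GFF3
snaphmm= #SNAP model
est2genome=1 #example comment
protein2genome=1 #example comment
rm_pass=0 #example comment
altest_pass=0 #example comment
protein_pass=0 #example comment
"""

# Settings for run B (step 4); values are strings, as in the control file.
RUN_B = {
    "snaphmm": "run_A.hmm",
    "est2genome": "0",
    "protein2genome": "0",
    "maker_gff": "run_A.gff",
    "rm_pass": "1",
    "altest_pass": "1",
    "protein_pass": "1",
}

if __name__ == "__main__":
    print(set_ctl_options(EXCERPT, RUN_B), end="")
```

Runs C and D would only change the GFF and model paths (plus keep_preds=1 and the Augustus setting for run D). Refusing unknown keys is deliberate: an option that does not exist in the installed MAKER version is caught before the run starts.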

Does this seem coherent?

Cheers,

Patrick Tran Van

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
