[maker-devel] MAKER output has different genes with same name

Carson Holt carsonhh at gmail.com
Mon Jul 18 09:29:54 MDT 2016


Normally a second run should be done in the same directory rather than by passing in the previous GFF3. GFF3 passthrough is meant as a roundabout way of getting previous results into a new run (for example, a previous version of an annotation set where you need to keep the old annotations for some reason and don’t have access to the original data files). With passthrough you actually lose certain information, for example details that were available in the BLAST reports but cannot be recovered from the GFF3.

Both model_pass and pred_pass should probably be set to 0 if you are letting things rerun by providing snaphmm.

Also check your input GFF3 for duplicates, as those will iteratively feed into the next run.
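
One way to do that duplicate check is sketched below in Python. This is only an illustration, not part of MAKER: the function name find_duplicate_ids is hypothetical, and the sample lines are shortened versions of the gene records quoted further down. It assumes that a "duplicate" means a gene or mRNA ID that appears at more than one distinct location. CDS lines are deliberately ignored, since GFF3 allows one CDS ID to span several lines.

```python
from collections import defaultdict


def find_duplicate_ids(gff_lines, feature_types=("gene", "mRNA")):
    """Map each ID seen at more than one distinct location to its locations.

    Only the feature types in `feature_types` are checked (an assumption:
    multi-line features such as CDS may legitimately repeat an ID).
    """
    locations = defaultdict(set)
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation records
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        feature_id = attrs.get("ID")
        if feature_id is not None:
            locations[feature_id].add(
                (cols[0], int(cols[3]), int(cols[4]), cols[6])
            )
    return {fid: sorted(locs) for fid, locs in locations.items() if len(locs) > 1}


# Shortened gene lines taken from the GFF3 snippet in the quoted message.
SAMPLE_LINES = [
    "Sacu_v1_s0077\tmaker\tgene\t136647\t138568\t.\t-\t.\t"
    "ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;score=70.704",
    "Sacu_v1_s0077\tmaker\tgene\t98236\t98541\t.\t-\t.\t"
    "ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;score=18.18,18.18,18.18",
    "Sacu_v1_s0004\tmaker\tgene\t4775142\t4775554\t.\t+\t.\t"
    "ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3;score=14.976",
]

duplicates = find_duplicate_ids(SAMPLE_LINES)
for feature_id, locs in duplicates.items():
    print(feature_id)
    for seqid, start, end, strand in locs:
        print(f"  {seqid}:{start}-{end} ({strand})")
```

To check a real file, pass an open file handle instead of the sample list, for example find_duplicate_ids(open("previous_run.gff")). Any ID it reports is worth resolving before the file is passed to maker_gff again.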

—Carson
 



> On Jul 18, 2016, at 9:20 AM, Matt Simenc <mcsimenc at gmail.com> wrote:
> 
> Update:
> 
> So I isolated a single scaffold to run MAKER on and test different parameters. With map_forward=1 the duplicates disappeared. 
> 
> However, this does not entirely take care of the issue with the entire assembly. There are still some duplicates. I tried using the -a command line option and it reduced the number of duplicate IDs for different features by 2, but I don't know what to do next. It's important that I know whether maker is keeping the features in order, or whether it's possible maker is mixing up exons and CDSs between different gene and mRNA features.
> 
> Thanks!
> 
> On Sun, Jul 17, 2016 at 4:39 PM, Matt Simenc <mcsimenc at gmail.com> wrote:
> Hi, I figured out the problem. I needed to use map_forward=1. With that set, no duplicates.
> 
> Matt
> 
> On Sat, Jul 16, 2016 at 10:40 PM, Matt Simenc <mcsimenc at gmail.com> wrote:
> I have been using MAKER to iteratively update the previous run's annotations by running the ab initio predictors with fresh training and feeding in the previous run's GFF using the maker_gff option like this:
> 
> maker_gff=previous_run.gff
> est_pass=1
> altest_pass=1
> protein_pass=1
> rm_pass=1
> model_pass=1
> pred_pass=1
> other_pass=0
> 
> 
> 
> Along the way it seems that non-identical features with the same name accumulate, some covering the same region and some not. When I use fasta_merge -d ...index.log I get sequences for the duplicates. Am I using the control file options incorrectly? Any suggestions on how to select final models? Or should I redo the runs if I had some settings wrong?
> 
> 
> 
> Here is a snippet of the gff produced by gff3_merge -d ...index.log showing duplicate models:
> 
> -------------------------------------
> 
> Sacu_v1_s0077	maker	gene	136647	138568	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;Name=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;score=70.704
> 
> Sacu_v1_s0077	maker	mRNA	136647	138568	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;Name=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1;_AED=1.00;_eAED=1.00;_QI=0|0|0|0|1|1|5|0|158;score=70.704
> 
> Sacu_v1_s0077	maker	exon	138512	138568	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:exon:2329;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	exon	138297	138361	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:exon:2328;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	exon	137723	137786	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:exon:2327;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	exon	137578	137643	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:exon:2326;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	exon	136647	136871	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:exon:2325;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	CDS	138512	138568	.	-	0	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	CDS	138297	138361	.	-	0	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	CDS	137723	137786	.	-	1	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	CDS	137578	137643	.	-	0	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	CDS	136647	136871	.	-	0	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1
> 
> Sacu_v1_s0077	maker	gene	98236	98541	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;Name=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;score=18.18,18.18,18.18
> 
> 
> Sacu_v1_s0077	maker	mRNA	98236	98541	.	-	.	ID=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1;Parent=snap_masked-Sacu_v1_s0077-abinit-gene-1.20;Name=snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|1|1|0|101;score=18.18,18.18,18.18
> 
> 
> 
> 
> 
> 
> 
> Sacu_v1_s0004	maker	gene	4775142	4775554	.	+	.	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3;Name=snap_masked-Sacu_v1_s0004-abinit-gene-47.3;score=14.976
> 
> Sacu_v1_s0004	maker	mRNA	4775142	4775554	.	+	.	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3;Name=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1;_AED=1.00;_eAED=1.00;_QI=0|0|0|0|1|1|2|0|129;score=14.976
> 
> Sacu_v1_s0004	maker	exon	4775142	4775330	.	+	.	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1:exon:204;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1
> 
> Sacu_v1_s0004	maker	exon	4775354	4775554	.	+	.	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1:exon:205;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1
> 
> Sacu_v1_s0004	maker	CDS	4775142	4775330	.	+	0	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1
> 
> Sacu_v1_s0004	maker	CDS	4775354	4775554	.	+	0	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1
> 
> Sacu_v1_s0004	maker	gene	4767976	4768158	.	-	.	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3;Name=snap_masked-Sacu_v1_s0004-abinit-gene-47.3;score=-0.624,-0.624,-0.624
> 
> Sacu_v1_s0004	maker	mRNA	4767976	4768158	.	-	.	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3;Name=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|1|1|0|60;score=-0.624,-0.624,-0.624
> 
> Sacu_v1_s0004	maker	exon	4767976	4768158	.	-	.	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1:exon:211;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1
> 
> Sacu_v1_s0004	maker	CDS	4767976	4768158	.	-	0	ID=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1:cds;Parent=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1
> 
> 
> Sacu_v1_s0004	snap_masked	match	4775142	4775554	14.976	+	.	ID=Sacu_v1_s0004:hit:181:4.5.0.47;Name=snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1;score=14.976
> 
> 
> 
> Here are the models' headers from the maker.proteins.fasta:
> 
> -------------------------------------
> 
> >snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1 protein AED:1.00 eAED:1.00 QI:0|0|0|0|1|1|2|0|129
> 
> 
> >snap_masked-Sacu_v1_s0004-abinit-gene-47.3-mRNA-1 protein AED:1.00 eAED:1.00 QI:0|-1|0|0|-1|1|1|0|60
> 
> >snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1 protein AED:1.00 eAED:1.00 QI:0|0|0|0|1|1|5|0|158
> 
> 
> >snap_masked-Sacu_v1_s0077-abinit-gene-1.20-mRNA-1 protein AED:1.00 eAED:1.00 QI:0|-1|0|0|-1|1|1|0|101
> 
> 
> 
> 
> 
> Thanks!
> 
> Matt
> 
> 
> 