[maker-devel] [maker] transcripts doesn't provide any help

Michael Campbell michael.s.campbell1 at gmail.com
Wed May 18 07:16:05 MDT 2016


Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?
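
One rough way to see how far a run has progressed is to tally the per-scaffold entries in MAKER's master datastore index log. The sketch below assumes the usual three tab-separated columns (sequence name, output directory, status such as STARTED or FINISHED). The function name and the sample lines are made up for illustration and are not taken from this run.

```python
from collections import Counter

def tally_datastore_status(lines):
    """Tally the most recent status recorded for each sequence."""
    last_status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        seq_id, status = fields[0], fields[2]
        last_status[seq_id] = status  # later entries override earlier ones
    return Counter(last_status.values())

# Hypothetical log content (assumption, for illustration only).
sample_log = [
    "scaffA\tout/scaffA/\tSTARTED\n",
    "scaffA\tout/scaffA/\tFINISHED\n",
    "scaffB\tout/scaffB/\tSTARTED\n",
]
status_counts = tally_datastore_status(sample_log)
print(status_counts)
```

Comparing the FINISHED count with the total number of scaffolds gives a progress figure by scaffold count, which is not the same as progress by genome length.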

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don’t think GeneMark will perform well. If you followed the steps you quoted below but left GeneMark out, you may get better training than you would with CEGMA output alone.

I’ve cc’d Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I’m also cc’ing the devlist, since others in the community may be able to comment on the run times.

Thanks,
Mike
> On May 17, 2016, at 10:10 PM, Pei-Ying Huang <themis.ray at gmail.com> wrote:
> 
> Hi Mike,
> 
> My plant genome is about 4 Gb in 93,789 scaffolds. When I run MAKER using MPI on a server with 64 cores, only 1% of the genome is annotated.
> Is this normal? I read a post saying that it takes about 6 days on 16 processors to finish one round on a ~150,000-scaffold, ~2 Gb vertebrate genome with protein evidence.
> Based on that post, I expected to get the result in no more than two weeks. However, it seems it will take me more than three months.
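
The expectation above can be written out as back-of-the-envelope arithmetic. The sketch assumes run time scales linearly with genome size and inversely with core count, which is a strong simplification: it ignores the amount of evidence, repeat content, and MPI overhead. The function name is hypothetical.

```python
def naive_days(ref_days, ref_cores, ref_gb, target_gb, target_cores):
    """Scale a reference run by genome size and core count (linear assumption)."""
    core_days_per_gb = ref_days * ref_cores / ref_gb
    return core_days_per_gb * target_gb / target_cores

# Reference figures from the quoted post: 6 days, 16 processors, ~2 Gb.
# Target: ~4 Gb genome on 64 cores.
estimate = naive_days(6, 16, 2, 4, 64)
print(estimate)  # 3.0
```

Under these assumptions the estimate is about 3 days, so a projected run time of months suggests something other than genome size is limiting the run.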
> 
> I also want to train Augustus parameters. I currently use CEGMA to produce a .gff file, then convert it to augustus.gff with cegma2gff.
> Then I run autoAugTrain.pl with Augustus. Here is my command:
> autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log 
> 
> 
> But I saw someone's method below, so I wonder if I am doing it wrong?
> 
> "We get the genome.gff3 training set from the output of a first-pass run of MAKER using: 
> 1. EST data
> 2. Proteins from related species 
> 3. a SNAP model trained using CEGMA 
> 4. a GeneMark model (obtained by running GeneMark.ES on the draft genome) 
> 5. Running maker2zff on the output of MAKER, and converting that to GFF3
> Once done, we run MAKER a second time using the Augustus model and more stringent settings."
> 
> Thank you.
> Pei-Ying
> 
>  
> 
> 
> 
> 2016-05-18 9:16 GMT+08:00 Michael Campbell <michael.s.campbell1 at gmail.com <mailto:michael.s.campbell1 at gmail.com>>:
> Hi Pei-Ying,
> 
> One of the first places to start with RNA-seq quality control is a tool called FastQC. It produces a number of graphics that can help identify problematic files. There are also a number of tools for quality-trimming reads; Trimmomatic and the FASTX-Toolkit are popular ones.
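
To make the idea of a quality screen concrete, here is a minimal sketch that flags reads whose mean Phred score falls below a cutoff. It assumes four-line FASTQ records with Phred+33 encoding, and the cutoff of 20 is an arbitrary illustration. The function names are hypothetical, and this is not a substitute for FastQC or a real trimmer.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)

def low_quality_reads(fastq_lines, min_mean=20):
    """Return names of reads whose mean quality is below min_mean."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    flagged = []
    # Step through the file one four-line record at a time.
    for i in range(0, len(lines) - 3, 4):
        header, _seq, _plus, qual = lines[i:i + 4]
        if qual and mean_phred(qual) < min_mean:
            flagged.append(header[1:].split()[0])
    return flagged

# Two made-up reads: 'I' encodes Phred 40, '#' encodes Phred 2.
sample_fastq = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGT", "+", "####"]
flagged_reads = low_quality_reads(sample_fastq)
print(flagged_reads)  # ['read2']
```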
> 
> I would only redo the sequencing if you are convinced that the original sequencing is bad.
> 
> Mike
> 
> 
>> On May 16, 2016, at 8:42 PM, Pei-Ying Huang <themis.ray at gmail.com <mailto:themis.ray at gmail.com>> wrote:
>> 
>> Hi Mike,
>> 
>> As you said, the reason I only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.
>> 
>> If the problem is due to RNA-seq data quality, how can I identify the low-quality RNA-seq data and trim it out?
>> If the problem is due to expression profiles of the tissues used for mRNA-seq, should we try to extract RNA from the plant again and redo the sequencing?
>> Thank you.
>> 
>> Pei-Ying
>> 
>> 2016-05-09 22:18 GMT+08:00 Michael Campbell <michael.s.campbell1 at gmail.com <mailto:michael.s.campbell1 at gmail.com>>:
>> I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq. 
>> 
>> What you described is what I would do, followed by training Augustus. The exception is if est2genome=1 and protein2genome=0 doesn’t generate enough gene models to train the gene finders; in that case I would set est2genome=1 and protein2genome=1 for the first round instead.
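
The per-round settings discussed here could be applied to a control file with a small helper. This is only a sketch: `set_ctl_options` is a hypothetical name, and the sample lines are abbreviated stand-ins for the real maker_opts.ctl, assuming its `key=value #comment` layout.

```python
import re

def set_ctl_options(ctl_text, updates):
    """Return control-file text with the given key=value settings replaced."""
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=(\S*)(.*)$", line)
        if match and match.group(1) in updates:
            key, _old_value, rest = match.groups()
            line = f"{key}={updates[key]}{rest}"  # keep the trailing comment
        out.append(line)
    return "\n".join(out)

# Abbreviated, illustrative control lines (assumption, not the full file).
sample_ctl = (
    "est2genome=0 #infer gene predictions directly from ESTs\n"
    "protein2genome=0 #infer predictions from protein homology\n"
)

# First round: transcript-based models only.
first_round = set_ctl_options(sample_ctl, {"est2genome": 1, "protein2genome": 0})
# Fallback first round if too few models are produced for training.
fallback_round = set_ctl_options(sample_ctl, {"est2genome": 1, "protein2genome": 1})
print(first_round)
```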
>>  
>> Thanks,
>> Mike
>>> On May 8, 2016, at 10:08 AM, Pei-Ying Huang <themis.ray at gmail.com <mailto:themis.ray at gmail.com>> wrote:
>>> 
>>> Have you finished all of the tests?
>>> How would you suggest I run my data?
>>> 
>>> To get an ab initio model, I set est2genome=1 and protein2genome=0,
>>> then train with the SNAP model with est2genome=0 and protein2genome=0,
>>> then train a second SNAP model with est2genome=0 and protein2genome=0.
>>> 
>>> Thank you.
>>> 
>>> 2016-05-07 0:30 GMT+08:00 Michael Campbell <michael.s.campbell1 at gmail.com <mailto:michael.s.campbell1 at gmail.com>>:
>>> So far, in the tests that I’ve done, I get the same first exon as 5-prime UTR and part of the last exon as 3-prime UTR for that gene.
>>> Mike
>>>> On May 5, 2016, at 10:18 PM, Pei-Ying Huang <themis.ray at gmail.com <mailto:themis.ray at gmail.com>> wrote:
>>>> 
>>>> Hi Mike,
>>>> 
>>>> I found one five_prime_UTR feature, but it is the only one shown on scaff0001.
>>>> Does that mean there are no more five_prime_UTR features on this scaffold, or that MAKER didn't find the others?
>>>> Thank you.
>>>> 
>>>> GULI.scaff0001	maker	gene	3190189	3192302	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426
>>>> GULI.scaff0001	maker	mRNA	3190189	3192302	1262	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308
>>>> GULI.scaff0001	maker	exon	3190189	3190216	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	exon	3190331	3190656	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	exon	3190818	3190955	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	exon	3191233	3191510	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	exon	3191634	3191666	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	exon	3191755	3191848	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	exon	3191938	3192302	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	five_prime_UTR	3191968	3192302	.	-	.	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	CDS	3191938	3191967	.	-	0	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	CDS	3191755	3191848	.	-	0	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	CDS	3191634	3191666	.	-	2	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	CDS	3191233	3191510	.	-	2	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	CDS	3190818	3190955	.	-	0	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	CDS	3190331	3190656	.	-	0	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001	maker	CDS	3190189	3190216	.	-	1	ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
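
As a check on how the five_prime_UTR line relates to the exon and CDS lines in this listing, the sketch below recomputes it. On the minus strand, the 5-prime UTR is whatever exon sequence lies above the highest CDS coordinate. The function name is hypothetical, and the logic assumes 1-based inclusive GFF3 coordinates and a single transcript.

```python
def five_prime_utr_minus_strand(exons, cds):
    """Exon segments lying upstream of the CDS for a minus-strand transcript."""
    cds_max = max(end for _start, end in cds)
    utr = []
    for start, end in exons:
        if end > cds_max:
            # Keep only the part of the exon beyond the last coding base.
            utr.append((max(start, cds_max + 1), end))
    return sorted(utr)

# Coordinates copied from the exon and CDS lines of the gene listed above.
exons = [
    (3190189, 3190216), (3190331, 3190656), (3190818, 3190955),
    (3191233, 3191510), (3191634, 3191666), (3191755, 3191848),
    (3191938, 3192302),
]
cds = [
    (3191938, 3191967), (3191755, 3191848), (3191634, 3191666),
    (3191233, 3191510), (3190818, 3190955), (3190331, 3190656),
    (3190189, 3190216),
]
utr_segments = five_prime_utr_minus_strand(exons, cds)
print(utr_segments)  # [(3191968, 3192302)]
```

The result matches the five_prime_UTR feature in the listing (3191968 to 3192302), which is the non-coding part of exon 0.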
>>>> 
>>>> Pei-Ying
>>>> 
>>>> 2016-05-06 8:31 GMT+08:00 Pei-Ying Huang <themis.ray at gmail.com <mailto:themis.ray at gmail.com>>:
>>>> Hi Mike,
>>>> 
>>>> Any clue about the problem?
>>>> Or is my reasoning wrong? I judge whether the transcript data helps MAKER by checking whether est2genome appears in column 2 of the MAKER output GFF file.
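
The check described above can be scripted by tallying column 2 (source) and column 3 (feature type) of the GFF3 file. This is a minimal sketch: the function name and the sample lines are hypothetical, and it simply skips comment lines and anything without nine tab-separated columns, such as a trailing FASTA section.

```python
from collections import Counter

def tally_gff_columns(lines):
    """Count values in GFF3 column 2 (source) and column 3 (type)."""
    sources, types = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        sources[cols[1]] += 1
        types[cols[2]] += 1
    return sources, types

# Made-up feature lines for illustration only.
sample_gff = [
    "##gff-version 3",
    "\t".join(["scaffX", "maker", "gene", "100", "900", ".", "+", ".", "ID=g1"]),
    "\t".join(["scaffX", "maker", "five_prime_UTR", "100", "150", ".", "+", ".", "ID=u1"]),
    "\t".join(["scaffX", "est2genome", "expressed_sequence_match", "100", "900", ".", "+", ".", "ID=e1"]),
]
source_counts, type_counts = tally_gff_columns(sample_gff)
print(source_counts["est2genome"], type_counts["five_prime_UTR"])
```

A zero count for est2genome in the source tally would indicate that no transcript alignments were written to the file.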
>>>> Thank you.
>>>> 
>>>> Pei-Ying
>>>> 
>>>> 
>>>> 2016-05-05 1:22 GMT+08:00 Pei-Ying Huang <themis.ray at gmail.com <mailto:themis.ray at gmail.com>>:
>>>> Hi Mike,
>>>> 
>>>> The attached file is the folder I use to run MAKER. Thank you.
>>>> guliRN_L1_v1_mike.tar.gz <https://drive.google.com/file/d/0B1vRN27dO1OBN01reFZLV3JKbGM/view?usp=drive_web>
>>>> Pei-Ying
>>>> 
>>>> 2016-05-04 22:54 GMT+08:00 Michael Campbell <michael.s.campbell1 at gmail.com <mailto:michael.s.campbell1 at gmail.com>>:
>>>> Hi Pei-Ying,
>>>> 
>>>> If the sample data didn’t produce est2genome lines, it may be that exonerate is not being called. Could you send me the maker_exe.ctl file?
>>>> 
>>>> Your maker_opts.ctl file looks fine.
>>>> 
>>>> If you have a small test set for your data, such as a small scaffold that you know has some StringTie hits on it, you could send it to me and I can see if I can figure it out from here, if that would be helpful.
>>>> 
>>>> Thanks,
>>>> Mike
>>>>> On May 4, 2016, at 12:33 AM, Pei-Ying Huang <themis.ray at gmail.com <mailto:themis.ray at gmail.com>> wrote:
>>>>> 
>>>>> Hi Mike,
>>>>> 
>>>>> basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the Drosophila data bundled with MAKER.
>>>>> 
>>>>> I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3.
>>>>> I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input.
>>>>> Since I have three conditions (root, stem, leaf), I got Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff as MAKER inputs.
>>>>> 
>>>>> Should I merge Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff into strtie_merge.gff before giving them to MAKER?
>>>>> When I try to use cufflinks2gff3 to convert strtie_merge.gtf to strtie_merge.gff, I get the error messages below.
>>>>> 
>>>>> /home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff
>>>>> 
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221531.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221532.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221533.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221534.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221535.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, <IN> line 221536.
>>>>> maker1.log <https://drive.google.com/file/d/0B1vRN27dO1OBX1djVmpCMHhNT2M/view?usp=drive_web>
>>>>> maker_opts.log <https://drive.google.com/file/d/0B1vRN27dO1OBVWgwcVRmQU1jaW8/view?usp=drive_web>
>>>>> less A_guli_1.all.gff
>>>>> GULI.scaff0001  maker   gene    1750118 1755997 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37
>>>>> GULI.scaff0001  maker   mRNA    1750118 1755997 5292    -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764
>>>>> GULI.scaff0001  maker   exon    1750118 1750214 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   exon    1750304 1750815 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   exon    1750896 1751717 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   exon    1751849 1752373 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   exon    1752515 1753488 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   exon    1753554 1754406 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   exon    1754489 1755997 .       -       .       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   CDS     1754489 1755997 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   CDS     1753554 1754406 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   CDS     1752515 1753488 .       -       2       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   CDS     1751849 1752373 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   CDS     1750896 1751717 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   CDS     1750304 1750815 .       -       0       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001  maker   CDS     1750118 1750214 .       -       1       ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> 
>>>>> Thank you.
>>>>> Pei-Ying
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 2016-04-14 21:09 GMT+08:00 Michael Campbell <michael.s.campbell1 at gmail.com <mailto:michael.s.campbell1 at gmail.com>>:
>>>>> It is strange for transcripts from the species of interest to not align or help. That FASTA entry looks okay. Did you save the error output from MAKER? If you did, could you send it to me along with the MAKER control files? There may be some clues in there.
>>>>> 
>>>>> It would also be good if you could run MAKER on the Drosophila sample data in the /data folder in MAKER. This way we can see whether the problem is your data or your install of MAKER. Basic Protocol 1 in the attached protocol paper uses the Drosophila data bundled with MAKER.
>>>>> 
>>>>> Aligning with HISAT2 and using Cufflinks to make transcripts should work. StringTie seems to have higher specificity than Cufflinks, and the cufflinks2gff3 script works on StringTie output as well. You could also do a de novo assembly of the reads yourself using Trinity, which has worked well for me in the past.
>>>>> 
>>>>> Protein evidence alone will give a reasonable annotation. The transcript data will help in annotating UTRs and species-specific genes.
>>>>> 
>>>>> The attached protocol paper also addresses your quality question to an extent.
>>>>> 
>>>>> 
>>>>> <basic_protocol_1.tar.gz>
