[maker-devel] CDS retrieve from augustus_masked

Barry Moore barry.utah at gmail.com
Sat Apr 6 14:50:29 MDT 2013


On Apr 6, 2013, at 12:45 PM, Kang, Yang Jae wrote:

> Thank you for quick response again!
>  
> I found the non-ATG starting sequences in transcript file. I thought this would be the UTR traces, and

The gene predictors will occasionally produce a transcript with no start or stop codon. Set always_complete=1 in maker_opts.ctl to get MAKER to try hard to force a start/stop codon.
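
If it helps to see how many models are affected before changing that option, here is a minimal sketch in Python that flags FASTA records lacking an ATG start, a standard stop codon (TAA/TAG/TGA), or a length divisible by three. It assumes the sequences are coding-only (no UTR) and use the standard genetic code; the function names are made up for illustration and are not part of MAKER.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Standard-code stop codons (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def codon_report(seq: str) -> Dict[str, bool]:
    """Report whether a sequence looks like a complete CDS."""
    seq = seq.upper()
    return {
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": len(seq) >= 3 and seq[-3:] in STOP_CODONS,
        "length_multiple_of_3": len(seq) % 3 == 0,
    }


def incomplete_records(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, bool]]]:
    """Yield (first header token, report) for records failing any check."""
    for header, seq in read_fasta(lines):
        report = codon_report(seq)
        if not all(report.values()):
            tokens = header.split()
            yield (tokens[0] if tokens else header), report


if __name__ == "__main__":
    # Tiny made-up example; with a real file, pass an open file handle instead.
    demo = [">t1 demo", "ATGAAATAA", ">t2 demo", "CCGAAATAA"]
    for name, report in incomplete_records(demo):
        print(name, report)
```

With a real file you would pass the open handle, e.g. `incomplete_records(open(path))`, where `path` points at one of the transcripts FASTA files. Records that carry UTR will be flagged even when their CDS is complete, so treat the result as a rough screen only.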

> I additionally found the offset value some position after ‘>’ letter. Is that indicate the starting ATG?

I didn't really understand that question...

> Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files? The reason why I’m asking is that some genes were

The augustus_masked file contains the proteins of all predictions made by Augustus when working on masked sequence.  Setting unmask=1 in maker_opts.ctl would instruct MAKER to also run the gene predictors on unmasked sequence, and then you'd have an augustus_unmasked file for those predictions.  The non_overlapping_ab_initio files contain proteins predicted by any of the gene predictors for which MAKER could not find protein/RNA evidence, so they are unsupported by physical evidence.  These unsupported predictions are not promoted by MAKER into annotations in its final output, but they are included in these files in case you want to work with them.  The non_overlapping part of the name means that if multiple gene predictors produce overlapping unsupported ab initio predictions, MAKER will only output one of them.


> redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Yes, the proteins for genes for which MAKER creates annotations will be in both files.
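
If you want to list which entries are redundant, a rough sketch in Python is to match records between the two protein files by identical sequence. This assumes a promoted annotation keeps exactly the same protein sequence as the Augustus prediction it came from, which may not hold for every model; the function names and the example IDs are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_token(header: str) -> str:
    """Use the first whitespace-delimited token of the header as the ID."""
    tokens = header.split()
    return tokens[0] if tokens else header


def normalize(protein: str) -> str:
    """Upper-case and drop a trailing '*' so stop markers do not block a match."""
    return protein.upper().rstrip("*")


def shared_proteins(lines_a: Iterable[str], lines_b: Iterable[str]) -> Dict[str, List[str]]:
    """Map each ID in file A to the IDs in file B with an identical protein."""
    by_seq: Dict[str, List[str]] = defaultdict(list)
    for header, seq in read_fasta(lines_b):
        by_seq[normalize(seq)].append(first_token(header))
    shared: Dict[str, List[str]] = {}
    for header, seq in read_fasta(lines_a):
        matches = by_seq.get(normalize(seq))
        if matches:
            shared[first_token(header)] = list(matches)
    return shared


if __name__ == "__main__":
    # Made-up records standing in for the two protein FASTA files.
    augustus = [">aug-1 x", "MKT", "LL*", ">aug-2", "MAAA"]
    maker = [">maker-1", "MKTLL", ">maker-2", "MCCC"]
    print(shared_proteins(augustus, maker))
```

Anything in the augustus_masked file that finds no match this way is either a prediction that was not promoted or one whose sequence was changed, so an empty match is not proof that a model was rejected.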

>  
> Thank you
>  
> From: Carson Holt [mailto:carsonhh at gmail.com] 
> Sent: Sunday, April 07, 2013 12:13 AM
> To: Michael Thon; Kang, Yang Jae
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] CDS retrieve from augustus_masked
>  
> Augustus only predicts UTR for a handful of organisms.  I trim them off the rejected models before outputting to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3).
>  
>  --Carson
>  
> From: Michael Thon <mike.thon at gmail.com>
> Date: Saturday, 6 April, 2013 10:37 AM
> To: "Kang, Yang Jae" <kangyangjae at gmail.com>
> Cc: <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] CDS retrieve from augustus_masked
>  
> Thats a good point because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs?  I manually checked the translations of the .transcript. file and I only found valid translations but that does not mean that UTRs could not be present...
> On Apr 6, 2013, at 1:24 PM, "Kang, Yang Jae" <kangyangjae at gmail.com> wrote:
> 
> 
> Thank for your quick response Mike
> I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file?
>  
> Thank you
>  
>  
> From: Michael Thon [mailto:mike.thon at gmail.com] 
> Sent: Saturday, April 06, 2013 8:20 PM
> To: Kang, Yang Jae
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] CDS retrieve from augustus_masked
>  
> Hi Kang - After running fasta_merge there should be a file:
>  
> [prefix].all.maker.augustus_masked.transcripts.fasta
>  
> in the output directory.  Is that what you need?
> Mike
>  
> On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" <kangyangjae at gmail.com> wrote:
> 
> 
> 
> Dear everyone!
>  
> I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information.
>  
> Thank you!
>  
> Kang, Yang Jae
> Ph.D.
> Cropgenomics Lab.
> College of Agriculture and Life Science
> Seoul National University
> Korea
>  
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>  

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



