<html><head><base href="x-msg://37/"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Diana,<div><br></div><div>There is a Perl library, the Genome Annotation Library (GAL), that is designed to make writing code like this easy. I just added a script to this library called gal_CDS_sequence, which you would run like this:</div><div><br></div><div>gal_CDS_sequence --translate genes.gff3 genome.fasta</div><div><br></div><div>GAL aims to make quick scripts like this easy to write, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files.</div><div><br></div><div>You can access the library here:</div><div><br></div><div><a href="http://www.sequenceontology.org/software/GAL.html">http://www.sequenceontology.org/software/GAL.html</a></div><div><br></div><div>Support for GAL is available via the SO mailing list:</div><div><br></div><div><a href="https://lists.sourceforge.net/lists/listinfo/song-devel">https://lists.sourceforge.net/lists/listinfo/song-devel</a></div><div><br></div><div>Hope that helps,</div><div><br></div><div>Barry</div><div><br></div><div><div><div>On Mar 24, 2014, at 5:11 PM, Diana Garnica Moreno wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div><div style="font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family: Calibri, Arial, Helvetica, sans-serif; "><div style="margin-top: 0px; margin-bottom: 0px; ">Hi there,<br></div><div style="margin-top: 0px; margin-bottom: 0px; "><br></div><div style="margin-top: 0px; margin-bottom: 0px; ">We recently assembled a fungal genome using MAKER and we got the gene models and the corresponding transcripts, predicted proteins, and GFF files. 
However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that, I have used different programs to extract the FASTA sequences of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested, I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why this happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks), I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?<br></div><div style="margin-top: 0px; margin-bottom: 0px; "><br></div><div style="margin-top: 0px; margin-bottom: 0px; ">Thanks!<br></div><div style="margin-top: 0px; margin-bottom: 0px; "><br></div><div style="margin-top: 0px; margin-bottom: 0px; ">Diana<br></div></div>_______________________________________________<br>maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br></div></blockquote></div><br><div apple-content-edited="true">

<div><span class="Apple-style-span" style="font-family: Arial; font-size: 12px; "><div>Barry Moore</div><div>Research Scientist</div><div>Dept. of Human Genetics</div><div>University of Utah</div><div>Salt Lake City, UT 84112</div><div>--------------------------------------------</div><div>(801) 585-3543</div><div><br class="khtml-block-placeholder"></div></span></div><div><br></div><br class="Apple-interchange-newline">

</div>

<br></div></body></html>