[maker-devel] advanced repeat libraries

Xabier Vázquez-Campos xvazquezc at gmail.com
Tue Jul 4 22:05:10 MDT 2017


Hi,
I'm working with a fungal genome that is at least 40% repeats, so I'm
trying to follow the advanced repeat library construction protocol.
So far, so good, but I'm unsure how to build the protein database
described at the end of the page.

In summary:
1. Get the SwissProt and RefSeq fungal proteins.
2. Run tblastn with the proteins from step 1 against the NCBI EST database
and keep the proteins with matches.
3. Run blastp with the output of step 2 against the transposase protein
database and remove the proteins with matches.
From here on, though, I'm a bit lost...
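
To check that I have steps 2 and 3 right, this is roughly how I picture the
filtering, as a minimal Python sketch. It assumes both searches were saved in
BLAST+ tabular format (-outfmt 6, where column 1 is the query ID and column 11
is the e-value). The function names and the 1e-10 cutoff are my own
placeholders, not values taken from the protocol.

```python
def query_ids_with_hits(tabular_text, max_evalue=1e-10):
    """Return the query IDs that have at least one hit at or below max_evalue.

    tabular_text is the content of a BLAST+ '-outfmt 6' file.
    The default cutoff is an arbitrary placeholder.
    """
    ids = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 = e-value
            ids.add(fields[0])               # column 1 = query ID
    return ids


def select_proteins(all_ids, est_hits, transposase_hits):
    """Keep proteins with EST support (step 2) and no transposase match (step 3)."""
    return sorted((set(all_ids) & est_hits) - transposase_hits)
```

The selected IDs would then be used to pull the matching records out of the
original protein FASTA file.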

"Finally, the rice protein sequences were compared with verified
transposons (such as Pack-MULEs) in the rice genome. If the protein
sequence matched a transposon perfectly and was the only perfect match in
the genome, the relevant protein sequence was excluded. Although elements
such as Pack-MULEs contain true gene sequences, the annotation (the protein
sequence in the database) often extends to non-gene sequences such as
terminal inverted repeat or sub-terminal repeat, which are not true plant
proteins and would cause great complications. As a result, it is essential
to exclude them."
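
My reading of that paragraph, which may well be wrong, is sketched below in
Python: a protein is dropped when it has exactly one perfect match in the
genome and that match falls inside a verified transposon. The Hit and Interval
records, the definition of "perfect" (100% identity over the full protein
length), and the overlap test are all my own assumptions, not something the
protocol states.

```python
from collections import namedtuple

# Hypothetical records: one protein-vs-genome hit (e.g. parsed from tblastn
# output) and one verified transposon location.
Hit = namedtuple("Hit", "protein contig start end pident aln_len query_len")
Interval = namedtuple("Interval", "contig start end")


def is_perfect(hit):
    """Assumed definition: 100% identity across the whole protein."""
    return hit.pident == 100.0 and hit.aln_len == hit.query_len


def overlaps(hit, interval):
    """True if the hit and the interval share at least one base."""
    lo, hi = sorted((hit.start, hit.end))  # minus-strand hits have start > end
    return hit.contig == interval.contig and lo <= interval.end and hi >= interval.start


def proteins_to_exclude(hits, transposons):
    """Proteins whose only perfect genome match lies in a verified transposon."""
    perfect = {}
    for hit in hits:
        if is_perfect(hit):
            perfect.setdefault(hit.protein, []).append(hit)
    excluded = set()
    for protein, perfect_hits in perfect.items():
        only_hit = perfect_hits[0]
        if len(perfect_hits) == 1 and any(overlaps(only_hit, t) for t in transposons):
            excluded.add(protein)
    return excluded
```

Under this reading, a protein with a second perfect match elsewhere in the
genome would be kept.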

Are the proteins kept at the end of step 3 the 'protein database'?
Could you provide a bit more detail on how to tackle this?

Thank you in advance,
Xabi

-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA