[maker-devel] Non-redundant Reference Human EST Data
Daniel Ence
dence at genetics.utah.edu
Mon May 18 08:38:15 MDT 2015
Hi Julian,
The RefSeq NM models would be a good place to start for evidence, since those are curated manually. Don’t concatenate the protein and EST files together; putting amino acid sequences in as ESTs will only give you errors in BLAST, and vice versa.
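As a rough illustration of pulling only the curated NM models out of the RefSeq RNA files into a single nucleotide file (kept separate from any protein file), here is a minimal Python sketch. The function names and the input/output paths are hypothetical, and the sketch assumes the NM_ accession appears somewhere in each FASTA header line.

```python
import gzip
import re

# Matches curated RefSeq mRNA accessions such as NM_000014.6 anywhere in a
# header, so both "gi|...|ref|NM_...|" and plain ">NM_..." styles are handled.
NM_ACCESSION = re.compile(r"(?<![A-Za-z0-9])NM_\d+")


def keep_nm_records(lines):
    """Yield only the FASTA lines of records whose header has an NM_ accession."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = NM_ACCESSION.search(line) is not None
        if keep:
            yield line


def filter_files(in_paths, out_path):
    """Merge several gzipped RNA FASTA files into one NM_-only nucleotide file.

    The paths are supplied by the caller; nothing here is MAKER-specific.
    """
    with open(out_path, "w") as out:
        for path in in_paths:
            with gzip.open(path, "rt") as handle:
                out.writelines(keep_nm_records(handle))
```

Something like filter_files(list_of_rna_files, "human_refseq_NM.fna") would then give one nucleotide file for est=, with the protein files merged separately for protein=.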
The number of files you use in your annotation doesn’t matter as much as the quantity and breadth of the evidence that you use.
I don’t think that putting the RefSeq models as EST and protein evidence will make a big difference, but you’d have to put the human ESTs in as alt_ests, which takes longer to blast.
I think a good rule of thumb for the protein evidence is to have protein evidence from two genomes that are about the same distance from your target genome and a third set from a genome that’s an outgroup to all three genomes. Another good source for protein evidence is the UniRef database.
Let me know if that helps,
Daniel
On May 18, 2015, at 7:53 AM, Julian Egger <julian.egger at omahazoo.com> wrote:
Hi Carson,
Thank you for the response. I had assumed using human transcripts instead of ESTs might be the way to go, but I had just read so much about using ESTs for annotation. You said to use either the human proteome or human transcripts; would using both in the MAKER setup file be too inefficient? As far as using either data type, since we are trying to annotate as many genes as possible from our scaffolds, we are looking for a single file to use. For either mRNA transcripts or protein sequences, would using data from ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/ be good reference data for our scaffolds? That directory has both protein files and rna files. I am not sure if a good option would be to concatenate either the rna files or protein files together and use that as the est= or protein= file. Otherwise, is there a better reference source people use for MAKER?
Thanks,
Julian
From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Friday, May 15, 2015 10:31 AM
To: Julian Egger
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Non-redundant Reference Human EST Data
Hi Julian,
Using human ESTs on primate contigs would be very inefficient. The human genome is already annotated, so you should instead use either the human proteome or human transcripts as input. Using ESTs from a species other than the one being annotated should only be done if there is not a curated annotation set to use instead.
You may be able to just give the human transcripts to the est= option if the two organisms have not diverged too much in nucleotide sequence. Don’t use the alt_est option since you have human protein annotations. The alt_est option uses tblastx to seed the alignments, which will not be as accurate as the protein= option that seeds via blastx, and it is about 10 times more expensive computationally. So it will take a lot longer and you won’t get anything that you couldn’t have found using the protein data instead.
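To make the option choices above concrete, here is a small Python sketch that renders them as key=value lines of the kind found in maker_opts.ctl. All file names are hypothetical placeholders, and the key names (altest for the option called alt_est above, augustus_species, snaphmm) reflect my understanding of the control file; check them against the file your MAKER version generates.

```python
def render_maker_opts(settings):
    """Render a dict of settings as key=value lines, one per option."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Hypothetical file names; only the choice of which options are filled in
# follows the advice in this thread.
evidence = {
    "genome": "primate_scaffolds.fasta",    # assembled scaffolds to annotate
    "est": "human_refseq_NM.fna",           # human transcripts as nucleotide evidence
    "altest": "",                           # left empty: tblastx seeding is slow
    "protein": "human_refseq_protein.faa",  # human proteome, seeded via blastx
    "augustus_species": "human",            # Augustus rather than SNAP for primates
    "snaphmm": "",                          # left empty: SNAP not used
}

print(render_maker_opts(evidence))
```

The printed lines are meant to be compared against, or pasted into, an existing control file rather than to replace it, since the real file carries many more options.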
Also scaffolds shorter than about 10kb will likely be too short to annotate, so you can test out your parameters on a few of only the largest contigs. In addition, don’t use SNAP because it performs poorly on primate genomes (use Augustus instead).
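One way to pick out a few of the largest scaffolds for a parameter test run is sketched below in Python. The 10 kb cutoff follows the figure above; the function names and the default of five scaffolds are arbitrary choices for illustration.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def largest_scaffolds(records, min_len=10_000, top_n=5):
    """Return up to top_n records of at least min_len bases, longest first."""
    long_enough = [rec for rec in records if len(rec[1]) >= min_len]
    return sorted(long_enough, key=lambda rec: len(rec[1]), reverse=True)[:top_n]
```

Reading the assembly with read_fasta(open(path)) and writing the result of largest_scaffolds back out as FASTA gives a small test genome file to try settings on before committing to the full run.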
Thanks,
Carson
On May 14, 2015, at 12:17 PM, Julian Egger <julian.egger at omahazoo.com> wrote:
We have assembled scaffolds from genomic reads of a primate sample and would like to annotate as many genes as possible with MAKER. Where is the best place to find an EST file to use with MAKER containing all of the non-redundant reference human ESTs? I was trying to look around NCBI, Ensembl, and UCSC, but I am not sure what the ftp site, subdirectory, and file name would be for something like that.
Thanks
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org