[maker-devel] Non-redundant Reference Human EST Data
Julian Egger
julian.egger at omahazoo.com
Mon May 18 09:17:46 MDT 2015
Ok great. I had looked at those files as well. So I could set both est=rna.fa and protein=protein.fa with files from
ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/protein/
ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/RNA/
?
Would that be worthwhile, or would it just slow things down to the point where it wouldn't be worth it?
Thanks again,
Julian
________________________________
From: Carson Holt [carsonhh at gmail.com]
Sent: Monday, May 18, 2015 10:16 AM
To: Julian Egger
Cc: maker-devel at yandell-lab.org; Daniel Ence
Subject: Re: [maker-devel] Non-redundant Reference Human EST Data
Best sources —>
ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/protein/
ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/RNA/
—Carson
On May 18, 2015, at 9:08 AM, Carson Holt <carsonhh at gmail.com> wrote:
If you decide to use human transcripts because human is a closely related primate, put them into the est= option. As I said in the previous e-mail, they may not align (because of nucleotide divergence), but you don’t want to use the alt_est option, because you have proteins, which will align better than alt_est and much faster. You can use both transcripts and proteins if you want. Don’t use human ESTs. There will be no benefit: they contain the same information as the annotated transcripts and proteins but are noisier. You can download human transcripts and proteins from the RefSeq FTP server. Then add any other proteomes you choose as additional evidence. Novel genes will only be discoverable if you have ESTs from the species being annotated, but without those you can still identify orthologs and paralogs from other species.
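As a rough illustration of the advice above, here is a minimal Python sketch that fills in the evidence options in the text of a MAKER control file. It rests on several assumptions: the file names rna.fa and protein.fa are placeholders, set_ctl_options is a hypothetical helper (not part of MAKER), and the key for alternate-organism ESTs is assumed to be spelled altest in maker_opts.ctl (the thread calls it alt_est), so check the spelling against your own control file.

```python
import re

def set_ctl_options(ctl_text, updates):
    """Return ctl_text with 'key=value' lines rewritten for keys in updates.

    Any trailing '#comment' on a rewritten line is kept. Keys that do not
    already appear in the text are appended at the end.
    """
    seen = set()
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in updates:
            key = m.group(1)
            comment = m.group(3) or ""
            out.append(f"{key}={updates[key]} {comment}".rstrip())
            seen.add(key)
        else:
            out.append(line)
    for key, value in updates.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"

# Placeholder file names; altest (assumed key name) is deliberately left empty
# because protein evidence is available.
EVIDENCE = {"est": "rna.fa", "protein": "protein.fa", "altest": ""}

if __name__ == "__main__":
    example = "est= #EST set\naltest= #alternate organism ESTs\nprotein= #protein file\n"
    print(set_ctl_options(example, EVIDENCE), end="")
```

Editing the control file by hand works just as well; the helper is only convenient when the same settings need to be applied to several runs.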
—Carson
On May 18, 2015, at 8:38 AM, Daniel Ence <dence at genetics.utah.edu> wrote:
Hi Julian,
The RefSeq NM models would be a good place to start for evidence, since those are curated manually. Don’t concatenate the protein and EST files together; putting amino acid sequences in as EST evidence, or nucleotide sequences in as protein evidence, will only give you errors in BLAST.
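One way to catch a mixed-up file before it reaches BLAST is a quick composition check. The sketch below is only a heuristic under stated assumptions: guess_type and its 0.9 threshold are hypothetical choices for illustration, and a file made of unusual sequences could fool it.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")

def nucleotide_fraction(seq):
    """Fraction of alphabetic characters in seq that are A, C, G, T, U or N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)

def guess_type(fasta_text, threshold=0.9):
    """Guess 'nucleotide', 'protein', or 'unknown' from FASTA-formatted text.

    Header lines (starting with '>') are ignored. The threshold is an
    arbitrary cut-off chosen for this sketch.
    """
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )
    if not seq:
        return "unknown"
    return "nucleotide" if nucleotide_fraction(seq) >= threshold else "protein"

if __name__ == "__main__":
    print(guess_type(">tx1\nACGTACGTNNACGT\n"))
    print(guess_type(">prot1\nMKVLAAGIVGLLLAQ\n"))
```

Running it on each file before assigning it to est= or protein= is enough to catch a file placed under the wrong option.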
The number of files you use in your annotation doesn’t matter as much as the quantity and breadth of the evidence that you use.
I don’t think that putting the RefSeq models as EST and protein evidence will make a big difference, but you’d have to put the human ESTs in as alt_ests, which takes longer to blast.
I think a good rule of thumb for the protein evidence is to have protein evidence from two genomes that are about the same distance from your target genome, and a third set from a genome that’s an outgroup to all three genomes. Another good source for protein evidence is the UniRef database.
Let me know if that helps,
Daniel
On May 18, 2015, at 7:53 AM, Julian Egger <julian.egger at omahazoo.com> wrote:
Hi Carson,
Thank you for the response. I had assumed using human transcripts instead of ESTs might be the way to go, but I had just read so much about using ESTs for annotation. You said to use either the human proteome or human transcripts; would using both in the MAKER setup file be too inefficient? As far as using either data type, since we are trying to annotate as many genes as possible from our scaffolds, we are looking for a single file to use. For either mRNA transcripts or protein sequences, would data from ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/ be good reference data for our scaffolds? That directory has both protein files and RNA files. I am not sure if a good option would be to concatenate either the RNA files or the protein files together and use that as the est= or protein= file. Otherwise, is there a better reference source people use for MAKER?
Thanks,
Julian
From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Friday, May 15, 2015 10:31 AM
To: Julian Egger
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Non-redundant Reference Human EST Data
Hi Julian,
Using human ESTs on primate contigs would be very inefficient. The human genome is already annotated, so you should instead use either the human proteome or human transcripts as input. Using ESTs from a species other than the one being annotated should only be done if there is not a curated annotation set to use instead.
You may be able to just give the human transcripts to the est= option if the two organisms have not diverged too much in nucleotide sequence. Don’t use the alt_est option since you have human protein annotations. The alt_est option uses tblastx to seed the alignments, which will not be as accurate as the protein= option that seeds via blastx, and it is about 10 times more expensive computationally. So it will take a lot longer and you won’t get anything that you couldn’t have found using the protein data instead.
Also, scaffolds shorter than about 10kb will likely be too short to annotate, so you can test your parameters on just a few of the largest contigs. In addition, don’t use SNAP because it performs poorly on primate genomes (use Augustus instead).
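To make the test-on-the-largest-contigs suggestion concrete, here is a small Python sketch that keeps only scaffolds of at least 10 kb and returns the few longest ones. The function names, the default of five scaffolds, and the toy input are all assumptions made for illustration; nothing here is part of MAKER itself.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)

def largest_scaffolds(records, min_len=10_000, top_n=5):
    """Drop records shorter than min_len and return the top_n longest."""
    kept = [(h, s) for h, s in records if len(s) >= min_len]
    kept.sort(key=lambda rec: len(rec[1]), reverse=True)
    return kept[:top_n]

def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")

if __name__ == "__main__":
    # Toy input with a tiny min_len so the example stays readable.
    toy = io.StringIO(">a\nACGT\nAC\n>b\nACGTACGTAC\n>c\nAC\n")
    subset = largest_scaffolds(read_fasta(toy), min_len=4, top_n=2)
    out = io.StringIO()
    write_fasta(subset, out, width=4)
    print(out.getvalue(), end="")
```

For real data, open the scaffold FASTA file in place of the toy StringIO and keep the default min_len of 10,000; the resulting subset can be used as a quick test genome before committing to a full run.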
Thanks,
Carson
On May 14, 2015, at 12:17 PM, Julian Egger <julian.egger at omahazoo.com> wrote:
We have assembled scaffolds from genomic reads of a primate sample and would like to annotate as many genes as possible with MAKER. Where is the best place to find an EST file to use with MAKER containing all of the non-redundant reference human ESTs? I was trying to look around NCBI, Ensembl, and UCSC, but I am not sure what the FTP site, subdirectory, and file name would be for something like that.
Thanks
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org