[maker-devel] Maker consensus
Carson Holt
carsonhh at gmail.com
Wed May 29 06:54:30 MDT 2013
Yes. That's ok, but you would get better performance by installing MPI and
using that. Alternatively just start maker several times in the same
directory without splitting the input fasta. You can usually start about
10-15 concurrent maker processes safely, but would still get better
performance with MPI.
--Carson
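
[Note: a minimal sketch of the "start maker several times in the same directory" suggestion above. It assumes the maker executable is on PATH and that the control files (maker_opts.ctl and friends) sit in the working directory, so that a bare maker call picks them up. The helper name launch_concurrent is hypothetical and not part of MAKER.]

```python
import subprocess
import sys


def launch_concurrent(command, workdir, n_procs):
    """Start n_procs copies of `command` in `workdir` and wait for all of them.

    Returns the exit codes in launch order.
    """
    procs = [subprocess.Popen(command, cwd=workdir) for _ in range(n_procs)]
    return [p.wait() for p in procs]


# Hypothetical usage (assumes `maker` is on PATH and the control files are in
# the current directory); 10 is the low end of the range mentioned above:
#
#     codes = launch_concurrent(["maker"], ".", 10)
#     sys.exit(0 if all(c == 0 for c in codes) else 1)
```

All processes share one directory and one unsplit input fasta, as described in the message; this is not a substitute for the MPI route, which is said to perform better.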
From: Diana LeDuc <diana_leduc at eva.mpg.de>
Reply-To: Diana LeDuc <diana_leduc at eva.mpg.de>
Date: Tuesday, 28 May, 2013 8:16 AM
To: <maker-devel at yandell-lab.org>, Carson Holt <carsonhh at gmail.com>
Cc: Gabriel Renaud <gabriel_renaud at eva.mpg.de>, Janet Kelso
<kelso at eva.mpg.de>
Subject: Re: [maker-devel] Maker consensus
Hi Carson,
I have now restarted maker with specification of augustus path and species.
I am trying to run it separately on each scaffold just to parallelise the
process and speed it up. It happens that some of the scaffolds which run ok
in the complete dataset now fail. Do you have any idea why this happens?
Is it ok to have a separate directory for each of the scaffolds and run
maker in each of them?
Thank you for the help.
Best regards,
Diana
On May 10, 2013 at 8:29 PM Carson Holt <carsonhh at gmail.com> wrote:
>
> You can use any species augustus already has. If it doesn't have yours, then
> you train it yourself. The species folder is pointed to by the
> AUGUSTUS_CONFIG_PATH environment variable, and is usually
> /augustus/config/species
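
[Note: a small sketch for checking which species Augustus already has. It assumes AUGUSTUS_CONFIG_PATH points at the Augustus config directory and that each species is a subdirectory of its species folder; the function name is made up for illustration.]

```python
import os


def list_augustus_species(config_path=None):
    """Return the sorted names of the directories under <config_path>/species."""
    if config_path is None:
        # Assumption: the variable is set and points at the config directory.
        config_path = os.environ["AUGUSTUS_CONFIG_PATH"]
    species_dir = os.path.join(config_path, "species")
    return sorted(
        name
        for name in os.listdir(species_dir)
        if os.path.isdir(os.path.join(species_dir, name))
    )
```

If a name such as chicken shows up in the result, that name is what would be given to MAKER as the augustus species.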
>
>
>
> Thanks,
>
> Carson
>
>
>
>
>
> From: Diana LeDuc < diana_leduc at eva.mpg.de>
> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de>
> Date: Friday, 10 May, 2013 2:16 PM
> To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com>
> Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>,
> Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < kelso at eva.mpg.de>
> Subject: Re: [maker-devel] Maker consensus
>
>
>
>
>
> Hi Carson,
>
>
>
> In maker_exe.ctl I would have to provide the path to augustus. Augustus has a
> training set for chicken that I would use. Is it possible to specify the
> species I want to use, or is the only way to train Augustus myself?
>
>
>
> Thank you!
>
>
>
> Best,
>
>
>
> Diana
>
> On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com> wrote:
>
>
>>
>> Ok. You just ran the evidence and didn't give a gene predictor. You need
>> to provide an HMM file for SNAP, a species for augustus, or for rough
>> annotations you can set protein2genome=1 and est2genome=1. This will try to
>> generate models directly from the alignments.
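
[Note: a rough sketch of flipping those two options in a control file from a script. It assumes the control file uses key=value lines with optional trailing # comments; set_ctl_options is a hypothetical helper, not a MAKER tool, and editing maker_opts.ctl by hand works just as well.]

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with `key=value` lines rewritten for each key in updates.

    Trailing `#` comments are preserved. Keys that do not occur in the text
    are not added.
    """
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in updates:
            comment = m.group(3) or ""
            line = f"{m.group(1)}={updates[m.group(1)]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical usage on the control file in the current directory:
#
#     with open("maker_opts.ctl") as fh:
#         new_text = set_ctl_options(fh.read(), {"est2genome": 1, "protein2genome": 1})
#     with open("maker_opts.ctl", "w") as fh:
#         fh.write(new_text)
```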
>>
>>
>>
>> If you provide a gene predictor, then MAKER can talk to it about the
>> evidence alignments so it can make a best gene call for the region. Then
>> there will be gene/mRNA/exon models in the GFF3 file and entries in the
>> proteins.fasta and transcripts.fasta. If you need to train a predictor, you
>> can train SNAP using the maker2zff script and the SNAP documentation or maker
>> GMOD tutorial. If you want to train augustus, Jason Stajich wrote an
>> excellent explanation as well as tools in a previous list message.
>>
>>
>>
>>
>> list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
>>
>> Script is in this github repo -
>>
>> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
>>
>>
>>
>> Thanks,
>>
>> Carson
>>
>>
>>
>>
>>
>>
>>
>> From: Diana LeDuc < diana_leduc at eva.mpg.de>
>> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de>
>> Date: Friday, 10 May, 2013 1:41 PM
>> To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com>
>> Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>,
>> Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso <
>> kelso at eva.mpg.de>
>> Subject: Re: [maker-devel] Maker consensus
>>
>>
>>
>>
>>
>> Hi Carson,
>>
>>
>>
>> Thank you for the quick answer.
>>
>> I ran gff3_merge to merge all the gff files and this resulted in a gff file,
>> which has these types of fields:
>>
>> scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;
>> scaffold32239 blastx match_part 22905 23045 174 + . ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47;
>>
>> In comparison to the dpp_contig test file, I am missing est2genome evidence,
>> most probably because my est data set is pretty poor. I have blastx and
>> protein2genome evidence though.
>>
>>
>>
>> My goal is to extract the genes that could be annotated on the scaffolds. In
>> the gff files the hits overlap most of the time. I can visualize this
>> properly in apollo: for example, one scaffold hits the DSCAML gene in both
>> zebrafinch and chicken, but extracting the coordinates over which this
>> scaffold matches the annotated gene is difficult from the gff. Manually
>> curating the genes is also not an option, since I am trying to do this for a
>> 1.7Gb genome.
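
[Note: one possible way to pull rough coordinates out of evidence-only GFF3, sketched under assumptions. It reads tab-separated GFF3 lines, keeps features of one type (protein_match here), and collapses all hits that share a scaffold and a Name attribute into a single start/end span. The function name hit_spans and the choice to group by Name are assumptions made for illustration; this is not how MAKER itself builds gene models.]

```python
def hit_spans(gff_lines, feature_type="protein_match"):
    """Collapse features of one type into {(scaffold, name): (start, end)}."""
    spans = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        key = (cols[0], attrs.get("Name", attrs.get("ID", "")))
        start, end = int(cols[3]), int(cols[4])
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans
```

Grouping by Name merges every hit of the same protein on a scaffold, so two separate loci matching one protein would be fused into one span; a gene predictor, as suggested in the reply above, avoids that problem.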
>>
>>
>>
>> I hope this explains better what we are after.
>>
>>
>>
>> Thank you once again.
>>
>>
>>
>> Best regards,
>>
>>
>>
>> Diana
>> On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com> wrote:
>>
>>
>>>
>>> I'm sorry, I don't understand question 1. You are missing the resulting
>>> fasta files, correct? Did your resulting GFF3 file have any features of
>>> type "gene"? Did you run fasta_merge after running gff3_merge?
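
[Note: a quick way to answer the "any features of type gene?" question for a merged GFF3 file, given as a sketch. It assumes standard tab-separated GFF3 with the feature type in column 3 and an optional ##FASTA section at the end; the function name is hypothetical.]

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count how often each feature type (GFF3 column 3) occurs."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section starts here; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Hypothetical usage (the file name is a placeholder):
#
#     with open("merged.gff") as fh:
#         print(count_feature_types(fh)["gene"])
```

A count of zero for gene would be consistent with an evidence-only run that had no gene predictor configured.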
>>>
>>>
>>>
>>> Could you give me more details on what you are trying to do, so I can take
>>> a stab at question 2 as well.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Carson
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> From: Diana LeDuc < diana_leduc at eva.mpg.de>
>>> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de>
>>> Date: Friday, 10 May, 2013 10:44 AM
>>> To: < maker-devel at yandell-lab.org>
>>> Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso <
>>> kelso at eva.mpg.de>, Torsten Schoeneberg <
>>> torsten.schoeneberg at medizin.uni-leipzig.de>
>>> Subject: [maker-devel] Maker consensus
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Dear maker developers,
>>>
>>>
>>> I am a PhD student working on de novo assembly and annotation of a bird
>>> genome. I used Maker as the annotation pipeline, which ran very well, and I
>>> obtained different annotations with evidence from the Augustus gene
>>> predictor, a small EST dataset from my organism and protein sequences from
>>> chicken, turkey and zebrafinch. I could combine the different gff files from
>>> different scaffolds into one gff file with annotations for the entire
>>> genome.
>>>
>>>
>>> I now have two questions:
>>>
>>>
>>> 1. What could be the reason that I haven't gotten the protein.fasta and
>>> transcript.fasta files?
>>>
>>>
>>> 2. How can I obtain a consensus gene list from the different lines of
>>> evidence in maker? What I actually need is the scaffold, coordinates and
>>> annotation (gene name) according to the 3 other bird species.
>>> Thank you in advance.
>>>
>>>
>>>
>>> Best regards,
>>>
>>>
>>>
>>> Diana Le Duc
>>>
>>>
>>>
>>> --
>>>
>>> Max Planck Institute for Evolutionary Anthropology
>>> Department of Evolutionary Genetics
>>> Deutscher Platz 6
>>> D-04103 Leipzig
>>>
>>> Phone +49 (0)341-3550-554
>>> www.eva.mpg.de <http://www.eva.mpg.de>
>>>
>>>
>>> _______________________________________________ maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>
>>
>>
>
>
>
>
>