[maker-devel] maker-devel Digest, Vol 45, Issue 15
Carson Holt
carsonhh at gmail.com
Fri Mar 2 09:05:49 MST 2012
There is more than one way to train your predictors. If you have protein
sequence, you can run MAKER with the protein2genome=1 option set, which
will generate preliminary gene models based on protein homology alone, and
those results can be used to train SNAP and Augustus. You can also train
SNAP independently using CEGMA, a program from the same group as SNAP.
It finds a small set of core genes that should be present in all
eukaryotic genomes.
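
To make the protein-only first pass concrete, here is a minimal Python sketch that flips options in the text of a MAKER-style control file. Only protein2genome=1 comes from the advice above; the helper name set_ctl_option, the sample file text, and the file name proteins.fasta are assumptions for illustration, so check them against your own maker_opts.ctl.

```python
import re


def set_ctl_option(ctl_text, key, value):
    """Set key=value in control-file text, keeping any trailing #comment."""
    pattern = re.compile(rf"^{re.escape(key)}=[^#\n]*(#.*)?$", re.MULTILINE)

    def repl(match):
        comment = match.group(1)
        return f"{key}={value}" + (f" {comment}" if comment else "")

    new_text, count = pattern.subn(repl, ctl_text)
    if count == 0:
        raise KeyError(f"option {key!r} not found in control file text")
    return new_text


# Hypothetical two-line stand-in for a real control file.
SAMPLE_CTL = (
    "protein= #protein sequence file\n"
    "protein2genome=0 #infer predictions from protein homology\n"
)

first_pass = set_ctl_option(SAMPLE_CTL, "protein", "proteins.fasta")
first_pass = set_ctl_option(first_pass, "protein2genome", "1")
print(first_pass)
```

Editing the file by hand works just as well; the point is only that the first pass needs a protein file plus protein2genome=1 and nothing from the ab initio predictors yet.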
For me, the choice of evidence is mostly a question of speed. Using ESTs
that are not from the same organism has caveats. If the organisms are very
closely related (example: chimp and human), you can do a direct alignment
in nucleotide space. But if they are not (example: human and mouse), you
can use MAKER's alt_est option and MAKER will align in translated space
using TBLASTX. TBLASTX is mind-numbingly slow, taking about 20x longer
than a direct EST alignment, and should be avoided when possible.
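
Part of the reason translated search is expensive is that TBLASTX compares six-frame translations of both the query and the database. The Python sketch below only enumerates those reading frames to show the 6 x 6 pairing; it is an illustration, not a model of BLAST itself and not a full explanation of the roughly 20x figure above. The function names are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Return the six reading frames (three per strand), trimmed to whole codons."""
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            usable = len(strand) - offset
            frames.append(strand[offset:offset + usable - usable % 3])
    return frames


def frame_pairings(query, subject):
    """Count the frame-against-frame comparisons a translated search implies."""
    return len(six_frames(query)) * len(six_frames(subject))


print(six_frames("ATGGCCTGA"))
print(frame_pairings("ATGGCCTGA", "TTGACCATG"))
```

A direct nucleotide alignment has no such multiplication to do, which is one reason to prefer it whenever the organisms are close enough.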
So to summarize, if you have ESTs from your organism, use those first. If
not, then use proteins. Finally, use alternate-organism ESTs if that's all
you have. Of course you can also supply all three, but I normally only do
that for the final run. I limit the dataset for the training rounds.
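
The priority order above can be written down as a small Python sketch. The labels own_est, protein, and alt_est and the function plan_evidence are names invented for this example, not MAKER options; the logic just restates the rule of thumb: one evidence type for training rounds, everything available for the final run.

```python
# Evidence types in the order recommended above (labels are illustrative).
PRIORITY = ("own_est", "protein", "alt_est")


def plan_evidence(available, final_run=False):
    """Pick evidence for a run: best single type for training, all types for the final run."""
    ranked = [kind for kind in PRIORITY if kind in available]
    if not ranked:
        raise ValueError("no EST or protein evidence available")
    return ranked if final_run else ranked[:1]


# ESTs from the target organism plus ESTs from a relative:
print(plan_evidence({"own_est", "alt_est"}))
print(plan_evidence({"own_est", "alt_est"}, final_run=True))
```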
After using MAKER's results to train SNAP, Augustus, etc., just supply
MAKER with the resulting trained HMM and run again in the same directory.
Why in the same directory? So that you can keep the evidence files you
used on the first run. MAKER will see that the only thing that changed
was the HMM, so it won't have to rerun any of the alignments. It will
just add the predictions, interpret them in light of the evidence, and
produce new output. So the first run is slow, and the second is fast
(because MAKER takes advantage of archived BLAST, RepeatMasker, and
Exonerate results).
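
The reason the second run is fast can be sketched as a cache keyed on each step's inputs. This is a toy Python illustration of the idea only, using an in-memory dictionary as a stand-in for results archived on disk; it is not how MAKER actually tracks its runs, and the class name, step names, and file names are all hypothetical.

```python
import hashlib


class ResultArchive:
    """Toy archive: a step is recomputed only when its inputs change."""

    def __init__(self):
        self._store = {}
        self.computed = []  # names of the steps that actually had to run

    @staticmethod
    def _key(step, inputs):
        digest = hashlib.sha256(step.encode())
        for name in sorted(inputs):
            digest.update(f"\0{name}={inputs[name]}".encode())
        return digest.hexdigest()

    def run(self, step, inputs, compute):
        key = self._key(step, inputs)
        if key not in self._store:
            self._store[key] = compute(inputs)
            self.computed.append(step)
        return self._store[key]


archive = ResultArchive()
# First run: alignments and predictions are both computed.
archive.run("blastx", {"protein": "proteins.fasta"}, lambda i: "alignments")
archive.run("snap", {"hmm": "round1.hmm"}, lambda i: "predictions, round 1")
# Second run: only the HMM changed, so only the prediction step runs again.
archive.run("blastx", {"protein": "proteins.fasta"}, lambda i: "alignments")
archive.run("snap", {"hmm": "round2.hmm"}, lambda i: "predictions, round 2")
print(archive.computed)
```

Running in a fresh directory is the equivalent of starting with an empty archive, which is why the slow alignment steps would all be repeated.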
Thanks,
Carson
On 12-03-02 2:26 AM, "Maria Joana F. B. A. Guimarães"
<joana.guimaraes at inrb.pt> wrote:
>Hi
>
>About this problem, I solved it by removing ab-blast from the path. I'm
>now using NCBI BLAST+. Now I have another question. I read the articles
>on MAKER to better understand the procedures, but I still have some
>doubts. My data are:
>
>EST from my organism J
>genome sequence from organism A
>EST from organism R (annotated)
>
>My organism is closer to R than to A (but there isn't a genome sequence
>for organism R). Should I run my ESTs against A, or should I train MAKER
>with A and R and then use the result on my organism? If so, how can I do
>it? It isn't clear in your article.
>Thanks for all your help.
>
>Joana
>
>-----Original Message-----
>From: maker-devel-bounces at yandell-lab.org
>[mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of
>maker-devel-request at yandell-lab.org
>Sent: Tuesday, 28 February 2012 19:00
>To: maker-devel at yandell-lab.org
>Subject: maker-devel Digest, Vol 45, Issue 15
>
>Today's Topics:
>
> 1. maker run error (Maria Joana F. B. A. Guimarães)
> 2. Re: maker run error (Carson Holt)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 28 Feb 2012 10:25:24 -0000
>From: Maria Joana F. B. A. Guimarães <joana.guimaraes at inrb.pt>
>To: <maker-devel at yandell-lab.org>
>Subject: [maker-devel] maker run error
>Message-ID:
> <F5B5DE3EA7B4FD48A549687F860198325F5483 at INRBMSMXBE.INRB.PT>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hi
>
>I managed to install MAKER and everything looks OK. I'm trying to run
>maker with the examples but it keeps giving me this error:
>
>bioinf@linux-hdoc:~/Desktop/TEMP> maker
>STATUS: Parsing control files...
>STATUS: Processing and indexing input FASTA files...
>STATUS: Setting up database for any GFF3 input...
>A data structure will be created for you at:
>/home/bioinf/Desktop/TEMP/dpp_contig.maker.output/dpp_contig_datastore
>
>To access files for individual sequences use the datastore index:
>/home/bioinf/Desktop/TEMP/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
>STATUS: Now running MAKER...
>
>--Next Contig--
>
>#---------------------------------------------------------------------
>Now starting the contig!!
>SeqID: contig-dpp-500-500
>Length: 32156
>#---------------------------------------------------------------------
>
>running repeat masker.
>#--------- command -------------#
>Widget::RepeatMasker:
>cd /tmp/maker_eiYoLm; /home/bioinf/maker/exe/RepeatMasker/RepeatMasker /home/bioinf/Desktop/TEMP/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/bioinf/Desktop/TEMP/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
>#-------------------------------#
>processing output:
>cycle 1
>cycle 2
>cycle 3
>cycle 4
>cycle 5
>cycle 6
>cycle 7
>cycle 8
>cycle 9
>cycle 10
>Generating output...
>masking
>done
>formating database...
>#--------- command -------------#
>Widget::formater:
>/home/bioinf/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in /tmp/maker_eiYoLm/te_proteins%2Efasta.mpi.10.0
>#-------------------------------#
>running blast search.
>#--------- command -------------#
>Widget::blastx:
>/home/bioinf/maker/bin/../exe/ab-blast/blastx -db /tmp/maker_eiYoLm/te_proteins%2Efasta.mpi.10.0 -query /tmp/maker_eiYoLm/rank0/contig-dpp-500-500.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/bioinf/Desktop/TEMP/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
>#-------------------------------#
>FATAL: Argument 1 ("-db") is not recognized or is improperly formed.
>EXIT CODE 5
>FATAL: Argument 1 ("-db") is not recognized or is improperly formed.
>EXIT CODE 5
>ERROR: BLASTX failed
>ERROR: Failed while doing blastx repeats
>ERROR: Chunk failed at level:1, tier_type:1
>FAILED CONTIG:contig-dpp-500-500
>
>ERROR: Chunk failed at level:2, tier_type:0
>FAILED CONTIG:contig-dpp-500-500
>
>--Next Contig--
>
>Processing run.log file...
>MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.te_proteins%2Efasta.repeatrunner
>did not finish on the last run and must be erased
>#---------------------------------------------------------------------
>Now retrying the contig!!
>SeqID: contig-dpp-500-500
>Length: 32156
>Tries: 2!!
>#---------------------------------------------------------------------
>
>re reading repeat masker report.
>/home/bioinf/Desktop/TEMP/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
>running blast search.
>#--------- command -------------#
>Widget::blastx:
>/home/bioinf/maker/bin/../exe/ab-blast/blastx -db /tmp/maker_eiYoLm/te_proteins%2Efasta.mpi.10.0 -query /tmp/maker_eiYoLm/rank0/contig-dpp-500-500.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/bioinf/Desktop/TEMP/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
>#-------------------------------#
>FATAL: Argument 1 ("-db") is not recognized or is improperly formed.
>EXIT CODE 5
>FATAL: Argument 1 ("-db") is not recognized or is improperly formed.
>EXIT CODE 5
>ERROR: BLASTX failed
>ERROR: Failed while doing blastx repeats
>ERROR: Chunk failed at level:1, tier_type:1
>FAILED CONTIG:contig-dpp-500-500
>
>ERROR: Chunk failed at level:2, tier_type:0
>FAILED CONTIG:contig-dpp-500-500
>
>--Next Contig--
>
>Processing run.log file...
>MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.te_proteins%2Efasta.repeatrunner
>did not finish on the last run and must be erased
>
>Maker is now finished!!!
>
>bioinf@linux-hdoc:~/Desktop/TEMP>
>
>Can you please help me?
>Thanks
>
>Joana
>
>------------------------------
>
>Message: 2
>Date: Tue, 28 Feb 2012 07:48:09 -0500
>From: Carson Holt <carsonhh at gmail.com>
>To: "Maria Joana F. B. A. Guimarães"
> <joana.guimaraes at inrb.pt>, <maker-devel at yandell-lab.org>
>Subject: Re: [maker-devel] maker run error
>Message-ID: <CB7236CE.88F9%carsonhh at gmail.com>
>Content-Type: text/plain; charset="iso-8859-1"
>
>You're using ABBlast. MAKER doesn't support it yet. Maybe now would be a
>good time to do it though. It would take about 10 hours to implement and
>test. I could probably look into it this weekend.
>
>Thanks,
>Carson
>------------------------------
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>End of maker-devel Digest, Vol 45, Issue 15
>*******************************************