From carsonhh at gmail.com  Mon Dec  1 13:31:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 1 Dec 2014 12:31:46 -0700
Subject: [maker-devel] gff output
In-Reply-To: <5476ED52.3060902@gmail.com>
References: <5476ED52.3060902@gmail.com>
Message-ID: <5A861A9A-5348-44B5-B0F6-C9AF3AA1469E@gmail.com>

If you are using the GFF3 produced directly by Augustus, it will be oddly structured and will not conform to the 'Canonical Gene' example given in the GFF3 format specification.  You have to perform a couple of search-and-replace operations to make it work.

Also, it is generally better to let MAKER run Augustus for you rather than providing its output as GFF3, because otherwise you lose the hint feedback that MAKER provides to Augustus.  As a result, there will be no improvement to the annotations beyond what Augustus has already produced.
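
One way to check which evidence and prediction sources actually reached the output is to tally the second column of a per-contig GFF3 file. The following is a minimal Python sketch and not part of MAKER; the function name tally_sources and the sample lines are made up for illustration.

```python
from collections import Counter

def tally_sources(lines):
    """Count GFF3 features per source (column 2), skipping comments and FASTA."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9:  # a well-formed GFF3 feature line has 9 columns
            counts[cols[1]] += 1
    return counts

# Hypothetical feature lines, for illustration only
sample = [
    "##gff-version 3",
    "contig1\trepeatmasker\tmatch\t10\t50\t.\t+\t.\tID=r1",
    "contig1\test_gff:cufflinks\texpressed_sequence_match\t100\t900\t.\t+\t.\tID=e1",
    "contig1\trepeatmasker\tmatch\t60\t90\t.\t-\t.\tID=r2",
]
for source, n in sorted(tally_sources(sample).items()):
    print(source, n)
```

With a real file, an open file handle can be passed in place of the list. By analogy with the est_gff:cufflinks label, one would expect passed-through predictions to appear under a source beginning with pred_gff:, but that label is an assumption and should be checked against your own output.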

--Carson


> On Nov 27, 2014, at 2:22 AM, Muriel Gros-Balthazard <muriel.grosb at gmail.com> wrote:
> 
> Hello,
> 
> I have been using Maker to generate an annotation.
> I especially set these options:
> - est_gff with a list of transcripts.gff3 (Cufflinks output)
> - model_org=all
> - rmlib=allrepeats.lib
> - repeat_protein=te_prot.fasta
> - pred_gff= Augustus.gff3 (that I generated previously)
> 
> I obtain a gff file for each of my contigs.
> However, here are the three possibilities in the second column :
> # est_gff:cufflinks
> # repeatmasker
> # repeatrunner
> 
> I have no information about exons and introns.
> And I am wondering if the Augustus.gff3 was used...
> 
> On top of that, I forgot to set up pred_stats to 1.
> If I understand correctly, I can just change this in the control file and run Maker again. Since the output with everything is already there, it won't run the prediction again, only this option. Is that right?
> 
> Thank you,
> 
> Muriel
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From Alice.Dennis at eawag.ch  Fri Dec 12 08:10:46 2014
From: Alice.Dennis at eawag.ch (Dennis, Alice)
Date: Fri, 12 Dec 2014 14:10:46 +0000
Subject: [maker-devel] iterative Maker2
Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>

Hi all,

I am a relatively new user of Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.

I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
The evidence that I give is:

- de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
- 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr

For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.

For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.

As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.

Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?

Thank you,
Alice


From carsonhh at gmail.com  Fri Dec 12 09:41:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Dec 2014 08:41:42 -0700
Subject: [maker-devel] iterative Maker2
In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com>

The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, the models will change too.  But I have one concern.  You are using a very sparse protein evidence dataset.  The protein dataset is very important to MAKER's performance and to iterative training of the ab initio predictors.  Normally, additional training after the second iteration should not be beneficial, but if you are getting wildly different results in the 3rd and 4th rounds, then you probably aren't getting enough good models to train with.

For a protein dataset you should be using the entire proteome from a minimum of two related species, and perhaps all of UniProt/Swiss-Prot, to get a broad protein database.  Don't use the proteins extracted by CEGMA and HaMSTr.  CEGMA can be used to guide the first HMM creation (with the cegma2zff script that comes with MAKER), but don't give the proteins to MAKER as evidence. The HaMSTr results will also be redundant with the ESTs.  You need proteins from related species to look for homology not found in the EST dataset.

Also, repeat masking is important for any genome and has a huge effect on ab initio predictor performance.  Make sure you run something like RepeatModeler to look for species-specific repeats that will not already be in RepBase.  Then add those results to the rmlib= option in the MAKER control files.
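
When comparing exon counts between rounds, it can help to count on the feature-type column (column 3) rather than searching for the text "exon", since a plain text search can also match attribute values or features from other sources. Below is a small Python sketch of that idea; count_type, the source label "maker" as a filter value, and the sample lines are illustrative assumptions, not MAKER code.

```python
def count_type(lines, feature_type="exon", source=None):
    """Count GFF3 features whose type (column 3) equals feature_type.

    If source is given, only features with that value in column 2 are counted.
    """
    n = 0
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop at an embedded sequence section
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        if cols[2] == feature_type and (source is None or cols[1] == source):
            n += 1
    return n

# Hypothetical lines: two real exon features and one evidence line that only
# mentions "exon" in its attributes
sample = [
    "s1\tmaker\texon\t1\t100\t.\t+\t.\tID=g1:exon:1;Parent=g1-mRNA-1",
    "s1\tmaker\texon\t200\t300\t.\t+\t.\tID=g1:exon:2;Parent=g1-mRNA-1",
    "s1\tblastx\tmatch_part\t1\t90\t.\t+\t.\tID=hit1;Name=exon_like_protein",
]
print(count_type(sample))                  # features typed as exon
print(count_type(sample, source="maker"))  # the same, restricted to one source
```

Running the same count on the GFF3 from each round gives numbers that are directly comparable, because only annotated exon features are counted.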

Thanks,
Carson

 
> On Dec 12, 2014, at 7:10 AM, Dennis, Alice <Alice.Dennis at eawag.ch> wrote:
> 
> Hi all,
>  
> I am a relatively new user of Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.
> 
> I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
> The evidence that I give is:
> - de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
> - 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr
> 
> For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
> 
> For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
> 
> As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
> 
> Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
>  
> Thank you, 
> Alice
>  

From tuanduonganh at gmail.com  Sun Dec 14 06:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene models
using protein domain evidence. I have obtained the tsv file from
interproscan and have also updated the GFF3 file with this result using
ipr_update_gff. In the next step I will need the quality_filter.pl script to
generate the default and standard build; however, this script is not
included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan

From michael.s.campbell1 at gmail.com  Mon Dec 15 14:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
References: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
Message-ID: <CAAi6vWXiSqS-Y=10BSOO_WYBcXvZ5XzT7psEgque1yz3M4z03g@mail.gmail.com>

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl
extension because some email services will not accept them.

Take care,
Mike

On Sun, Dec 14, 2014 at 5:55 AM, Tuan Duong Anh <tuanduonganh at gmail.com>
wrote:
>
> Hi all,
>
> I successfully ran MAKER and am now looking into rescuing rejected gene
> models using protein domain evidence. I have obtained the tsv file from
> interproscan and have also updated the GFF3 file with this result using
> ipr_update_gff. In the next step I will need the quality_filter.pl script to
> generate the default and standard build; however, this script is not
> included in my version of MAKER. Do you know where I can get this script?
>
> Thanks.
> Tuan
>
>
>
>

-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4597 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20141215/7f57e7e4/attachment.obj>

From cognitiveshrapnel at gmail.com  Sat Dec 27 20:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>

I am working on getting maker running on a system running ubuntu 14.04. I
have installed maker and it runs great on a small but real data set. When I
try it with openmpi with the exact same inputs, however, I get the below
error almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal ***
[molybdenum:23241] Signal: Segmentation fault (11)
[molybdenum:23241] Signal code: Address not mapped (1)
[molybdenum:23241] Failing at address: 0x50c
[molybdenum:23241] [ 0]
/lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 1]
/usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
[molybdenum:23241] [ 2]
/lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 3]
/lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
[molybdenum:23241] [ 4]
/usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
[molybdenum:23241] [ 5]
/usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
[molybdenum:23241] [ 6]
/usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
[molybdenum:23241] [ 7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
[molybdenum:23241] [ 8]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
[molybdenum:23241] *** End of error message ***
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
[molybdenum:23252] *** Process received signal ***
[molybdenum:23252] Signal: Segmentation fault (11)
[molybdenum:23252] Signal code: Address not mapped (1)
[molybdenum:23252] Failing at address: 0x50c
[molybdenum:23252] [ 0]
/lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 1]
/usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
[molybdenum:23252] [ 2]
/lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 3]
/lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
[molybdenum:23252] [ 4]
/usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
[molybdenum:23252] [ 5]
/usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
[molybdenum:23252] [ 6]
/usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
[molybdenum:23252] [ 7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum
exited on signal 11 (Segmentation fault).


I have tried reinstalling both maker and openmpi. I have tried two
different versions of both maker and openmpi. I am currently working with
maker 2.31.6 and openmpi 1.8.3 because I have had those work together on
another system. I have triple checked that LD_PRELOAD is properly set. I
have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University

From harini1981 at gmail.com  Tue Dec 23 04:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>

Dear Concern,
I had used the following script from MAKER-DEVEL
AED_cdf_generator.pl to obtain the plot

I get the following error
readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED    scaffold11.gff,    scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl
line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong???
regards
Harini

-- 
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore
560065
harinik at ncbs.res.in
+91 9535292110

From michael.s.campbell1 at gmail.com  Mon Dec 29 13:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
References: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
Message-ID: <CAAi6vWVQrKX01jKSAVrCCeopOuMYOwrGVL2TRCew+oGE=fOpoQ@mail.gmail.com>

I think I fixed this in a recent svn commit. Try the attached version of
the script and let me know if it works.
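
For readers without the attachment, the general idea of an AED cumulative distribution can be sketched in a few lines of Python. This is not the attached AED_cdf_generator.pl; it assumes that mRNA lines in the MAKER GFF3 carry an _AED=<value> attribute, and the function names and the sample line are made up. It also reports an explicit error when no values are found, which is the situation that otherwise ends in a division by zero.

```python
import re

# Matches the _AED attribute but not _eAED (assumed attribute naming)
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")

def aed_values(lines):
    """Collect AED values from the attribute column of mRNA features."""
    values = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            match = AED_RE.search(cols[8])
            if match:
                values.append(float(match.group(1)))
    return values

def aed_cdf(values, step=0.25):
    """Return (threshold, fraction of values <= threshold) pairs from 0 to 1."""
    if not values:
        raise ValueError("no mRNA features with an _AED attribute were found")
    n_bins = int(round(1 / step))
    cdf = []
    for i in range(n_bins + 1):
        threshold = i * step
        fraction = sum(v <= threshold for v in values) / len(values)
        cdf.append((round(threshold, 3), fraction))
    return cdf

# Hypothetical mRNA line, for illustration only
sample = ["s7\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1;_AED=0.10;_eAED=0.20"]
print(aed_values(sample))
print(aed_cdf([0.0, 0.1, 0.3, 1.0]))
```

A smaller step (for example 0.025) gives a smoother curve for plotting; the pairs can be written out as two columns and plotted with any tool.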

Thanks,
Mike

On Tue, Dec 23, 2014 at 3:31 AM, Harini Vinod <harini1981 at gmail.com> wrote:

>
> Dear Concern,
> I had used the following script from MAKER-DEVEL
> AED_cdf_generator.pl to obtain the plot
>
> I get the following error
> readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
> AED    scaffold11.gff,    scaffold7.gff
> Use of uninitialized value $total in division (/) at AED_cdf_generator.pl
> line 43.
> Illegal division by zero at AED_cdf_generator.pl line 43.
>
> Can you kindly suggest what could have gone wrong???
> regards
> Harini
>
> --
> K.Harini
> PhD scholar
> Lab-25
> NCBS,GKVK
> Bangalore
> 560065
> harinik at ncbs.res.in
> +91 9535292110
>
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20141229/affaccd8/attachment.gz>

From xvazquezc at gmail.com  Mon Dec 29 22:00:56 2014
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker
for the annotation. I understand the base concepts but I have doubts about
the correct steps to follow. I've been through the 2014 video tutorial and
searched for detailed steps and I still have some questions, maybe a bit
obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no
EST or protein files).
I understand that, lacking EST and protein files, I should provide them as
alt-est and protein from the closest species I can, but is one EST file
from one organism enough for the alt-est?

Regarding the steps to process would this be correct?:

   1. run Maker with the genome, alt-est and protein files, with
   est2genome=1 and protein2genome=1 (softmask=1 ?)
   2. with this first output, create the hmm file for SNAP based on the
   first output
   3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run
   again (using -base option)
   4. repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is
possible? Would that be the final model? Should it be based on the AED scores?
How can I get them without looking into individual sequence headings? Also,
do you perform the bootstrapping in the same folder? In the tutorial
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014>
I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each
repetition; not sure if that is just for demonstration purposes or if it is
the proper way to go.

I'm trying to run also a gene prediction with Augustus and GeneMark. The
first run will include an already trained profile for Augustus and the
native hmm file of genemark-ES**. Do they need to repeat the prediction by
bootstrap like with SNAP? If so, do I need to generate new hmm files or
prediction models based on results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl
script but no matter what parameters I use the cluster shut the job down as
it exceeds 128GB of memory in use. The genome I've been testing for this is
about 42Mbp in a roughly 40-50 MB fasta file

Thank you in advance,

Xabier

-- 
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com  Wed Dec 31 14:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
References: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
Message-ID: <BD84D7BA-F0CC-4534-B167-59DC30942E9E@gmail.com>

Hi Xabier,

See below -->


> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files). 
> I understand that, lacking EST and protein files, I should provide them as alt-est and protein from the closest species I can, but is one EST file from one organism enough for the alt-est?

Provide alt-EST if you have ESTs from a closely related species but do not have the proteome for that species.  If you have the proteome, use that.  Both are aligned in amino acid space and provide the same hint information; the only difference is that alt-EST takes 10x longer because both target and query must be translated into all 6 reading frames.


> Regarding the steps to process would this be correct?:
> 1. run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> 2. with this first output, create the hmm file for SNAP based on the first output
> 3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
> 4. repeat 2 and 3 as necessary*
If you don't have ESTs, don't do est2genome (alt-ESTs don't count).  Just do protein2genome.  In general, two rounds of training is the maximum you should do.  At that point, ab initio predictions and hint-based predictions will start to look like each other (so the ab initio models are doing well on their own).


> *How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get them without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014> I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition; not sure if that is just for demonstration purposes or if it is the proper way to go.

Run it in the same folder.  This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster).  In the tutorial we ran separately just to be able to open old results and compare.


> I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

You do with Augustus, but not with GeneMark, which is self-training.


> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script but no matter what parameters I use the cluster shuts the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file

You can train GeneMark with just part of the genome. Try using 10Mb made up of the longest contigs.  Also, I only recommend using GeneMark on fungi; it tends not to work well on organisms with more complex intron/exon structures. You should also build a species-specific repeat database to supplement RepeatMasker's internal libraries.  I'd recommend using RepeatModeler.
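
Picking out roughly 10Mb of the longest contigs can be scripted. The following Python sketch is only one way to do it; read_fasta, longest_contigs and the toy sequences are illustrative names and data, and nothing here is specific to GeneMark or MAKER.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def longest_contigs(records, target_bp=10_000_000):
    """Take contigs from longest to shortest until target_bp is reached."""
    picked, total = [], 0
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if total >= target_bp:
            break
        picked.append((name, seq))
        total += len(seq)
    return picked

# Toy example with a 12 bp target instead of 10 Mb
toy = [">c1", "ACGT", ">c2 some description", "ACGTACGTAC", ">c3", "ACG", "TAC"]
subset = longest_contigs(read_fasta(toy), target_bp=12)
for name, seq in subset:
    print(f">{name}\n{seq}")
```

For a real assembly, an open file handle can be passed to read_fasta and the selected records written to a new FASTA file for training. This sketch holds all sequences in memory, which should be fine for a genome of about 42Mbp.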


Thanks,
Carson


From carsonhh at gmail.com  Wed Dec 31 14:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
References: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.


The following is from the INSTALL file that should be included with MAKER -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).


1.  Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.

2.  Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.

3.  Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.

4.  Finish installation according to steps 2-4 of the EASY INSTALL

    Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

    Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.

        Example: mpiexec -mca btl ^openib -n 20 maker
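
Before launching mpiexec, the LD_PRELOAD setting can be sanity-checked from a script. The Python sketch below is not part of MAKER or OpenMPI; check_ld_preload is a made-up helper that only tests whether the variable names a libmpi.so file that exists.

```python
import os

def check_ld_preload(env, exists=os.path.isfile):
    """Return a list of problems with LD_PRELOAD; an empty list means none found."""
    value = env.get("LD_PRELOAD", "")
    if not value:
        return ["LD_PRELOAD is not set"]
    # Entries in LD_PRELOAD may be separated by spaces or colons
    paths = value.replace(":", " ").split()
    mpi_libs = [p for p in paths if os.path.basename(p).startswith("libmpi.so")]
    if not mpi_libs:
        return ["LD_PRELOAD does not mention libmpi.so"]
    return [f"{p} does not exist" for p in mpi_libs if not exists(p)]

problems = check_ld_preload(os.environ)
print(problems or "LD_PRELOAD looks plausible")
```

This only checks the path. It cannot tell whether the library belongs to the same OpenMPI installation whose 'mpicc' and 'mpi.h' were given when MAKER was built, which still has to be confirmed by hand.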


Thanks,
Carson


> On Dec 27, 2014, at 7:59 PM, Justin Peyton <cognitiveshrapnel at gmail.com> wrote:
> 
> I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly. 
> 
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [molybdenum:23241] *** Process received signal ***
> [molybdenum:23241] Signal: Segmentation fault (11)
> [molybdenum:23241] Signal code: Address not mapped (1)
> [molybdenum:23241] Failing at address: 0x50c
> [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
> [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
> [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
> [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
> [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
> [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
> [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
> [molybdenum:23241] *** End of error message ***
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> [molybdenum:23252] *** Process received signal ***
> [molybdenum:23252] Signal: Segmentation fault (11)
> [molybdenum:23252] Signal code: Address not mapped (1)
> [molybdenum:23252] Failing at address: 0x50c
> [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
> [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
> [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
> [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
> [molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
> [molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
> [molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
> [molybdenum:23252] *** End of error message ***
> SIGTERM received
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).
> 
> 
> I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.
> 
> Justin Peyton
> The Ohio State University

From jerryzhaosjtu at gmail.com  Wed Dec 31 19:48:29 2014
From: jerryzhaosjtu at gmail.com (赵越)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID: <CAMxJ+aewxN6eM_eQDfNK6p4Vn1c90Jwqd8EJokaAZPUbB1dDMA@mail.gmail.com>

Hi all,

I have recently been using MAKER to annotate a single chromosome of rice as a
pre-experiment, and I have run into some problems. After the annotation,
when I ran Eval to compare my result against the gold standard, the
gene sensitivity and specificity were only around 20%. And after I added the
GFF3 file that MAKER itself produced and ran MAKER again, I found that the
result was worse than 20%.

My input is a Trinity-processed RNA-seq file and a protein file.  I chose
snap, augustus and genemark as ab initio predictors.

I paste my maker_opts.ctl here:

#-----Genome (these are always required)
genome=chr12.fasta #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=chr12.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at
least one)
protein=Osativa_193_peptide.fa  #protein sequence file in fasta format
(i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Rice #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for
RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for
RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change
this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg
and dust filtering)

#-----Gene Prediction
snaphmm=rice #SNAP HMM file
gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod
#GeneMark HMM file
augustus_species=arabidopsis #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation
pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no


snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 =
yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3
file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST
databases
cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI,
leave 1 when using MPI)


Could you help me? Thank you !!!


-- 

*Yue Zhao (Jerry)*

Bachelor Candidate of Plant Biotechnology

Researcher in UCLA-CSST program

Shanghai Jiao Tong University, Shanghai

*jerryzhaosjtu at gmail.com <jerryzhaosjtu at gmail.com>*

From Alice.Dennis at eawag.ch  Fri Dec 12 07:10:46 2014
From: Alice.Dennis at eawag.ch (Dennis, Alice)
Date: Fri, 12 Dec 2014 14:10:46 +0000
Subject: [maker-devel] iterative Maker2
Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>

Hi all,

I am a relatively new user to Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.

I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
The evidence that I give is:

-          de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)

-          610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr

For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.

For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.

As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.

Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?

Thank you,
Alice


From carsonhh at gmail.com  Fri Dec 12 08:41:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Dec 2014 08:41:42 -0700
Subject: [maker-devel] iterative Maker2
In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com>

The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, the models will change too.  But I have one concern: you are using a very sparse protein evidence dataset.  The protein dataset is very important to MAKER's performance and to iterative training of the ab initio predictors.  Normally, additional training after the second iteration should not be beneficial, but if you are getting wildly different results in the 3rd and 4th rounds, then you probably aren't getting enough good models to train with.
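
When comparing rounds, it can help to count features by GFF3 column rather than by searching the text for the word "exon", since a plain text search can also match the word inside attribute strings. The following is a minimal sketch, not part of MAKER; the example lines and the source names in them are invented for illustration.

```python
from collections import Counter


def count_features(gff_lines, feature_type="exon"):
    """Tally GFF3 features of one type, keyed by the source column (column 2)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 3 holds the feature type; only exact matches are counted.
        if len(cols) >= 9 and cols[2] == feature_type:
            counts[cols[1]] += 1
    return counts


# Invented example lines; a real run would iterate over an open GFF3 file instead.
example = [
    "##gff-version 3\n",
    "scaf1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "scaf1\tmaker\texon\t100\t300\t.\t+\t.\tID=g1:e1;Parent=m1\n",
    "scaf1\tmaker\texon\t500\t900\t.\t+\t.\tID=g1:e2;Parent=m1\n",
    "scaf1\tsnap_masked\tmatch_part\t100\t300\t.\t+\t.\tID=s1\n",
]
print(count_features(example))
```

Running the same tally on the output of each round separates changes in the final annotations from changes in the raw predictor evidence.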

For a protein dataset you should be using the entire proteome from a minimum of two related species, and perhaps all of UniProt/Swiss-Prot, to get a broad protein database.  Don't use the proteins extracted by CEGMA and HaMSTr.  CEGMA can be used to guide the first HMM creation (the cegma2zff script that comes with MAKER), but don't give the proteins to MAKER as evidence. The HaMSTr results will also be redundant with the ESTs.  You need proteins from related species to look for homology not found in the EST dataset.

Also repeat masking is important for any genome and has a huge effect on ab initio predictor performance.  Make sure you run something like RepeatModeler to look for species specific repeats that will not already be in RepBase.  Then add those results to the rmlib= option in the maker control files.

Thanks,
Carson

 
> On Dec 12, 2014, at 7:10 AM, Dennis, Alice <Alice.Dennis at eawag.ch> wrote:
> 
> Hi all,
>  
> I am a relatively new user to Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.
>  
> I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
> The evidence that I give is:
> -          de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
> -          610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr
>  
> For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
>  
> For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
>  
> As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
>  
> Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
>  
> Thank you, 
> Alice
>  

From tuanduonganh at gmail.com  Sun Dec 14 05:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene models
using protein domain evidence. I have obtained the tsv file from
InterProScan and have also updated the GFF3 file with this result using
ipr_update_gff. In the next step I will need the quality_filter.pl script to
generate the default and standard builds; however, this script is not
included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan

From michael.s.campbell1 at gmail.com  Mon Dec 15 13:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
References: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
Message-ID: <CAAi6vWXiSqS-Y=10BSOO_WYBcXvZ5XzT7psEgque1yz3M4z03g@mail.gmail.com>

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl
extension because some email services will not accept them.

Take care,
Mike

On Sun, Dec 14, 2014 at 5:55 AM, Tuan Duong Anh <tuanduonganh at gmail.com>
wrote:
>
> Hi all,
>
> I successfully ran MAKER and am now looking into rescuing rejected gene
> models using protein domain evidence. I have obtained the tsv file from
> InterProScan and have also updated the GFF3 file with this result using
> ipr_update_gff. In the next step I will need the quality_filter.pl script to
> generate the default and standard builds; however, this script is not
> included in my version of MAKER. Do you know where I can get this script?
>
> Thanks.
> Tuan

-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4597 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20141215/7f57e7e4/attachment-0001.obj>

From cognitiveshrapnel at gmail.com  Sat Dec 27 19:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>

I am working on getting maker running on a system running ubuntu 14.04. I
have installed maker and it runs great on a small but real data set. When I
try it with openmpi with the exact same inputs, however, I get the below
error almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal ***
[molybdenum:23241] Signal: Segmentation fault (11)
[molybdenum:23241] Signal code: Address not mapped (1)
[molybdenum:23241] Failing at address: 0x50c
[molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
[molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
[molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
[molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
[molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
[molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
[molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
[molybdenum:23241] *** End of error message ***
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
[molybdenum:23252] *** Process received signal ***
[molybdenum:23252] Signal: Segmentation fault (11)
[molybdenum:23252] Signal code: Address not mapped (1)
[molybdenum:23252] Failing at address: 0x50c
[molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
[molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
[molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
[molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
[molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
[molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).


I have tried reinstalling both maker and openmpi. I have tried two
different versions of both maker and openmpi. I am currently working with
maker 2.31.6 and openmpi 1.8.3 because I have had those work together on
another system. I have triple checked that LD_PRELOAD is properly set. I
have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University

From harini1981 at gmail.com  Tue Dec 23 03:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>

Dear Concern,
I used the AED_cdf_generator.pl script from MAKER-DEVEL to obtain the plot.

I get the following error:
readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED    scaffold11.gff,    scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong?
regards
Harini

-- 
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore
560065
harinik at ncbs.res.in
+91 9535292110

From michael.s.campbell1 at gmail.com  Mon Dec 29 12:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
References: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
Message-ID: <CAAi6vWVQrKX01jKSAVrCCeopOuMYOwrGVL2TRCew+oGE=fOpoQ@mail.gmail.com>

I think I fixed this in a recent svn commit. Try the attached version of
the script and let me know if it works.
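
For context, Perl prints "readline() on closed filehandle" when a file failed to open, so the total count was probably never set and the division failed. The sketch below is not the attached script. It is a small Python illustration of the same idea: a cumulative AED distribution with an explicit check for empty input. It assumes the GFF3 has mRNA features with source "maker" and an _AED attribute, and the example lines are invented.

```python
import re

# Matches the _AED attribute at the start of column 9 or after a semicolon.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines):
    """Collect _AED values from mRNA features whose source column is 'maker'."""
    values = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[1] == "maker" and cols[2] == "mRNA":
            match = AED_RE.search(cols[8])
            if match:
                values.append(float(match.group(1)))
    return values


def aed_cdf(values, bins=20):
    """Return (threshold, fraction of values <= threshold) for evenly spaced thresholds."""
    if not values:
        # Fail with a clear message instead of dividing by zero.
        raise ValueError("no mRNA features with an _AED attribute were found")
    total = len(values)
    return [(i / bins, sum(v <= i / bins for v in values) / total)
            for i in range(bins + 1)]


# Invented example lines; a real run would read the merged MAKER GFF3 file.
demo = [
    "##gff-version 3\n",
    "scaffold7\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "scaffold7\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1;_AED=0.10;_eAED=0.12\n",
    "scaffold7\tmaker\tmRNA\t1000\t1900\t.\t-\t.\tID=m2;Parent=g2;_AED=0.50;_eAED=0.50\n",
    "scaffold11\tmaker\tmRNA\t1\t500\t.\t+\t.\tID=m3;Parent=g3;_AED=1.00;_eAED=1.00\n",
]
values = aed_values(demo)
for threshold, fraction in aed_cdf(values, bins=4):
    print(f"{threshold:.2f}\t{fraction:.3f}")
```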

Thanks,
Mike

On Tue, Dec 23, 2014 at 3:31 AM, Harini Vinod <harini1981 at gmail.com> wrote:

>
> Dear Concern,
> I had used the following script from MAKER-DEVEL
> AED_cdf_generator.pl to obtain the plot
>
> I get the following error
> readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
> AED    scaffold11.gff,    scaffold7.gff
> Use of uninitialized value $total in division (/) at AED_cdf_generator.pl
> line 43.
> Illegal division by zero at AED_cdf_generator.pl line 43.
>
> Can you kindly suggest what could have gone wrong???
> regards
> Harini
>
> --
> K.Harini
> PhD scholar
> Lab-25
> NCBS,GKVK
> Bangalore
> 560065
> harinik at ncbs.res.in
> +91 9535292110


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20141229/affaccd8/attachment-0001.gz>

From xvazquezc at gmail.com  Mon Dec 29 21:00:56 2014
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>

Hi there,

I'm a newbie at dealing with genomes and I've been trying to start using Maker
for the annotation. I understand the basic concepts but I have doubts about
the correct steps to follow. I've been through the 2014 video tutorial and
searched for detailed steps, and I still have some questions, maybe a bit
obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no
EST or protein files).
I understand that, lacking EST and protein files, I should provide them as
alt-est and protein from the closest species I can, but is one EST file
from one organism enough for the alt-est?

Regarding the steps to process would this be correct?:

   1. run Maker with the genome, alt-est and protein files, with
   est2genome=1 and protein2genome=1 (softmask=1 ?)
   2. with this first output, create the hmm file for SNAP based on the
   first output
   3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run
   again (using -base option)
   4. repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is
possible? Would that be the final model? Should it be based on the AED scores?
How can I get them without looking into individual sequence headings? Also,
do you perform the bootstrapping in the same folder? In the tutorial
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014>
I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each
repetition; I am not sure if that is just for demonstration purposes or if
it is the proper way to go.

I'm trying to run also a gene prediction with Augustus and GeneMark. The
first run will include an already trained profile for Augustus and the
native hmm file of genemark-ES**. Do they need to repeat the prediction by
bootstrap like with SNAP? If so, do I need to generate new hmm files or
prediction models based on results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl
script, but no matter what parameters I use, the cluster shuts the job down
because it exceeds 128 GB of memory in use. The genome I've been testing this
on is about 42 Mbp, in a roughly 40-50 MB fasta file.

Thank you in advance,

Xabier

-- 
Xabier V?zquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com  Wed Dec 31 13:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
References: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
Message-ID: <BD84D7BA-F0CC-4534-B167-59DC30942E9E@gmail.com>

Hi Xabier,

See below:


> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files). 
> I understand that lacking of EST and protein files I should provide them as alt-est and protein from the closest species I can, but is it enough with one EST file from one organism for the alt-est?

Provide alt-ESTs if you have ESTs from a closely related species but do not have the proteome for that species.  If you have the proteome, use that.  Both are aligned in amino acid space and provide the same hint information, the only difference being that alt-EST alignment takes 10x longer because both target and query must be translated into all 6 reading frames.


> Regarding the steps to process would this be correct?:
> 1. run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> 2. with this first output, create the hmm file for SNAP based on the first output
> 3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
> 4. repeat 2 and 3 as necessary*

If you don't have ESTs, don't do est2genome (alt-ESTs don't count).  Just do protein2genome.  In general, two rounds of training is the maximum you should do.  At that point, ab initio predictions and hint-based predictions will start to look like each other (so the ab initio models are doing well on their own).


> *How do you know when you get to the point where no more refinement is possible? Would that the final model? It should be based on the AED scores? How can I get it without looking into individual sequence headings? Also, do you perform the bootstrapping on the same folder? In the tutorial <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014> I saw different folders, (e.g. pyu_contig1, pyu_contig2) used on each repetition, not sure if just for demonstration purposes or if it is the proper way to go..

Run it in the same folder.  This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster).  In the tutorial we ran separately just to be able to open old results and compare.


> I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

You do with Augustus, but not GeneMark which does self training.


> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script but no matter what parameters I use the cluster shut the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file

You can train GeneMark with just part of the genome. Try using 10 Mb made up of the longest contigs.  Also, I only recommend using GeneMark on fungi; it tends not to work well on organisms with more complex intron/exon structures. You should also build a species-specific repeat database to supplement RepeatMasker's internal libraries.  I'd recommend using RepeatModeler.
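
One way to build such a subset is sketched below. It is a minimal illustration rather than a MAKER or GeneMark utility: the function names are made up, and the tiny sequences and the 15 bp target stand in for a real assembly and the suggested 10 Mb.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contigs(records, target_bp=10_000_000):
    """Return the longest records, stopping once their combined length reaches target_bp."""
    picked, total = [], 0
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if total >= target_bp:
            break
        picked.append((header, seq))
        total += len(seq)
    return picked


# Toy input; with a real assembly, pass an open file handle to read_fasta().
demo = [">a", "ACGT", ">b", "ACGTACGTAC", ">c", "AC", ">d", "ACGTAC"]
subset = longest_contigs(read_fasta(demo), target_bp=15)
print([header for header, _ in subset])
```

The selected records can then be written back out as FASTA and given to the GeneMark training script in place of the full assembly.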


Thanks,
Carson


From carsonhh at gmail.com  Wed Dec 31 13:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
References: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.


The following is from the INSTALL file that should be included with MAKER:

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).


1.  Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.

2.  Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.

3.  Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.

4.  Finish installation according to steps 2-4 of the EASY INSTALL

    Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

    Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.

        Example: mpiexec -mca btl ^openib -n 20 maker
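
A quick preflight check along these lines can catch a missing or stale LD_PRELOAD before launching. This is only a sketch under stated assumptions: the helper names are hypothetical and not part of MAKER or OpenMPI, and the check only confirms that LD_PRELOAD names an existing libmpi.so file, not that it belongs to the same OpenMPI build as mpiexec.

```python
import os


def check_openmpi_env(env=None):
    """Return a list of problems found in the OpenMPI-related environment settings."""
    env = os.environ if env is None else env
    problems = []
    # LD_PRELOAD entries may be separated by spaces or colons.
    entries = env.get("LD_PRELOAD", "").replace(":", " ").split()
    mpi_libs = [p for p in entries if os.path.basename(p).startswith("libmpi.so")]
    if not mpi_libs:
        problems.append("LD_PRELOAD does not name a libmpi.so")
    for path in mpi_libs:
        if not os.path.isfile(path):
            problems.append(f"LD_PRELOAD entry does not exist: {path}")
    return problems


def mpiexec_command(n_procs, maker_args=()):
    """Build the argument list for the mpiexec example from the INSTALL notes."""
    return ["mpiexec", "-mca", "btl", "^openib", "-n", str(n_procs), "maker", *maker_args]


for problem in check_openmpi_env():
    print("WARNING:", problem)
print(" ".join(mpiexec_command(20)))
```

Passing the argument list to subprocess.run() without a shell keeps the '^openib' value from being interpreted by the shell.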


Thanks,
Carson


> On Dec 27, 2014, at 7:59 PM, Justin Peyton <cognitiveshrapnel at gmail.com> wrote:
> 
> I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly. 
> 
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [molybdenum:23241] *** Process received signal ***
> [molybdenum:23241] Signal: Segmentation fault (11)
> [molybdenum:23241] Signal code: Address not mapped (1)
> [molybdenum:23241] Failing at address: 0x50c
> [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
> [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
> [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
> [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
> [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
> [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
> [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
> [molybdenum:23241] *** End of error message ***
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> [molybdenum:23252] *** Process received signal ***
> [molybdenum:23252] Signal: Segmentation fault (11)
> [molybdenum:23252] Signal code: Address not mapped (1)
> [molybdenum:23252] Failing at address: 0x50c
> [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
> [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
> [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
> [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
> [molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
> [molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
> [molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
> [molybdenum:23252] *** End of error message ***
> SIGTERM received
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).
> 
> 
> I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.
> 
> Justin Peyton
> The Ohio State University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From jerryzhaosjtu at gmail.com  Wed Dec 31 18:48:29 2014
From: jerryzhaosjtu at gmail.com (赵越)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID: <CAMxJ+aewxN6eM_eQDfNK6p4Vn1c90Jwqd8EJokaAZPUbB1dDMA@mail.gmail.com>

Hi all,

I have recently been using MAKER to annotate a single rice chromosome as a
pilot experiment, and I have run into some problems. When I use eval to
compare my annotation against the gold standard, the gene sensitivity and
specificity are only around 20%. When I then passed the gff3 file that
maker itself produced back in and ran maker again, the result was even
worse than 20%.

My input is a Trinity-processed RNA-seq file and a protein file.  I chose
snap, augustus and genemark as ab initio predictors.

I paste my maker_opts.ctl here:

#-----Genome (these are always required)
genome=chr12.fasta #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=chr12.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=Osativa_193_peptide.fa  #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Rice #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=rice #SNAP HMM file
gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file
augustus_species=arabidopsis #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no


snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)


Could you help me? Thank you !!!


-- 

*Yue Zhao (Jerry)*

Bachelor Candidate of Plant Biotechnology

Researcher in UCLA-CSST program

Shanghai Jiao Tong University, Shanghai

*jerryzhaosjtu at gmail.com <jerryzhaosjtu at gmail.com>*

From carsonhh at gmail.com  Mon Dec  1 12:31:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 1 Dec 2014 12:31:46 -0700
Subject: [maker-devel] gff output
In-Reply-To: <5476ED52.3060902@gmail.com>
References: <5476ED52.3060902@gmail.com>
Message-ID: <5A861A9A-5348-44B5-B0F6-C9AF3AA1469E@gmail.com>

If you are using the gff3 directly produced by Augustus, it will be oddly structured and does not conform to the 'Canonical Gene' example given by the GFF3 format specification.  You have to make a couple of search and replace operations to make it work.

Also it would generally be better to let maker run augustus for you rather than providing it as GFF3. This is because you lose the hint feedback that maker provides augustus.  As a result there will be no improvement made to the annotations beyond what augustus has already produced.

--Carson
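
A quick way to see whether anything from pred_gff reached the output is to tally column 2 (source) and column 3 (type) of a contig's GFF3. The following is only a minimal Python sketch, assuming a standard nine-column, tab-delimited GFF3; the function name tally_gff3 and the sample records are made up for illustration. By analogy with est_gff:cufflinks, one would expect passed-through predictions to show up under their own source label, but that label is an assumption here.

```python
from collections import Counter


def tally_gff3(lines):
    """Count (source, type) pairs over the feature lines of a GFF3 file."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip anything that is not a nine-column feature line
        counts[(cols[1], cols[2])] += 1
    return counts


# Made-up records, only to show the call; real input would be a contig's GFF3.
sample = [
    "##gff-version 3",
    "contig1\trepeatmasker\tmatch\t10\t90\t.\t+\t.\tID=r1",
    "contig1\test_gff:cufflinks\texpressed_sequence_match\t100\t400\t.\t+\t.\tID=e1",
    "contig1\test_gff:cufflinks\tmatch_part\t100\t200\t.\t+\t.\tID=e1p1;Parent=e1",
]
for (source, ftype), n in sorted(tally_gff3(sample).items()):
    print(source, ftype, n, sep="\t")
```

For a real file, pass an open file handle instead of the sample list. If no prediction source and no gene/mRNA/exon types appear in the tally, the predictions were most likely not accepted.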


> On Nov 27, 2014, at 2:22 AM, Muriel Gros-Balthazard <muriel.grosb at gmail.com> wrote:
> 
> Hello,
> 
> I have been using Maker to generate an annotation.
> I especially set these options:
> - est_gff with a list of transcripts.gff3 (Cufflinks output)
> - model_org=all
> - rmlib=allrepeats.lib
> - repeat_protein=te_prot.fasta
> - pred_gff= Augustus.gff3 (that I generated previously)
> 
> I obtain a gff file for each of my contigs.
> However, here are the three possibilities in the second column :
> # est_gff:cufflinks
> # repeatmasker
> # repeatrunner
> 
> I have no information about exons and introns.
> And I am wondering if the Augustus.gff3 was used...
> 
> On top of that, I forgot to set up pred_stats to 1.
> If I understand well, I can just change this in the control file, and run Maker again. Since there is the output with everything, it won't run the prediction again, only this option. Is that right ?
> 
> Thank you,
> 
> Muriel
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From Alice.Dennis at eawag.ch  Fri Dec 12 07:10:46 2014
From: Alice.Dennis at eawag.ch (Dennis, Alice)
Date: Fri, 12 Dec 2014 14:10:46 +0000
Subject: [maker-devel] iterative Maker2
Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>

Hi all,

I am a relatively new user to Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.

I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
The evidence that I give is:

-          de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)

-          610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr

For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.

For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.

As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.

Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?

Thank you,
Alice


From carsonhh at gmail.com  Fri Dec 12 08:41:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Dec 2014 08:41:42 -0700
Subject: [maker-devel] iterative Maker2
In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com>

The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, then the models will change too.  But I have one concern.  You are using a very sparse protein evidence dataset.  The protein dataset is very important to MAKER's performance, and for iterative training of the ab initio predictors.  Normally after the second iteration, additional training should not be beneficial, but if you are getting wildly different results on the 3rd and 4th rounds, then you probably aren't getting enough good models to train with.

For a protein dataset you should be using the entire proteome from a minimum of two related species and perhaps all of UniProt/Swiss-Prot to get a broad protein database.  Don't use the proteins extracted by CEGMA and HaMSTr.  CEGMA can be used to guide the first HMM creation (the cegma2zff script that comes with MAKER), but don't give the proteins to MAKER as evidence.  Also, the HaMSTr results will be redundant with the ESTs.  You need proteins from related species to look for homology not found in the EST dataset.

Also repeat masking is important for any genome and has a huge effect on ab initio predictor performance.  Make sure you run something like RepeatModeler to look for species-specific repeats that will not already be in RepBase.  Then add those results to the rmlib= option in the maker control files.

Thanks,
Carson

 
> On Dec 12, 2014, at 7:10 AM, Dennis, Alice <Alice.Dennis at eawag.ch> wrote:
> 
> Hi all,
>  
> I am a relatively new user to Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.
>  
> I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
> The evidence that I give is:
> -          de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
> -          610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr
>  
> For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
>  
> For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
>  
> As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
>  
> Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
>  
> Thank you, 
> Alice
>  
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com <mailto:maker-devel at box290.bluehost.com>
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org <http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org>

From tuanduonganh at gmail.com  Sun Dec 14 05:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene
models using protein domain evidence. I have obtained the tsv file from
interproscan and have also updated the GFF3 file with this result using
ipr_update_gff. In the next step I will need the quality_filter.pl script
to generate the default and standard build, however this script is not
included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan

From michael.s.campbell1 at gmail.com  Mon Dec 15 13:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
References: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
Message-ID: <CAAi6vWXiSqS-Y=10BSOO_WYBcXvZ5XzT7psEgque1yz3M4z03g@mail.gmail.com>

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl
extension because some email services will not accept them.

Take care,
Mike

On Sun, Dec 14, 2014 at 5:55 AM, Tuan Duong Anh <tuanduonganh at gmail.com>
wrote:
>
> Hi all,
>
> I successfully ran MAKER and am now looking into rescuing rejected gene
> models using protein domain evidence. I have obtained the tsv file from
> interproscan and have also updated the GFF3 file with this result using
> ipr_update_gff. In the next step I will need the quality_filter.pl script
> to generate the default and standard build, however this script is not
> included in my version of MAKER. Do you know where I can get this script?
>
> Thanks.
> Tuan
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4598 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20141215/7f57e7e4/attachment-0002.obj>

From cognitiveshrapnel at gmail.com  Sat Dec 27 19:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>

I am working on getting maker running on a system running ubuntu 14.04. I
have installed maker and it runs great on a small but real data set. When I
try it with openmpi with the exact same inputs, however, I get the below
error almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal ***
[molybdenum:23241] Signal: Segmentation fault (11)
[molybdenum:23241] Signal code: Address not mapped (1)
[molybdenum:23241] Failing at address: 0x50c
[molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
[molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
[molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
[molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
[molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
[molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
[molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
[molybdenum:23241] *** End of error message ***
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
[molybdenum:23252] *** Process received signal ***
[molybdenum:23252] Signal: Segmentation fault (11)
[molybdenum:23252] Signal code: Address not mapped (1)
[molybdenum:23252] Failing at address: 0x50c
[molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
[molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
[molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
[molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
[molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
[molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).


I have tried reinstalling both maker and openmpi. I have tried two
different versions of both maker and openmpi. I am currently working with
maker 2.31.6 and openmpi 1.8.3 because I have had those work together on
another system. I have triple checked that LD_PRELOAD is properly set. I
have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University

From harini1981 at gmail.com  Tue Dec 23 03:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>

Dear Concern,
I had used the following script from MAKER-DEVEL
AED_cdf_generator.pl to obtain the plot

I get the following error
readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED    scaffold11.gff,    scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong???
regards
Harini

-- 
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore
560065
harinik at ncbs.res.in
+91 9535292110

From michael.s.campbell1 at gmail.com  Mon Dec 29 12:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
References: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
Message-ID: <CAAi6vWVQrKX01jKSAVrCCeopOuMYOwrGVL2TRCew+oGE=fOpoQ@mail.gmail.com>

I think I fixed this in a recent svn commit. Try the attached version of
the script and let me know if it works.
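
For reference, the quantity the plot shows can be sketched independently of the script. The Python below is not the attached script, only a minimal illustration under two assumptions: that transcripts are 'mRNA' lines in a nine-column GFF3, and that each carries an _AED=<value> attribute in column 9. The function names and the 0.025 step are illustrative choices. It fails with a clear message when no AED values are found, which is the situation that ends in a division by zero. In Perl, "readline() on closed filehandle" generally means the preceding open failed, so it is worth checking how the file names were passed.

```python
import re

# Matches "_AED=0.12" at the start of the attributes or after a ';',
# so it does not pick up "_eAED=...".
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(lines):
    """Collect the _AED value of every mRNA feature in a GFF3."""
    values = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def aed_cdf(values, step=0.025):
    """Return (cutoff, fraction of transcripts with AED <= cutoff) rows."""
    if not values:
        raise ValueError("no mRNA features with an _AED attribute were found")
    total = len(values)
    rows = []
    for i in range(round(1 / step) + 1):
        cutoff = round(i * step, 6)
        rows.append((cutoff, sum(v <= cutoff for v in values) / total))
    return rows
```

Reading each GFF3 with aed_values and passing the result to aed_cdf gives one column of the table per input file.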

Thanks,
Mike

On Tue, Dec 23, 2014 at 3:31 AM, Harini Vinod <harini1981 at gmail.com> wrote:

>
> Dear Concern,
> I had used the following script from MAKER-DEVEL
> AED_cdf_generator.pl to obtain the plot
>
> I get the following error
> readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
> AED    scaffold11.gff,    scaffold7.gff
> Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
> Illegal division by zero at AED_cdf_generator.pl line 43.
>
> Can you kindly suggest what could have gone wrong???
> regards
> Harini
>
> --
> K.Harini
> PhD scholar
> Lab-25
> NCBS,GKVK
> Bangalore
> 560065
> harinik at ncbs.res.in
> +91 9535292110
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20141229/affaccd8/attachment.tgz>

From xvazquezc at gmail.com  Mon Dec 29 21:00:56 2014
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker
for the annotation. I understand the base concepts but I have doubts about
the correct steps to follow. I've been through the 2014 video tutorial and
searched for detailed steps and I still have some questions, maybe a bit
obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no
EST or protein files).
I understand that, lacking EST and protein files, I should provide them as
alt-est and protein from the closest species I can, but is one EST file
from one organism enough for the alt-est?

Regarding the steps to process would this be correct?:

   1. run Maker with the genome, alt-est and protein files, with
   est2genome=1 and protein2genome=1 (softmask=1 ?)
   2. with this first output, create the hmm file for SNAP based on the
   first output
   3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run
   again (using -base option)
   4. repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is
possible? Would that be the final model? Should it be based on the AED
scores? How can I get it without looking into individual sequence headings?
Also, do you perform the bootstrapping in the same folder? In the tutorial
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014>
I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each
repetition, not sure if just for demonstration purposes or if it is the
proper way to go.

I'm trying to run also a gene prediction with Augustus and GeneMark. The
first run will include an already trained profile for Augustus and the
native hmm file of genemark-ES**. Do they need to repeat the prediction by
bootstrap like with SNAP? If so, do I need to generate new hmm files or
prediction models based on results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl
script but no matter what parameters I use the cluster shuts the job down as
it exceeds 128GB of memory in use. The genome I've been testing for this is
about 42Mbp in a roughly 40-50 MB fasta file

Thank you in advance,

Xabier

-- 
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com  Wed Dec 31 13:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
References: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
Message-ID: <BD84D7BA-F0CC-4534-B167-59DC30942E9E@gmail.com>

Hi Xabier,

See below -->


> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files). 
> I understand that, lacking EST and protein files, I should provide them as alt-est and protein from the closest species I can, but is one EST file from one organism enough for the alt-est?

Provide alt-EST if you have ESTs from a closely related species, but do not have the proteome for that species.  If you have the proteome, use that.  Both are aligned in amino acid space, and provide the same hint information, the only difference being that alt-EST takes 10x longer because both target and query must be translated into all 6 reading frames.


> Regarding the steps to process would this be correct?:
> 1. run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> 2. with this first output, create the hmm file for SNAP based on the first output
> 3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
> 4. repeat 2 and 3 as necessary*

If you don't have ESTs, don't do est2genome (alt-ESTs don't count).  Just do protein2genome.  In general two rounds of training is the maximum you should do.  At that point, ab initio predictions and hint based predictions will start to look like each other (so the ab initio models are doing well on their own).
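
Between rounds, the only edits needed in maker_opts.ctl are a few key=value lines. As a rough illustration, the Python sketch below rewrites selected options in control-file text while keeping the trailing #comments. It assumes the 'key=value #comment' layout of the standard control file; the function name set_ctl_options and the file name round1.hmm are made up for the example.

```python
import re


def set_ctl_options(text, updates):
    """Return control-file text with the given key=value settings replaced."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        # key=value, optionally followed by a '#comment'
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={remaining.pop(key)} {comment}".rstrip()
        out.append(line)
    if remaining:
        # Refuse to silently ignore a misspelled option name.
        raise KeyError(f"options not found: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Round 1 with protein evidence only: infer models from protein alignments.
round1 = (
    "snaphmm= #SNAP HMM file\n"
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
    "protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no\n"
)
# Round 2: switch protein2genome off and use the HMM trained on round 1.
round2 = set_ctl_options(round1, {"protein2genome": 0, "snaphmm": "round1.hmm"})
print(round2)
```

Raising on an unknown key is a deliberate choice here, since a typo in an option name would otherwise leave the previous round's settings in place.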


> *How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get it without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014> I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition, not sure if just for demonstration purposes or if it is the proper way to go.

Run it in the same folder.  This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster).  In the tutorial we ran separately just to be able to open old results and compare.


> I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

You do with Augustus, but not GeneMark which does self training.


> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script but no matter what parameters I use the cluster shuts the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file

You can train GeneMark with just part of the genome. Try using 10Mb made up of the longest contigs.  Also I only recommend using GeneMark on Fungi, it tends to not work well on organisms with more complex intron/exon structures. Also you should build a species-specific repeat database to supplement RepeatMasker's internal libraries.  I'd recommend using RepeatModeler.
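
Picking out roughly 10Mb of the longest contigs can be done with a few lines of script. The Python below is a minimal sketch using only the standard library; the function names and the 60-column output width are illustrative, and it assumes an ordinary multi-FASTA assembly that fits in memory (fine for a genome of about 42Mbp).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_contigs(records, target_bp=10_000_000):
    """Take contigs from longest to shortest until target_bp is reached."""
    picked, total = [], 0
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if total >= target_bp:
            break
        picked.append((name, seq))
        total += len(seq)
    return picked


def write_fasta(records, handle, width=60):
    """Write records as FASTA with fixed-width sequence lines."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Typical use would be to open the assembly, pass read_fasta(handle) to longest_contigs, and write the result to a new file that is then given to the GeneMark training script in place of the full genome.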


Thanks,
Carson


From carsonhh at gmail.com  Wed Dec 31 13:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
References: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.


The following is from the INSTALL file that should be included with MAKER -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).


1.  Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.

2.  Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.

3.  Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.

4.  Finish installation according to steps 2-4 of the EASY INSTALL

    Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

    Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.

        Example: mpiexec -mca btl ^openib -n 20 maker
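
If you launch MAKER from a wrapper script, the same checks can be done before mpiexec is called. The Python below is only a sketch of that idea, not part of MAKER: the function name mpi_maker_command is made up, and it assumes LD_PRELOAD holds a single path to libmpi.so rather than a list of libraries.

```python
import os


def mpi_maker_command(n_procs, env=None):
    """Check LD_PRELOAD, then build the mpiexec command line for MAKER."""
    env = os.environ if env is None else env
    lib = env.get("LD_PRELOAD", "")
    if not lib:
        raise RuntimeError("LD_PRELOAD is not set; point it at OpenMPI's libmpi.so")
    if not os.path.isfile(lib):
        raise RuntimeError(f"LD_PRELOAD points to a missing file: {lib}")
    # Same flags as the INSTALL example: mpiexec -mca btl ^openib -n N maker
    return ["mpiexec", "-mca", "btl", "^openib", "-n", str(n_procs), "maker"]
```

The returned list can be handed to subprocess.run as is; passing the arguments as a list also keeps the shell from interpreting the '^' in '^openib'.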


Thanks,
Carson


> On Dec 27, 2014, at 7:59 PM, Justin Peyton <cognitiveshrapnel at gmail.com> wrote:
> 
> I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly. 
> 
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [molybdenum:23241] *** Process received signal ***
> [molybdenum:23241] Signal: Segmentation fault (11)
> [molybdenum:23241] Signal code: Address not mapped (1)
> [molybdenum:23241] Failing at address: 0x50c
> [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
> [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
> [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
> [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
> [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
> [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
> [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
> [molybdenum:23241] *** End of error message ***
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> [molybdenum:23252] *** Process received signal ***
> [molybdenum:23252] Signal: Segmentation fault (11)
> [molybdenum:23252] Signal code: Address not mapped (1)
> [molybdenum:23252] Failing at address: 0x50c
> [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
> [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
> [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
> [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
> [molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
> [molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
> [molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
> [molybdenum:23252] *** End of error message ***
> SIGTERM received
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).
> 
> 
> I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.
> 
> Justin Peyton
> The Ohio State University

From jerryzhaosjtu at gmail.com  Wed Dec 31 18:48:29 2014
From: jerryzhaosjtu at gmail.com (=?UTF-8?B?6LW16LaK?=)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID: <CAMxJ+aewxN6eM_eQDfNK6p4Vn1c90Jwqd8EJokaAZPUbB1dDMA@mail.gmail.com>

Hi all,

I have recently been using MAKER to annotate a single rice chromosome as a
pilot experiment, and I have run into some problems. When I use eval to
compare my annotation against the gold standard, gene sensitivity and
specificity are only around 20%. When I then passed the GFF3 file that MAKER
produced back into MAKER and ran it again, the result was even worse than 20%.

My input is a Trinity-processed RNA-seq file and a protein file.  I chose
SNAP, Augustus and GeneMark as ab initio predictors.

Here is my maker_opts.ctl:

#-----Genome (these are always required)
genome=chr12.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=chr12.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=Osativa_193_peptide.fa  #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Rice #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=rice #SNAP HMM file
gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file
augustus_species=arabidopsis #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no

snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)


Could you help me? Thank you !!!


-- 

*Yue Zhao (Jerry)*

Bachelor Candidate of Plant Biotechnology

Researcher in UCLA-CSST program

Shanghai Jiao Tong University, Shanghai

*jerryzhaosjtu at gmail.com <jerryzhaosjtu at gmail.com>*

From Alice.Dennis at eawag.ch  Fri Dec 12 07:10:46 2014
From: Alice.Dennis at eawag.ch (Dennis, Alice)
Date: Fri, 12 Dec 2014 14:10:46 +0000
Subject: [maker-devel] iterative Maker2
Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>

Hi all,

I am a relatively new user of Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.

I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
The evidence that I give is:

- de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)

- 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr

For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.

For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.

As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
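
A plain text search for "exon" counts every line that contains that string anywhere, which may include attribute values such as IDs as well as true exon features. The following is a minimal Python sketch, assuming a standard nine-column tab-separated GFF3 file; the function name count_features is made up for illustration. It counts only lines whose feature type (column 3) matches, grouped by source (column 2), so counts from different rounds can be compared like for like.

```python
from collections import Counter


def count_features(gff_lines, feature_type="exon"):
    """Count GFF3 features of one type, grouped by the source column."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        source, ftype = cols[1], cols[2]
        if ftype == feature_type:
            counts[source] += 1
    return counts


# Example use on one round's output (the file name is a placeholder):
#   with open("round3.all.gff") as handle:
#       print(count_features(handle))
```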

Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?

Thank you,
Alice


From carsonhh at gmail.com  Fri Dec 12 08:41:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Dec 2014 08:41:42 -0700
Subject: [maker-devel] iterative Maker2
In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com>

The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, then the models will change too.  But I have one concern.  You are using a very sparse protein evidence dataset.  The protein dataset is very important to MAKER's performance, and for iterative training of the ab initio predictors.  Normally after the second iteration, additional training should not be beneficial, but if you are getting wildly different results on the 3rd and 4th rounds, then you probably aren't getting enough good models to train with.

For a protein dataset you should be using the entire proteome from a minimum of two related species and perhaps all of UniProt/Swiss-Prot to get a broad protein database.  Don't use the proteins extracted by CEGMA and HaMSTr.  CEGMA can be used to guide the first HMM creation (the cegma2zff script that comes with MAKER), but don't give the proteins to MAKER as evidence. The HaMSTr results will also be redundant with the ESTs.  You need proteins from related species to look for homology not found in the EST dataset.

Also repeat masking is important for any genome and has a huge effect on ab initio predictor performance.  Make sure you run something like RepeatModeler to look for species specific repeats that will not already be in RepBase.  Then add those results to the rmlib= option in the maker control files.

Thanks,
Carson

 
> On Dec 12, 2014, at 7:10 AM, Dennis, Alice <Alice.Dennis at eawag.ch> wrote:
> 
> Hi all,
>  
> I am a relatively new user of Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.
> 
> I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size).  The only thing that changes at each round is the .hmm.
> The evidence that I give is:
> - de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
> - 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr
> 
> For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
> 
> For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
> 
> As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example,  just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
> 
> Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
>  
> Thank you, 
> Alice
>  

From tuanduonganh at gmail.com  Sun Dec 14 05:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene models
using protein domain evidence. I have obtained the tsv file from
interproscan and have also updated the GFF3 file with this result using
ipr_update_gff. In the next step I will need the quality_filter.pl script to
generate the default and standard builds; however, this script is not
included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan

From michael.s.campbell1 at gmail.com  Mon Dec 15 13:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
References: <CAPpYYpx_4TaQY+FKLBXH6L--aSKDRcsKfiDgDQHJagV080Rcwg@mail.gmail.com>
Message-ID: <CAAi6vWXiSqS-Y=10BSOO_WYBcXvZ5XzT7psEgque1yz3M4z03g@mail.gmail.com>

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl
extension because some email services will not accept them.

Take care,
Mike

On Sun, Dec 14, 2014 at 5:55 AM, Tuan Duong Anh <tuanduonganh at gmail.com>
wrote:
>
> Hi all,
>
> I successfully ran MAKER and am now looking into rescuing rejected gene
> models using protein domain evidence. I have obtained the tsv file from
> interproscan and have also updated the GFF3 file with this result using
> ipr_update_gff. In the next step I will need the quality_filter.pl script to
> generate the default and standard builds; however, this script is not
> included in my version of MAKER. Do you know where I can get this script?
>
> Thanks.
> Tuan
>

-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4598 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20141215/7f57e7e4/attachment-0003.obj>

From cognitiveshrapnel at gmail.com  Sat Dec 27 19:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>

I am working on getting maker running on a system running ubuntu 14.04. I
have installed maker and it runs great on a small but real data set. When I
try it with openmpi with the exact same inputs, however, I get the below
error almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal ***
[molybdenum:23241] Signal: Segmentation fault (11)
[molybdenum:23241] Signal code: Address not mapped (1)
[molybdenum:23241] Failing at address: 0x50c
[molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
[molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
[molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
[molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
[molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
[molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
[molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
[molybdenum:23241] *** End of error message ***
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
[molybdenum:23252] *** Process received signal ***
[molybdenum:23252] Signal: Segmentation fault (11)
[molybdenum:23252] Signal code: Address not mapped (1)
[molybdenum:23252] Failing at address: 0x50c
[molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
[molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
[molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
[molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
[molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
[molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).


I have tried reinstalling both maker and openmpi. I have tried two
different versions of both maker and openmpi. I am currently working with
maker 2.31.6 and openmpi 1.8.3 because I have had those work together on
another system. I have triple checked that LD_PRELOAD is properly set. I
have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University

From harini1981 at gmail.com  Tue Dec 23 03:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>

Dear Concern,
I used the AED_cdf_generator.pl script from MAKER-DEVEL to obtain the
plot.

I get the following error:
readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED    scaffold11.gff,    scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong???
regards
Harini

-- 
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore
560065
harinik at ncbs.res.in
+91 9535292110

From michael.s.campbell1 at gmail.com  Mon Dec 29 12:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
References: <CA++b=XvN-zF9F4frzC-eyzyJUBucXNWFGAtZs-zQ822HRwSufw@mail.gmail.com>
Message-ID: <CAAi6vWVQrKX01jKSAVrCCeopOuMYOwrGVL2TRCew+oGE=fOpoQ@mail.gmail.com>

I think I fixed this in a recent svn commit. Try the attached version of
the script and let me know if it works.
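
For reference, the error output appears to say that an input file could not be opened (readline() on a closed filehandle), so no transcripts were counted and the total used as the divisor was zero. The following is a minimal Python sketch of the same kind of calculation, not the attached AED_cdf_generator.pl: it assumes that mRNA features in the MAKER GFF3 carry an _AED=<value> attribute, and the function names aed_values and aed_cdf are made up for illustration. It reports an empty input explicitly instead of dividing by zero.

```python
import re

# Matches the _AED attribute but not _eAED (assumed MAKER attribute names).
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines):
    """Yield the _AED value of every mRNA feature in GFF3 lines."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            match = AED_RE.search(cols[8])
            if match:
                yield float(match.group(1))


def aed_cdf(values, step=0.025):
    """Return [(threshold, fraction of transcripts with AED <= threshold)]."""
    values = list(values)
    if not values:
        # Fail with a clear message rather than dividing by zero.
        raise ValueError("no mRNA features with an _AED attribute were found")
    n_bins = round(1 / step)
    cdf = []
    for i in range(n_bins + 1):
        threshold = i * step
        # Small tolerance so values equal to a threshold are included.
        within = sum(v <= threshold + 1e-9 for v in values)
        cdf.append((round(threshold, 3), within / len(values)))
    return cdf


# Example use (the file name is a placeholder):
#   with open("scaffold11.gff") as handle:
#       for threshold, fraction in aed_cdf(aed_values(handle)):
#           print(threshold, fraction, sep="\t")
```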

Thanks,
Mike

On Tue, Dec 23, 2014 at 3:31 AM, Harini Vinod <harini1981 at gmail.com> wrote:

>
> Dear Concern,
> I used the AED_cdf_generator.pl script from MAKER-DEVEL to obtain the
> plot.
>
> I get the following error:
> readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
> AED    scaffold11.gff,    scaffold7.gff
> Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
> Illegal division by zero at AED_cdf_generator.pl line 43.
>
> Can you kindly suggest what could have gone wrong???
> regards
> Harini
>
> --
> K.Harini
> PhD scholar
> Lab-25
> NCBS,GKVK
> Bangalore
> 560065
> harinik at ncbs.res.in
> +91 9535292110


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20141229/affaccd8/attachment-0001.tgz>

From xvazquezc at gmail.com  Mon Dec 29 21:00:56 2014
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker
for the annotation. I understand the basic concepts but I have doubts about
the correct steps to follow. I've been through the 2014 video tutorial and
searched for detailed steps and I still have some questions, maybe a bit
obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no
EST or protein files).
I understand that, lacking EST and protein files, I should provide them as
alt-est and protein from the closest species I can, but is one EST file
from one organism enough for the alt-est?

Regarding the steps to process would this be correct?:

   1. run Maker with the genome, alt-est and protein files, with
   est2genome=1 and protein2genome=1 (softmask=1 ?)
   2. with this first output, create the hmm file for SNAP based on the
   first output
   3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run
   again (using -base option)
   4. repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is
possible? Would that be the final model? Should it be based on the AED scores?
How can I get them without looking into individual sequence headings? Also,
do you perform the bootstrapping in the same folder? In the tutorial
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014>
I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each
repetition; I'm not sure if that is just for demonstration purposes or if it
is the proper way to go.

I'm trying to run also a gene prediction with Augustus and GeneMark. The
first run will include an already trained profile for Augustus and the
native hmm file of genemark-ES**. Do they need to repeat the prediction by
bootstrap like with SNAP? If so, do I need to generate new hmm files or
prediction models based on results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl
script, but no matter what parameters I use the cluster shuts the job down
because it exceeds 128GB of memory in use. The genome I've been testing this
on is about 42Mbp in a roughly 40-50 MB fasta file.

Thank you in advance,

Xabier

-- 
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com  Wed Dec 31 13:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
References: <CAL0hg4Epkbe29L5sqkdhyMucT9eSZ_Dp=_UmGqw=HBsCrQ6t-g@mail.gmail.com>
Message-ID: <BD84D7BA-F0CC-4534-B167-59DC30942E9E@gmail.com>

Hi Xabier,

See below -->


> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files). 
> I understand that, lacking EST and protein files, I should provide them as alt-est and protein from the closest species I can, but is one EST file from one organism enough for the alt-est?

Provide alt-EST if you have ESTs from a closely related species but do not have the proteome for that species.  If you have the proteome, use that.  Both are aligned in amino acid space and provide the same hint information; the only difference is that alt-EST takes 10x longer because both target and query must be translated into all 6 reading frames.


> Regarding the steps to process would this be correct?:
> 1. run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> 2. with this first output, create the hmm file for SNAP based on the first output
> 3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
> 4. repeat 2 and 3 as necessary*

If you don't have ESTs, don't do est2genome (alt-ESTs don't count).  Just do protein2genome.  In general two rounds of training is the maximum you should do.  At that point, ab initio predictions and hint based predictions will start to look like each other (so the ab initio models are doing well on their own).
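
The option changes between rounds can be applied to maker_opts.ctl by hand or with a small script. The following is a minimal Python sketch, assuming the usual 'key=value #comment' layout of the control file; the helper name set_ctl_options and the file name round1.hmm are hypothetical. It switches protein2genome off and points snaphmm at the newly trained HMM for the next round, leaving every other line untouched.

```python
def set_ctl_options(ctl_text, overrides):
    """Return ctl_text with 'key=value' lines updated, keeping comments."""
    remaining = dict(overrides)
    out = []
    for line in ctl_text.splitlines():
        key, sep, rest = line.partition("=")
        if sep and key in remaining and not line.startswith("#"):
            # Keep whatever follows the first '#' as the trailing comment.
            _, hash_mark, comment = rest.partition("#")
            new = f"{key}={remaining.pop(key)}"
            out.append(f"{new} #{comment}" if hash_mark else new)
        else:
            out.append(line)
    if remaining:
        raise KeyError(f"options not found in control file: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Settings for the round after the first SNAP training (no ESTs available).
# 'round1.hmm' is a placeholder for the HMM trained from the first output.
ROUND2 = {"est2genome": "0", "protein2genome": "0", "snaphmm": "round1.hmm"}

# Example use:
#   with open("maker_opts.ctl") as handle:
#       updated = set_ctl_options(handle.read(), ROUND2)
#   with open("maker_opts.ctl", "w") as handle:
#       handle.write(updated)
```

Raising an error for an option that is not present guards against a misspelled key silently leaving the previous round's settings in place.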


> *How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get them without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014> I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition; I'm not sure if that is just for demonstration purposes or if it is the proper way to go.

Run it in the same folder.  This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster).  In the tutorial we ran separately just to be able to open old results and compare.


> I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

You do with Augustus, but not GeneMark which does self training.


> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script, but no matter what parameters I use the cluster shuts the job down because it exceeds 128GB of memory in use. The genome I've been testing this on is about 42Mbp in a roughly 40-50 MB fasta file.

You can train GeneMark with just part of the genome. Try using 10Mb made up of the longest contigs.  Also I only recommend using GeneMark on fungi; it tends to not work well on organisms with more complex intron/exon structures. Also you should build a species specific repeat database to supplement RepeatMasker's internal libraries.  I'd recommend using RepeatModeler.
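
One way to build such a training subset is sketched below in Python. It is only an illustration of the suggestion above, not a MAKER or GeneMark tool: the function names read_fasta and longest_contigs are made up, and the 10 Mb target is the figure mentioned above. It sorts the contigs by length and keeps the longest ones until their combined length reaches the target.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contigs(records, target_bp=10_000_000):
    """Pick the longest contigs until their total length reaches target_bp."""
    picked, total = [], 0
    by_length = sorted(records, key=lambda rec: len(rec[1]), reverse=True)
    for header, seq in by_length:
        if total >= target_bp:
            break
        picked.append((header, seq))
        total += len(seq)
    return picked


# Example use (file names are placeholders):
#   with open("genome.fasta") as src, open("train_subset.fasta", "w") as dst:
#       for header, seq in longest_contigs(read_fasta(src)):
#           dst.write(f">{header}\n{seq}\n")
```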


Thanks,
Carson


From carsonhh at gmail.com  Wed Dec 31 13:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
References: <CAKJ_QP2KRmdVqSPAPKrTfek+_H9jOQUFwFg3t=WBnsJZBSAz-g@mail.gmail.com>
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.


The following is from the INSTALL file that should be included with MAKER:

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).


1.  Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.

2.  Give path to 'mpicc'. Make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.

3.  Give path to the folder containing 'mpi.h'. Make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.

4.  Finish installation according to steps 2-4 of the EASY INSTALL

    Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

    Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.

        Example: mpiexec -mca btl ^openib -n 20 maker
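
A quick sanity check before launching can catch a wrong LD_PRELOAD value. The sketch below is not part of the INSTALL file: the helper name `check_ld_preload` is hypothetical, and it assumes LD_PRELOAD holds a single path to libmpi.so rather than a list of libraries.

```shell
#!/bin/sh
# Sketch: verify that a path (default: $LD_PRELOAD) names an existing
# libmpi.so before starting MAKER under OpenMPI.
check_ld_preload() {
    lib=${1:-${LD_PRELOAD:-}}
    # The file name should look like libmpi.so (optionally versioned).
    case "$lib" in
        */libmpi.so*) ;;
        *) echo "not a libmpi.so path: '$lib'" >&2; return 1 ;;
    esac
    # The file must actually exist at that location.
    [ -e "$lib" ] || { echo "file not found: $lib" >&2; return 1; }
    echo "OK: $lib"
}

# Example launch using the flags from the INSTALL notes
# (20 processes is just the number used in the example above):
# check_ld_preload && mpiexec -mca btl ^openib -n 20 maker
```

This only checks that the path exists and is named correctly; it does not confirm that the library belongs to the same OpenMPI installation as the 'mpiexec' being used.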


Thanks,
Carson


> On Dec 27, 2014, at 7:59 PM, Justin Peyton <cognitiveshrapnel at gmail.com> wrote:
> 
> I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly. 
> 
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [molybdenum:23241] *** Process received signal ***
> [molybdenum:23241] Signal: Segmentation fault (11)
> [molybdenum:23241] Signal code: Address not mapped (1)
> [molybdenum:23241] Failing at address: 0x50c
> [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
> [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
> [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
> [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
> [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
> [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
> [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
> [molybdenum:23241] *** End of error message ***
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> [molybdenum:23252] *** Process received signal ***
> [molybdenum:23252] Signal: Segmentation fault (11)
> [molybdenum:23252] Signal code: Address not mapped (1)
> [molybdenum:23252] Failing at address: 0x50c
> [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
> [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
> [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
> [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
> [molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
> [molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
> [molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
> [molybdenum:23252] *** End of error message ***
> SIGTERM received
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).
> 
> 
> I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.
> 
> Justin Peyton
> The Ohio State University


From jerryzhaosjtu at gmail.com  Wed Dec 31 18:48:29 2014
From: jerryzhaosjtu at gmail.com (赵越)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID: <CAMxJ+aewxN6eM_eQDfNK6p4Vn1c90Jwqd8EJokaAZPUbB1dDMA@mail.gmail.com>

Hi all,

Recently I have been using MAKER to annotate a single chromosome of rice as a
pilot experiment, and I have run into some problems. After annotation, when I
run eval to compare my result against the gold standard, the gene sensitivity
and specificity are only around 20%. And after I passed the GFF3 file that
MAKER itself produced back in and ran MAKER again, the result was even worse
than 20%.

My input is a Trinity-processed RNA-seq file and a protein file.  I chose
snap, augustus and genemark as ab initio predictors.

I paste my maker_opts.ctl here:

#-----Genome (these are always required)
genome=chr12.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=chr12.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=Osativa_193_peptide.fa  #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Rice #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=rice #SNAP HMM file
gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file
augustus_species=arabidopsis #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no


snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)


Could you help me? Thank you !!!


-- 

*Yue Zhao (Jerry)*

Bachelor Candidate of Plant Biotechnology

Researcher in UCLA-CSST program

Shanghai Jiao Tong University, Shanghai

*jerryzhaosjtu at gmail.com <jerryzhaosjtu at gmail.com>*