From pushplata.singh at teri.res.in  Sun Mar  2 23:29:37 2014
From: pushplata.singh at teri.res.in (Pushplata Singh)
Date: Mon, 3 Mar 2014 10:59:37 +0530
Subject: [maker-devel] Query on Hardware requirement
Message-ID: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>


Hi,

I am trying to assemble and analyse (bioinformatics) the genome sequence of a
35 GB fungal genome. The raw data generated from Illumina sequencing
is ~15 GB. Could you please suggest the system (hardware)
requirements for installing and running MAKER and ALLPATHS-LG software for
the job?

Thank you
Pushplata Singh, PhD
Nanobiotechnology Centre
Biotechnology and Management of Bioresources Division
The Energy and Resources Institute
Darbari Seth Block , India Habitat Centre,Lodhi Road
New Delhi 110003 India
Phone +91 11 24682100 ext 2611
Fax +91 11 24682145




From dence at genetics.utah.edu  Mon Mar  3 08:11:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 3 Mar 2014 14:11:34 +0000
Subject: [maker-devel] Query on Hardware requirement
In-Reply-To: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
References: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D68BF9@mxb2.hg.genetics.utah.edu>

Hi Pradeep, 

I think ALLPATHS-LG is developed by the Broad Institute, so you'd have to check their documentation for their system requirements. MAKER is installable on Linux and Mac OS X computers. The throughput you'll be able to achieve with MAKER depends on how many processors and how much RAM the machine has. To take advantage of MAKER's ability to parallelize the annotation process, you need some version of MPI installed on your machine. MAKER can try to install MPI for you, but a manual installation is usually required.
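
As a rough sketch of what a parallel launch looks like, assuming an MPI-enabled MAKER build started through `mpiexec` (the usual launcher for MPICH and Open MPI), a process count can be derived from the CPU cores the machine reports. The helper name `maker_mpi_command` is made up for illustration, and on a shared cluster the scheduler's allocation should be used instead of the raw core count.

```python
import os


def maker_mpi_command(n_procs=None, maker_exe="maker"):
    """Build an `mpiexec -n N maker` argument list (hypothetical helper).

    n_procs defaults to the number of CPUs Python can see.
    """
    if n_procs is None:
        n_procs = os.cpu_count() or 1
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return ["mpiexec", "-n", str(n_procs), maker_exe]


# Print the command that would be run on this machine.
print(" ".join(maker_mpi_command()))
```

The list form can be passed directly to `subprocess.run` from the directory that holds the MAKER control files.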

I hope that helps. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Pushplata Singh [pushplata.singh at teri.res.in]
Sent: Sunday, March 02, 2014 10:29 PM
To: maker-devel at yandell-lab.org
Cc: Pradeep Dahiya
Subject: [maker-devel] Query on Hardware requirement

Hi,

I am trying to assemble and analyse (bioinformatics) the genome sequence of a
35 GB fungal genome. The raw data generated from Illumina sequencing
is ~15 GB. Could you please suggest the system (hardware)
requirements for installing and running MAKER and ALLPATHS-LG software for
the job?

Thank you
Pushplata Singh, PhD
Nanobiotechnology Centre
Biotechnology and Management of Bioresources Division
The Energy and Resources Institute
Darbari Seth Block , India Habitat Centre,Lodhi Road
New Delhi 110003 India
Phone +91 11 24682100 ext 2611
Fax +91 11 24682145




From carson.holt at genetics.utah.edu  Mon Mar  3 13:08:49 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 3 Mar 2014 19:08:49 +0000
Subject: [maker-devel] FW: error runinig agustus
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3A2120.A782%carson.holt@genetics.utah.edu>

Forwarding this to the maker-devel list.


On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>I encountered the following error while running maker (2nd annotation
>using gff file of the first maker run and trinity assembled RNA seq as
>EST)
>
>ERROR: Augustus failed
>--> rank=NA, hostname=rapa.agr.gc.ca
>
>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>am using 2.31
>
>Your help is appreciated
>
>
>HB
>
>
>
>
>


From carsonhh at gmail.com  Mon Mar  3 13:11:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Mar 2014 12:11:08 -0700
Subject: [maker-devel] FW: error runinig agustus
Message-ID: <CF3A21A5.A788%carsonhh@gmail.com>

You will need to provide more detail.  Probably the entire error log and
the maker control files.

Thanks,
Carson


On 3/3/14, 12:08 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>Forwarding this to the maker-devel list.
>
>
>On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>I encountered the following error while running maker (2nd annotation
>>using gff file of the first maker run and trinity assembled RNA seq as
>>EST)
>>
>>ERROR: Augustus failed
>>--> rank=NA, hostname=rapa.agr.gc.ca
>>
>>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>>am using 2.31
>>
>>Your help is appreciated
>>
>>
>>HB
>>
>>
>>
>>
>>


From sjackman at gmail.com  Tue Mar  4 20:10:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 4 Mar 2014 18:10:42 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
Message-ID: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
the tip.
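
The filtering discussed in this thread can be pictured with a simplified model. This is only a sketch of the thresholds named in the messages below (`bit_blastn`, `eval_blastn`, `single_exon`, `single_length`), not MAKER's actual code; the default of 250 for `single_length` is the value Carson recalls, and the 120 bp length used for rrn5 is a hypothetical figure for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    bits: float
    evalue: float
    n_exons: int
    length: int  # aligned length in bp


def failed_filters(hit, bit_blastn=40, eval_blastn=1e-10,
                   single_exon=0, single_length=250):
    """Return the names of the thresholds a hit fails (empty list = kept)."""
    failed = []
    if hit.bits < bit_blastn:
        failed.append("bit_blastn")
    if hit.evalue > eval_blastn:
        failed.append("eval_blastn")
    if hit.n_exons == 1:
        if not single_exon:
            failed.append("single_exon")
        elif hit.length < single_length:
            failed.append("single_length")
    return failed


# Bit score and E-value are from the thread; the length is hypothetical.
rrn5 = Hit("rrn5", bits=242, evalue=2e-66, n_exons=1, length=120)
print(failed_filters(rrn5, single_exon=1))                    # ['single_length']
print(failed_filters(rrn5, single_exon=1, single_length=50))  # []
```

Under this model a short single-exon hit clears the BLAST thresholds yet is still dropped until `single_length` is lowered, which matches the behaviour reported here.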

The rRNA genes that are found with est2genome have the feature type set to
*mRNA* and have corresponding *five_prime_UTR*, *CDS* and
*three_prime_UTR* features. Ideally the feature type would be set to
*rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features.
Is that a feature that you would be interested in adding to MAKER? The rRNA
gene names all start with "rrn" and the tRNA gene names with "trn", as is
standard, so determining the appropriate type should be straightforward.
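
Until something like this exists in MAKER, the retyping could be done as a post-processing step on the GFF3 output. The following is a minimal sketch, assuming that each transcript row carries `ID` and `Name` attributes and that child features reference it through `Parent`; the function name `retype_ncrna` is made up for illustration and this is not a MAKER feature.

```python
DROP_TYPES = {"five_prime_UTR", "CDS", "three_prime_UTR"}


def parse_attrs(field):
    """Parse a GFF3 attribute column into a dict (no URL-decoding)."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def retype_ncrna(lines):
    """Retype rrn*/trn* mRNA rows as rRNA/tRNA and drop their UTR/CDS children."""
    rows = [ln.rstrip("\n").split("\t") for ln in lines]

    # First pass: decide the new type for each matching transcript ID.
    new_type = {}
    for cols in rows:
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = parse_attrs(cols[8])
            tid, name = attrs.get("ID"), attrs.get("Name", "")
            if tid is None:
                continue
            if name.startswith("rrn"):
                new_type[tid] = "rRNA"
            elif name.startswith("trn"):
                new_type[tid] = "tRNA"

    # Second pass: rewrite transcript rows and skip coding sub-features.
    out = []
    for cols in rows:
        if len(cols) == 9:
            attrs = parse_attrs(cols[8])
            if cols[2] == "mRNA" and attrs.get("ID") in new_type:
                cols = cols[:2] + [new_type[attrs["ID"]]] + cols[3:]
            elif cols[2] in DROP_TYPES:
                parents = attrs.get("Parent", "").split(",")
                if any(p in new_type for p in parents):
                    continue
        out.append("\t".join(cols))
    return out
```

Comment and directive lines pass through unchanged because they do not have nine tab-separated columns.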

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:

> Set single_exon=1, and the minimum size to a smaller value.  I think it's
> set to 250 right now.  Also est2genome is looking for ORF, so if there is
> none (as with tRNAs) they probably won't get picked up.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>
> Sorry, ignore my previous question. est_forward also carries forward the
> names of protein evidence and works like a charm. Thank you!
>
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
> these hits?
>
> organism_type=prokaryotic
> est2genome=1
> protein2genome=1
> est_forward=1
>
> Cheers,
> Shaun
>
>
> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>
>> Is there a corresponding protein_forward=1 option to map forward protein
>> names from protein2genome?
>>
>> Cheers,
>> Shaun
>>
>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>> wrote:
>>
>> Sorry I meant to say prefilter on the score in the mRNA column before
>> passing the gff3 to model_gff.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>  What you can do is run it once with just est_forward=1 and
>> est2genome/protein2genome set to 1.  Then take those results, pass them in
>> as model_gff and use the map_forward option to then filter the results
>> based on mRNA score and that would copy names onto new gene under the
>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>> separate tool that will map genes onto new assemblies (but under the hood
>> the tool will just be calling MAKER with certain parameters restricted).  I
>> do this because if people commonly use it mixed with things like SNAP I can
>> start to get some very weird behaviors.
>>
>> Thanks,
>> Carson
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 3:04 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  It seems that this could be a very useful option in those cases where
>> you have firm a priori knowledge of the placement of ESTs. However, while
>> trying it I note that est_forward implies that the est2genome predictor is
>> turned on, implicitly. Is this necessary for this to work? I'm after the
>> behavior you describe below where exonerate is made to try really hard
>> within a limited region to align an est, but I would not like maker to
>> produce est2genome predictions.
>>
>> In general, I think this maker_coor and est_forward is a feature set that
>> is worthy to be promoted into a documented feature.
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  It will still work without est_forward.  It just works a little
>> differently.  Keep in mind this was a hidden feature I used to find
>> stubborn or hard to find missing genes after reassembly of a genome.
>>
>> If est_forward is provided, MAKER will parse the database to look for the
>> maker_coor tags early in the pipeline.  Then it will create a list of
>> locations to search, and it will search them even if there are no BLAST
>> results to seed the search (normally MAKER gets a BLAST result first and
>> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
>> look for a match using all of chr1 as the input to exonerate even when
>> BLAST finds nothing (this is a very very slow search, but can help pick up
>> one or two stubborn genes that don't remap well).  To allow this, MAKER
>> gives exonerate looser matching parameters (i.e. allows for single base
>> pair introns perhaps caused by assembly errors).  The logic here is that
>> given the fact that I already told MAKER that with some degree of
>> confidence I expect sequence A to map to location X, it will try its
>> hardest to make it match.
>>
>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>> line 1563, but only after a BLAST alignment has already seeded it to the
>> region (that BLAST result has the information in its description
>> parameter).  MAKER will then ignore seeds completely outside of maker_coor.
>> In addition any BLAST seeds that overlap maker_coor will get the search
>> space for alignment polishing adjusted to match maker_coor exactly.  Also
>> match parameters for exonerate will not be relaxed as they were with
>> est_forward.
>>
>> As you can see, the behavior is slightly different (because it's an
>> accidental feature).
>>
>> Thanks,
>> Carson
>>
>>
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 6:37 AM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  That might be a useful and time saving accidental feature. But, reading
>> the code, it seems that I need to supply maker_coor but not gene_id, as
>> well as the configuration option est_forward for this to work. Any
>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>> right?
>>
>> Mikael
>>
>>  26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  Yes.  That should work as well as an accidental feature.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>> mikael.durling at slu.se> wrote:
>>
>> Can this use of maker_coor be used only to hint about the placement of
>> the ests, without affecting the naming of the final genes? I.e., if I have a
>> database of EST where I have a priori knowledge of their rough placement,
>> can this placement be given to maker without providing est_forward=1?
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  There is a way.  It's not a standard option and it's undocumented, but
>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>> that.  The option won't already be there so you'll have to type it in.
>>
>> There is also a feature designed to work with this option.  If you add
>> tags to your fasta headers, those can be used to guide the mapping and
>> naming.  For example, gene_id=<some_gene>  will ensure different isoforms
>> that share a common gene_id get clustered into the same gene,
>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>
>> This is an undocumented way to remap genes onto new assemblies using
>> blast alignments of earlier transcript or protein annotations as a guide.
>>
>> --Carson
>>
>>
>>
>>
>>  From: Shaun Jackman <sjackman at gmail.com>
>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>> Date: Tuesday, February 25, 2014 at 5:06 PM
>> To: <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Mapping gene names
>>
>>  Hi,
>>
>> I'm annotating a genome using a closely related genome from Genbank,
>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>> annotate my genome. I've run Maker, and the annotation seems to have worked
>> well. Is it possible to map the names of the genes from the related species
>> to my annotation? I see the *map_forward* option, which applies to the
>> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>
>> *maker_opts.ctl*
>>
>> est=NC_123456.frn
>> protein=NC_123456.faa
>> est2genome=1
>> protein2genome=1
>>
>> Thanks,
>> Shaun
>
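
The FASTA header tags Carson describes in the quoted thread above (`gene_id=` and `maker_coor=`) could be added with a small script. This is a sketch only: it assumes the tags are appended to the description part of the header, separated by spaces, which is not spelled out in the thread, and `tag_header` is a made-up helper name. The option `est_forward=1` still has to be typed into maker_opts.ctl by hand, as noted in the thread.

```python
def tag_header(header, gene_id=None, seqid=None, start=None, end=None):
    """Append gene_id= / maker_coor= tags to a FASTA header line."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    tags = []
    if gene_id:
        tags.append("gene_id=%s" % gene_id)
    if seqid:
        if start is not None and end is not None:
            # Restrict mapping to a coordinate range on one sequence.
            tags.append("maker_coor=%s:%d-%d" % (seqid, start, end))
        else:
            # Restrict mapping to the whole sequence.
            tags.append("maker_coor=%s" % seqid)
    return " ".join([header.rstrip()] + tags)


print(tag_header(">rrn16 16S ribosomal RNA",
                 gene_id="rrn16", seqid="chr1", start=1, end=10000))
```

Isoforms that should be clustered into one gene would be given the same `gene_id` value.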

From carsonhh at gmail.com  Tue Mar  4 20:33:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Mar 2014 19:33:12 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
Message-ID: <CF3BD88C.A7D5%carsonhh@gmail.com>

Trying to call non-coding RNA from ESTs or even sequence homology is
extremely messy (non-trivial problem in most organisms with high false
positive rate), so MAKER for the most part doesn't even try to do that.  It
focuses only on the coding genes.  You can now use tRNAscan and snoscan in
the newest version for some non-coding RNA support (those features were only
added a couple of months ago).  So just like other prediction tools (snap,
augustus etc.), the primary focus has always been the coding genes.  We've
only started adding non-coding RNA support recently for iPlant, so it's
still relatively immature.

Thanks,
Carson



From felix.bemm at uni-wuerzburg.de  Wed Mar  5 10:35:33 2014
From: felix.bemm at uni-wuerzburg.de (Felix Bemm)
Date: Wed, 05 Mar 2014 17:35:33 +0100
Subject: [maker-devel] Build Issues - v2.31
Message-ID: <53175255.4050102@uni-wuerzburg.de>

Hi,

I am trying to build maker version 2.31. Got the following error:

Configuring MAKER with MPI support
'CCFLAGSEX' is not a valid config option for Inline::C
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 236
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 256
	Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', 
'/software/mpich2-1.5rc3/include', 'blib', '') called at 
/storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277
	MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') 
called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469
	MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at 
./Build line 70

Same procedure worked with 2.29-beta!

Any ideas?

Felix

-- 
Felix Bemm
Department of Bioinformatics
University of Würzburg, Germany
Tel: +49 931 - 31 83696
Fax: +49 931 - 31 84552
felix.bemm at uni-wuerzburg.de


From carsonhh at gmail.com  Wed Mar  5 10:40:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 09:40:05 -0700
Subject: [maker-devel] Build Issues - v2.31
In-Reply-To: <53175255.4050102@uni-wuerzburg.de>
References: <53175255.4050102@uni-wuerzburg.de>
Message-ID: <CF3CA125.A7FA%carsonhh@gmail.com>

You need to update your Inline::C module.  The CCFLAGSEX option was added
to Inline::C a couple of years ago to allow users to pass in flags to the
compiler.
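
To see which Inline::C version the build is picking up, the module can be queried through the same perl that runs ./Build. A minimal sketch, assuming `perl` is on the PATH; the helper name `inline_c_version` is made up for illustration, and no particular minimum version is implied here.

```python
import shutil
import subprocess


def inline_c_version(perl="perl"):
    """Return the installed Inline::C version string, or None if unavailable."""
    if shutil.which(perl) is None:
        return None
    # Argument list is passed without a shell, so `$` needs no escaping.
    result = subprocess.run(
        [perl, "-MInline::C", "-e", "print $Inline::C::VERSION"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None  # module missing or failed to load
    return result.stdout.strip() or None


print(inline_c_version())
```

If the reported version is old, updating the module through CPAN and rerunning the build is the fix suggested above.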

Thanks,
Carson


On 3/5/14, 9:35 AM, "Felix Bemm" <felix.bemm at uni-wuerzburg.de> wrote:

>Hi,
>
>I am trying to build maker version 2.31. Got the following error:
>
>Configuring MAKER with MPI support
>'CCFLAGSEX' is not a valid config option for Inline::C
>  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm
>line 236
>  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm
>line 256
>	Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc',
>'/software/mpich2-1.5rc3/include', 'blib', '') called at
>/storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277
>	MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at
>/usr/share/perl/5.14/Module/Build/Base.pm line 2024
>	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)',
>'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007
>	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build')
>called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469
>	MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at
>/usr/share/perl/5.14/Module/Build/Base.pm line 2024
>	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)',
>'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012
>	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at
>./Build line 70
>
>Same procedure worked with 2.29-beta!
>
>Any ideas?
>
>Felix
>
>-- 
>Felix Bemm
>Department of Bioinformatics
>University of Würzburg, Germany
>Tel: +49 931 - 31 83696
>Fax: +49 931 - 31 84552
>felix.bemm at uni-wuerzburg.de
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carson.holt at genetics.utah.edu  Wed Mar  5 13:02:26 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 5 Mar 2014 19:02:26 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3CC2C6.A802%carson.holt@genetics.utah.edu>


On 3/5/14, 11:59 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear Maker users
>
>I want to run maker on a fungal genome of about 45 Mb with about 1/3 of
>the genome being repeat rich. But most of the virulent genes are located
>within the repeat regions, flanked by stretches of repeats. I am not sure
>whether, if I use the repeat masker option, I am going to miss out on the
>prediction of these virulent genes located within the repeats.
>
>Other concerns with the setting in maker-opts file for fungal genomes are:
>
>single_exon=0
>Should this be changed to 1, since single-exon genes are quite common in
>fungi? And what is the consequence of this for using ESTs and assembled
>RNA as evidence for gene prediction?
>
>correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>As I understand it, this option will remove the overlapping UTRs, but what
>is the consequence of setting it for the use of ESTs in predicting ORFs?
>
>
>Thanks
>
>
>
>HB
>
>
>
>


From carsonhh at gmail.com  Wed Mar  5 13:17:57 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 12:17:57 -0700
Subject: [maker-devel] FW: maker-control file
Message-ID: <CF3CC300.A805%carsonhh@gmail.com>

Not using repeat masking will cause many problems.  Besides, a gene being
flanked by repeats does not mean it will be lost: any evidence/alignments
that can seed in non-repetitive regions (gene/exon) are still allowed to
extend into repetitive regions during the polishing stage (aligners have
two stages - seed and extend).  So transposons should never seed, but
genes will because their sequence will contain non-repetitive regions
(even if they are near repeats).

single_exon should be set to 1 for fungi, just make sure to set the
minimum length of single exon evidence to something reasonable like 250bp.

correct_est_fusion should not be used together with est2genome.  It won't
fail, you just get odd results.  Actually est2genome should not ever be
used to generate the final annotation set.  It is a convenience method
that allows you to generate rough models for training gene predictors like
SNAP and Augustus.  But once they are trained it should be turned off,
because the models it produces will be partial (ESTs rarely cover the
whole transcript) and the results will have many false positives from
background transcription events in your EST data.  These models are good
enough to train with, but make very poor final annotations. So in the end
you should be using correct_est_fusion=1 with the SNAP or Augustus set and
not est2genome (which should already have been turned off by then).
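
As a rough illustration of the settings described above, here is a minimal
Python sketch that rewrites key=value lines in a MAKER control file for the
final (post-training) run.  The helper name set_maker_options, the FINAL_RUN
dictionary, and the example file contents (including the comment text) are
made up for illustration; only the option names come from this thread, and
single_length is assumed to be the option that holds the minimum length of
single exon evidence.

```python
import re


def set_maker_options(ctl_text, updates):
    """Return ctl_text with the key=value lines named in `updates` rewritten.

    Trailing "#comment" text on a rewritten line is preserved.  Keys that are
    not present in the file are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)\s*=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            out.append(f"{key}={remaining.pop(key)} {comment}".rstrip())
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Settings suggested in this message for the final fungal annotation run
# (hypothetical grouping; adjust to your own pipeline).
FINAL_RUN = {
    "single_exon": 1,          # allow single-exon evidence
    "single_length": 250,      # assumed name of the minimum-length option
    "est2genome": 0,           # off once the predictors are trained
    "correct_est_fusion": 1,   # used with the SNAP/Augustus models
}

# Made-up excerpt of a control file, for demonstration only.
example = (
    "est2genome=1 #infer gene predictions directly from ESTs\n"
    "single_exon=0\n"
    "single_length=250\n"
    "correct_est_fusion=0 #limits use of ESTs\n"
)

if __name__ == "__main__":
    print(set_maker_options(example, FINAL_RUN), end="")
```

Editing the control file by hand works just as well; the sketch is only
meant to make the combination of settings explicit.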


Thanks,
Carson




From marc.hoeppner at imbim.uu.se  Thu Mar  6 01:26:29 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Thu, 6 Mar 2014 07:26:29 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3CC300.A805%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>

Hi,

I think this is an interesting comment that I would like a bit more information on:

> correct_est_fusion should not be used together with est2genome.  It won't
> fail, you just get odd results.  Actually est2genome should not ever be
> used to generate the final annotation set.  It is a convenience method
> that allows you to generate rough models for training gene predictors like
> SNAP and Augustus.  But once they are trained it should be turned off,
> because the models it produces will be partial (ESTs rarely cover the
> whole transcript) and the results will have many false positives from
> background transcription events in your EST data.  These models are good
> enough to train with, but make very poor final annotations. So in the end
> you should be using correct_est_fusion=1 with the SNAP or Augustus set and
> not est2genome (which should already have been turned off by then).


My experience has been that training gene finders, especially for complex genomes like vertebrates, is very slow and painful. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~ 10% false exon calls.

As an example, using SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab-initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab-initio predictions, which to me seems problematic.


/Marc

From carsonhh at gmail.com  Thu Mar  6 08:29:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 07:29:35 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
Message-ID: <CF3DCCB0.A85C%carsonhh@gmail.com>

> Wouldn't it be more sensible to rely on the evidence over probabilistic
> models?

Yes.  In fact that is the backbone of MAKER.  The evidence is used to derive
hints that are passed back into the predictors, and the resulting models are
reviewed in light of the evidence to decide on final models (no longer
strictly probabilistic).  Take a look at the MAKER2 paper (Table 2 and
Figure 1) and you will see that even when you use the wrong species
parameters in the predictor (e.g. A. thaliana to annotate C. elegans) you
get as much as a 3-fold increase in exon-level accuracy by using the hint
feedback from MAKER.  With the est2genome option you don't get that hint
feedback (normally probabilistic models, EST evidence, and protein evidence
would all work together), and the models are overall poorer and contain more
false positives (we have looked at this a lot).


> The annotation would be partial, but on the other hand the chance of
> incorporating false signals is smaller (assuming I can generate a clean set
> of transcripts from RNA-seq data)?

False signals are abundant.  It's just the nature of how ESTs and especially
mRNAseq reads are generated and anchored back to the assembly.  By letting
there be feedback between the probabilistic model and the evidence (both
protein and EST/mRNAseq) a lot of this is eliminated.


> As an example, using SNAP and Augustus on a bird genome - with augustus
> achieving nucleotide and exon sensitivities in the 70-90% range gave a host of
> false exons that were simply not supported by the RNAseq data, yet made it
> into the final gene build.

You will get false positives from an est2genome-only approach as well.
Models will be more partial, and the false negative rate will be very high
(often 30-70%).  Also look at the MAKER2 paper Figure 1.  The false positive
rate from ab initio alone can be quite high, but with the evidence feedback
it is substantially reduced (especially for poorly trained predictors).


> Is it possible to get some more details on how Maker uses ab-initio predictions
> and reconciles them with evidence alignments? At the moment it seems to me
> that maker gives higher weight to the ab-initio predictions, which to me seems
> problematic.

Take a look at the MAKER, MAKER2, and MAKER-P papers.  Final genes are
chosen based on evidence overlap using AED (completely evidence based).
It is the model generation that leverages the hint-based feedback.  The
names of MAKER genes can let you know what the source of the model is.  Any
time the hint-based model matches the evidence better, the name will look
like this:
maker-<contig>-<predictor>-gene-<ID> (e.g. maker-chr1-snap-gene-0.4)

When the ab initio model matches better than the hint-based model, the name
looks like this:
<predictor>-<contig>-abinit-gene-<ID> (e.g. snap-chr1-abinit-gene-0.2)
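
To make the two naming patterns above easier to work with, here is a small
Python sketch that sorts gene names by model source.  The function name
classify_gene_name and the labels "hint-based" and "ab-initio" are made up
for illustration, and the regular expressions are only my reading of the two
templates quoted above, not something taken from the MAKER code.  Names that
fit neither template (for example the "processed" tRNAscan names) are
reported as unknown.

```python
import re

# Contig names may themselves contain hyphens, so the contig group is greedy
# and the predictor group is restricted to a single hyphen-free token.
HINT_RE = re.compile(
    r"^maker-(?P<contig>.+)-(?P<predictor>[^-]+)-gene-(?P<id>[0-9.]+)$"
)
ABINIT_RE = re.compile(
    r"^(?P<predictor>[^-]+)-(?P<contig>.+)-abinit-gene-(?P<id>[0-9.]+)$"
)


def classify_gene_name(name):
    """Return (source, predictor, contig, gene_id) for a MAKER gene name."""
    m = HINT_RE.match(name)
    if m:
        return ("hint-based", m.group("predictor"), m.group("contig"), m.group("id"))
    m = ABINIT_RE.match(name)
    if m:
        return ("ab-initio", m.group("predictor"), m.group("contig"), m.group("id"))
    return ("unknown", None, None, None)


if __name__ == "__main__":
    for gene in ("maker-chr1-snap-gene-0.4", "snap-chr1-abinit-gene-0.2"):
        print(gene, classify_gene_name(gene))
```

Counting the two classes over a whole annotation gives a quick feel for how
often the hint feedback changed the outcome.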


In summary, using est2genome alone (while good for generating training sets)
undercuts the power of the evidence feedback together with the probabilistic
models.
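
For readers unfamiliar with AED (annotation edit distance), mentioned above
as the basis for choosing final genes, here is a minimal Python sketch of
the published definition as I understand it: AED = 1 - (SN + SP) / 2, where
SN is the fraction of evidence bases covered by the annotation and SP is the
fraction of annotation bases covered by evidence.  The function names and
the 1-based inclusive interval format are assumptions for illustration;
MAKER's own calculation handles details (strand, which evidence is counted,
and so on) that are ignored here.

```python
def covered_bases(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_intervals):
    """Base-level AED: 0.0 is perfect agreement, 1.0 is no evidence support."""
    ann = covered_bases(annotation_exons)
    evi = covered_bases(evidence_intervals)
    if not ann or not evi:
        return 1.0  # nothing to compare against counts as unsupported
    shared = len(ann & evi)
    sensitivity = shared / len(evi)   # evidence bases recovered by the model
    specificity = shared / len(ann)   # model bases supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    print(aed([(1, 100)], [(1, 100)]))    # identical intervals
    print(aed([(1, 100)], [(51, 150)]))   # half overlap
```

Expanding intervals into sets is fine for a single gene but would be
wasteful genome-wide; an interval-based overlap calculation would be the
obvious replacement there.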


Thanks,
Carson


From marc.hoeppner at imbim.uu.se  Thu Mar  6 08:40:48 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Thu, 6 Mar 2014 14:40:48 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3DCCB0.A85C%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>

Hi Carson,

Thanks for the detailed feedback, this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data, especially with newer protocols that offer near-perfect strandedness. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by letting Maker work with a semi-refined ab-initio model. But it can be a bit tricky to hit that sweet spot (we did validate > 4000 models manually in order to make that sort of assessment, though).

But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me, I think it would be fantastic if you guys could put together a Wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that people should get a proper explanation of that have only ever been discussed on this list: best practices, experimental features, etc.

Regards,

Marc



From carsonhh at gmail.com  Thu Mar  6 09:03:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 08:03:10 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
	<1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
Message-ID: <CF3DDC22.A8AF%carsonhh@gmail.com>

MAKER wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page

Thanks,
Carson



From sjackman at gmail.com  Thu Mar  6 14:56:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 12:56:34 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3BD88C.A7D5%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
Message-ID: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?
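
To illustrate the convention I mean, here is a minimal Python sketch that builds such a name from a three-letter amino acid code and an anticodon. The function name `trna_gene_name` is hypothetical, and I am assuming the amino acid arrives as a three-letter code such as `Trp`; anything outside the 20 standard amino acids returns `None` rather than a guessed name.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def trna_gene_name(amino_acid, anticodon):
    """Return a name like 'trnW-CCA', or None for an unrecognised amino acid."""
    letter = THREE_TO_ONE.get(amino_acid.strip().capitalize())
    if letter is None:
        return None
    return f"trn{letter}-{anticodon.strip().upper()}"


if __name__ == "__main__":
    print(trna_gene_name("Trp", "CCA"))
```

Special cases such as initiator tRNAs or undetermined types would need their own handling and are deliberately left out of this sketch.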

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that.  It focuses only on the coding genes.  You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago).  So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes.  We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set  
single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and set the minimum size (single_length) to a smaller value. I think it's set to 250 right now. Also est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
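
The prefiltering step could look something like the Python sketch below. It is not part of MAKER. It assumes that the score sits in column 6 of each mRNA line, that it is numeric, and that higher is better; the function name and the threshold are hypothetical, so check your own GFF3 before relying on it.

```python
# Hypothetical prefilter: drop low-scoring mRNA features (and their children)
# from a GFF3 file before handing it to MAKER as model_gff.

def _attrs(col9):
    """Parse the GFF3 attribute column into a dict."""
    return dict(kv.split("=", 1) for kv in col9.split(";") if "=" in kv)


def _passes(fields, min_score):
    """Assumption: column 6 is a numeric score; '.' counts as failing."""
    return fields[5] != "." and float(fields[5]) >= min_score


def filter_gff3_by_mrna_score(lines, min_score):
    rows = [line.rstrip("\n") for line in lines]
    # Only nine-column feature lines are parsed; everything else passes through.
    parsed = [None if r.startswith("#") or r.count("\t") != 8 else r.split("\t")
              for r in rows]

    dropped, live_genes = set(), set()
    for f in parsed:
        if f and f[2] == "mRNA":
            a = _attrs(f[8])
            if _passes(f, min_score):
                live_genes.update(p for p in a.get("Parent", "").split(",") if p)
            elif "ID" in a:
                dropped.add(a["ID"])

    out = []
    for row, f in zip(rows, parsed):
        if f is None:
            out.append(row)  # comments, directives, FASTA section
            continue
        a = _attrs(f[8])
        parents = [p for p in a.get("Parent", "").split(",") if p]
        if f[2] == "gene":
            keep = a.get("ID") in live_genes  # genes with no surviving mRNA go
        elif f[2] == "mRNA":
            keep = _passes(f, min_score)
        else:
            keep = not (parents and all(p in dropped for p in parents))
        if keep:
            out.append(row)
    return out
```

Note that this sketch also drops any gene that has no surviving mRNA, which would include non-coding genes; that is a design choice for the example, not a MAKER requirement.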

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think maker_coor and est_forward are a feature set that is worth promoting to a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.
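
The seed handling described for the case without est_forward can be pictured with a short Python sketch. This is only an illustration of the described logic, not the Perl code in GI.pm; the names `Region`, `parse_maker_coor` and `search_space` are hypothetical, and the regular expression assumes sequence IDs contain no colon or whitespace.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int]  # None means the whole sequence
    end: Optional[int]


def parse_maker_coor(header: str) -> Optional[Region]:
    """Read maker_coor=chr1 or maker_coor=chr1:1-10000 from a FASTA header."""
    m = re.search(r"maker_coor=([^\s:]+)(?::(\d+)-(\d+))?", header)
    if m is None:
        return None
    seqid, start, end = m.groups()
    if start is None:
        return Region(seqid, None, None)
    return Region(seqid, int(start), int(end))


def search_space(seed_seqid: str, seed_start: int, seed_end: int,
                 coor: Region) -> Optional[Region]:
    """Return the region to polish, or None if the BLAST seed is ignored."""
    if seed_seqid != coor.seqid:
        return None  # seed is completely outside maker_coor
    if coor.start is not None:
        overlaps = seed_start <= coor.end and seed_end >= coor.start
        if not overlaps:
            return None
    # An overlapping seed gets its search space set to maker_coor exactly.
    return coor
```

With est_forward=1 the difference would be that the region is searched even when no seed exists at all, and with relaxed exonerate parameters.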

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
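
As a rough illustration of the header tagging, the tags could be appended with a small Python helper like the one below. The exact placement of the tags is an assumption here (space-separated, after the sequence ID), and the function name `tag_fasta_header` is hypothetical.

```python
# Hypothetical helper: append gene_id= and maker_coor= tags to a FASTA header.
# Assumption: tags go in the description part of the header, space-separated.

def tag_fasta_header(header, gene_id=None, maker_coor=None):
    tags = []
    if gene_id:
        tags.append(f"gene_id={gene_id}")
    if maker_coor:
        tags.append(f"maker_coor={maker_coor}")
    return " ".join([header.rstrip()] + tags)


print(tag_fasta_header(">rrn16", gene_id="rrn16", maker_coor="chr1:1-10000"))
# >rrn16 gene_id=rrn16 maker_coor=chr1:1-10000
```

The tagged file would then be given as est= or protein= evidence with est_forward=1 typed into maker_opts.ctl.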

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From carsonhh at gmail.com  Thu Mar  6 14:58:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 13:58:41 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E2F7A.A911%carsonhh@gmail.com>

Yes.  I'll fix the naming.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 1:56 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general
is a non-trivial problem. In my particular case, I have a well-annotated
reference species that is very closely related (99.2% sequence identity), so
lifting over the annotations from that reference species to my species
should be pretty straightforward. It would be great if MAKER had an option
for RNA sequence homology similar to est2genome that does not imply the
sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified
genes are named e.g. `trnascan-205522-processed-gene-0.38`.  tRNA genes are
conventionally named according to the amino acid and anticodon, such as
`trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the
names with that convention?

Cheers,
Shaun



From carson.holt at genetics.utah.edu  Thu Mar  6 17:00:40 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 6 Mar 2014 23:00:40 +0000
Subject: [maker-devel] maker problem with running blast
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3E4A6E.A92B%carson.holt@genetics.utah.edu>

Your blast_type parameter in maker_bopts.ctl is set to 'wublast' but the
executables for wublast are blank in maker_exe.ctl.

See, they're blank -->
xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable


You either need to provide executables or set your blast_type parameter to
something else. For example, you could set it to 'NCBI+', but you will need
to fix the location of makeblastdb.

makeblastdb is set incorrectly here -->
makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
NCBI+ makeblastdb executable


Alternatively you can set blast_type to 'NCBI', but you will need to
uncomment the executables.

Here -->
formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
blastall=#/usr/local/bin/blastall #location of NCBI blastall executable


--Carson


On 3/6/14, 3:51 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Hi
>
>I have installed the latest version of blast+ and provided the executable path
>in maker_exe.ctl as follows
>
>#-----Location of Executables Used by MAKER/EVALUATOR
>makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
>NCBI+ makeblastdb executable
>blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location
>of NCBI+ blastn executable
>blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location
>of NCBI+ blastx executable
>tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx
>#location of NCBI+ tblastx executable
>formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
>blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
>xdformat=#location of WUBLAST xdformat executable
>blasta=#location of WUBLAST blasta executable
>RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of
>RepeatMasker executable
>exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate
>#location of exonerate executable
>
>#-----Ab-initio Gene Prediction Algorithms
>snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable
>gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3
>#location of eukaryotic genemark executable
>gmhmmp= #location of prokaryotic genemark executable
>augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus
>executable
>fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable
>
>#-----Other Algorithms
>fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom
>executable (experimental)
>probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild
>#location of probuild executable (required for genemark)
>
>
>
>
>
>But when running maker I get this error
>
>
>STATUS: Parsing control files...
>WARNING: blast_type is set to 'wublast' but executables cannot be located
>ERROR: Please provide a valid locaction for a BLAST algorithm in the
>control files.
>
>
>
>
>
>
>


From sjackman at gmail.com  Thu Mar  6 17:33:04 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 15:33:04 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3E2F7A.A911%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
Message-ID: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models?

Cheers,
Shaun

On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote:

Yes. I'll fix the naming.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Date: Thursday, March 6, 2014 at 1:56 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straight forward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. ?tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn?t even try to do that. ?It focuses only on the coding genes. ?You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). ?So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. ?We?ve only started adding non-coding RNA support recently for iPlant, so it?s still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set  
single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with ?rrn? and the tRNA gene names with ?trn?, as is standard, so determining the appropriate type should be straight forward.

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and the minimum size to a smaller value. ?I think it's set to 250 right now. ?Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson?

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding?protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson?

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. ?Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. ?Eventually it?s really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). ?I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.?

Thanks,
Carson

From: Mikael Brandstr?m Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson
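
The header tagging described above could be scripted along these lines. This is a minimal sketch under assumptions: the helper names are hypothetical, and it assumes the gene_id= and maker_coor= tags are simply appended, space-separated, to the description part of the FASTA header (the message does not say exactly where in the header they must go).

```python
def format_coor(seqid, start=None, end=None):
    """Build a maker_coor value: 'chr1' or 'chr1:1-10000'."""
    if start is None or end is None:
        return seqid
    return f"{seqid}:{start}-{end}"


def tag_fasta_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and maker_coor= tags to one FASTA header line.

    Hypothetical helper; the tag placement (end of the header,
    space-separated) is an assumption, not documented MAKER behavior.
    """
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    tagged = header.rstrip("\n")
    if gene_id is not None:
        tagged += " gene_id=" + gene_id
    if maker_coor is not None:
        tagged += " maker_coor=" + maker_coor
    return tagged


if __name__ == "__main__":
    # Restrict an example transcript to chr1:1-10000 and group it under one gene.
    print(tag_fasta_header(">rrn16 16S ribosomal RNA",
                           gene_id="rrn16",
                           maker_coor=format_coor("chr1", 1, 10000)))
```

The tagged FASTA would then be given as est= (or protein=) evidence, with est_forward=1 typed into maker_opts.ctl by hand as described above.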


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From carsonhh at gmail.com  Thu Mar  6 17:38:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:38:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
	<etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E5408.A93F%carsonhh@gmail.com>

Well... not really.  I have no plans to add est2genome support for noncoding
genes (non-trivial), so you would either have to remove the ncRNA from your
input, or filter it out downstream.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 4:33 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to
identify tRNA, I was hoping that both forms of evidence would be used to
create a single gene model, which doesn't seem to be the case. I get
duplicate overlapping gene models (one mRNA from est and one tRNA from
tRNAscan). Could MAKER merge these models?

Cheers,
Shaun
On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote:
 
> Yes.  I'll fix the naming.
> 
> Thanks,
> Carson
> 
> 
> From: Shaun Jackman <sjackman at gmail.com>
> Date: Thursday, March 6, 2014 at 1:56 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> Hi, Carson. I agree that identifying non-coding RNA by homology in general is
> a non-trivial problem. In my particular case, I have a well-annotated
> reference species that is very closely related (99.2% sequence identity), so
> lifting over the annotations from that reference species to my species should
> be pretty straightforward. It would be great if MAKER had an option for RNA
> sequence homology similar to est2genome that does not imply the sequence is
> coding.
> 
> The integration of MAKER-P with tRNAscan is very useful. The identified genes
> are named e.g. `trnascan-205522-processed-gene-0.38`.  tRNA genes are
> conventionally named according to the amino acid and anticodon, such as
> `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names
> with that convention?
> 
> Cheers,
> Shaun
> 
> 
> On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:
>> 
>> Trying to call non-coding RNA from ESTs or even sequence homology is
>> extremely messy (non-trivial problem in most organisms with high false
>> positive rate), so MAKER for the most part doesn't even try to do that.  It
>> focuses only on the coding genes.  You can now use tRNAscan and snoscan in
>> the newest version for some non-coding RNA support (those features were only
>> added a couple of months ago).  So just like other prediction tools (snap,
>> augustus etc.), the primary focus has always been the coding genes.  We've
>> only started adding non-coding RNA support recently for iPlant, so it's still
>> relatively immature.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> From: Shaun Jackman <sjackman at gmail.com>
>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>> Date: Tuesday, March 4, 2014 at 7:10 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>> 
>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
>> the tip.
>> 
>> The rRNA genes that are found with est2genome have the feature type set to
>> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features.
>> Ideally the feature type would be set to rRNA or tRNA as appropriate, and
>> would omit the UTR and CDS features. Is that a feature that you would be
>> interested in adding to MAKER? The rRNA gene names all start with "rrn" and
>> the tRNA gene names with "trn", as is standard, so determining the
>> appropriate type should be straightforward.
>> 
>> Thanks again for your help with this. Cheers,
>> Shaun
>> 
>> 
>> 
>> On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
>>> Set single_exon=1, and the minimum size to a smaller value.  I think it's
>>> set to 250 right now.  Also est2genome is looking for ORF, so if there is
>>> none (as with tRNAs) they probably won't get picked up.
>>> 
>>> --Carson 
>>> 
>>> Sent from my iPhone
>>> 
>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>>> 
>>>> Sorry, ignore my previous question. est_forward also carries forward the
>>>> names of protein evidence and works like a charm. Thank you!
>>>> 
>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>>> these hits?
>>>> organism_type=prokaryotic
>>>> est2genome=1
>>>> protein2genome=1
>>>> est_forward=1
>>>> Cheers,
>>>> Shaun
>>>> 



From sbrubaker at solazyme.com  Thu Mar  6 17:41:55 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Thu, 6 Mar 2014 23:41:55 +0000
Subject: [maker-devel] Long introns from Augustus
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com>

Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus.  I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created?

I tried a couple of things, some settings in a probabilities file and also changing a long list of probabilities to another file that someone had suggested on a forum.  So far I don't really see any changes though.

Any advice would be greatly appreciated.  

Thanks,
Shane


From carsonhh at gmail.com  Thu Mar  6 17:46:53 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:46:53 -0700
Subject: [maker-devel] Long introns from Augustus
Message-ID: <CF3E5643.A94C%carsonhh@gmail.com>

Are these the ab initio calls that are merged, or the final MAKER models?

--Carson


On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:

>Hi, we have a very compact genome and we are getting a lot of fused gene
>models from running Augustus.  I am wondering if anyone has any advice
>about how to prevent introns above a certain cutoff from being created?
>
>I tried a couple of things, some settings in a probabilities file and
>also changing a long list of probabilities to another file that someone
>had suggested on a forum.  So far I don't really see any changes though.
>
>Any advice would be greatly appreciated.
>
>Thanks,
>Shane


From sbrubaker at solazyme.com  Thu Mar  6 18:48:15 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 7 Mar 2014 00:48:15 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <CF3E5643.A94C%carsonhh@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>

Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.

I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.


-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com] 
Sent: Thursday, March 06, 2014 3:47 PM
To: Shane Brubaker; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Long introns from Augustus

Are these the ab initio calls that are merged, or the final MAKER models?

--Carson


On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:

>Hi, we have a very compact genome and we are getting a lot of fused 
>gene models from running Augustus.  I am wondering if anyone has any 
>advice about how to prevent introns above a certain cutoff from being created?
>
>I tried a couple of things, some settings in a probabilities file and 
>also changing a long list of probabilities to another file that someone 
>had suggested on a forum.  So far I don't really see any changes though.
>
>Any advice would be greatly appreciated.
>
>Thanks,
>Shane


From mikael.durling at slu.se  Mon Mar 10 05:27:25 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Mon, 10 Mar 2014 10:27:25 +0000
Subject: [maker-devel] keep_preds values
Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>

Hi,

Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either.

Thanks,
Mikael


From robert.king at rothamsted.ac.uk  Mon Mar 10 07:17:07 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Mon, 10 Mar 2014 12:17:07 +0000
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>

Dear Maker Developers,

I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using the following as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference transcripts?

I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted my annotations to GTF format using the provided script; I'm just waiting on some Perl modules to be installed by our administrator to test it, and to try out the "Evaluator" and "compare" programs too. What do they do?

Best Wishes
Rob


From dence at genetics.utah.edu  Mon Mar 10 09:47:42 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 10 Mar 2014 14:47:42 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6BA90@mxb2.hg.genetics.utah.edu>

Hi Mikael, 

The keep_preds parameter is often used the same as a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 - 1. A prediction with an AED of 0 matches the evidence exactly while a prediction with an AED of 1 isn't overlapped by any evidence. 
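
As a rough illustration of the AED range described above, here is a base-level sketch. It assumes the commonly cited definition AED = 1 - (SN + SP)/2, with SN the fraction of evidence bases covered by the prediction and SP the fraction of prediction bases covered by evidence. The function names are hypothetical, and MAKER's own calculation involves more than this simplified version.

```python
def covered_bases(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction_exons, evidence_intervals):
    """Base-level annotation edit distance between a prediction and its evidence.

    Simplified sketch: 0.0 means the prediction matches the evidence exactly,
    1.0 means no evidence overlaps the prediction at all.
    """
    pred = covered_bases(prediction_exons)
    evid = covered_bases(evidence_intervals)
    overlap = len(pred & evid)
    if overlap == 0:
        return 1.0
    sensitivity = overlap / len(evid)   # evidence bases recovered by the prediction
    specificity = overlap / len(pred)   # prediction bases supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    print(aed([(1, 100)], [(1, 100)]))     # exact match
    print(aed([(1, 100)], [(51, 150)]))    # partial overlap
    print(aed([(1, 100)], [(200, 300)]))   # no overlap
```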

The default behavior for MAKER is to make a gene model out of a prediction with any AED <1. When you change the keep_preds option from 0 to 1, then MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc). Setting the keep_preds option to somewhere in between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model. 

From a user standpoint, you will almost certainly lose gene models when you set AED at an intermediate value, but you might benefit by knowing that all your models will now have an AED no higher than a certain value. 

I hope that helps; let me know if it didn't. 

~Daniel

PS The original paper that described the AED is Eilbeck et al in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012. 

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Mikael Brandström Durling [mikael.durling at slu.se]
Sent: Monday, March 10, 2014 4:27 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] keep_preds values

Hi,

Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either.

Thanks,
Mikael




From carsonhh at gmail.com  Mon Mar 10 10:51:21 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:51:21 -0700
Subject: [maker-devel] keep_preds values
Message-ID: <CF432CF3.A9C7%carsonhh@gmail.com>

Actually that is false. The keep_preds option is still binary.  Any value
other than 0 sets it to true.  There was discussion about making it a
non-binary value, but that has not been implemented.

--Carson


On 3/10/14, 7:47 AM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Mikael, 
>
>The keep_preds parameter is often used the same as a binary parameter,
>but it doesn't have to be. The concordance that is mentioned in the
>comment line is the AED for that prediction. AED is a measurement of how
>well a prediction is supported by the evidence and ranges from 0 - 1. A
>prediction with an AED of 0 matches the evidence exactly while a
>prediction with an AED of 1 isn't overlapped by any evidence.
>
>The default behavior for MAKER is to make a gene model out of a
>prediction with any AED <1. When you change the keep_preds option from 0
>to 1, then MAKER will make a gene model out of any prediction that
>matches the other parameters (like single_exon, min_exon, etc). Setting
>the keep_preds option to somewhere in between 0 and 1 will set a ceiling
>on the AED required for promoting a prediction to a gene model.
>
>From a user standpoint, you will almost certainly lose gene models
>when you set AED at an intermediate value, but you might benefit by
>knowing that all your models will now have an AED no higher than a
>certain value. 
>
>I hope that helps; let me know if it didn't.
>
>~Daniel
>
>PS The original paper that described the AED is Eilbeck et al in BMC
>Bioinformatics 2009. It's also discussed in more detail in the MAKER2
>paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews
>Genetics paper from 2012.
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>Mikael Brandström Durling [mikael.durling at slu.se]
>Sent: Monday, March 10, 2014 4:27 AM
>To: maker-devel at yandell-lab.org
>Subject: [maker-devel] keep_preds values
>
>Hi,
>
>Can someone, please, explain the keep_preds parameter, as it works now
>with a value between 1 and 0? It used to be binary, but now it seems to
>test concordance towards something. The maker wiki doesn't explain it any
>further either.
>
>Thanks,
>Mikael
>
>


From mikael.durling at slu.se  Mon Mar 10 09:57:23 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 14:57:23 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432CF3.A9C7%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
Message-ID: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>

Hi Carson and Daniel,

That sounds more logical to me.  Then it would be appropriate to change the comment of keep_preds in the generated config files.

Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that a model is less likely to be a false positive when two or more predictors suggest the same unsupported model.

Mikael
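
To illustrate the behaviour Carson describes in the reply quoted below (any value other than 0 turns the option on), here is a minimal Python sketch. It is not MAKER's own code; the function name keep_preds_enabled is hypothetical and only mirrors the rule as stated in this thread.

```python
def keep_preds_enabled(value: str) -> bool:
    """Return True when a keep_preds setting would count as 'on'.

    Assumption: mirrors the rule stated in the thread, that any value
    other than 0 is treated as true. Not taken from the MAKER source.
    """
    return float(value) != 0.0


# An intermediate value such as 0.5 behaves exactly like 1 under this rule.
for setting in ("0", "0.5", "1"):
    print(setting, keep_preds_enabled(setting))
```

Under this reading, setting keep_preds to a fraction does not apply any threshold; it simply switches the option on.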


On 10 Mar 2014, at 16:51, Carson Holt <carsonhh at gmail.com> wrote:

> Actually that is false. The keep_preds option is still binary.  Any value
> other than 0 sets it to true.  There was discussion about making it a
> non-binary value, but that has not been implemented.
> 
> --Carson


From carsonhh at gmail.com  Mon Mar 10 10:59:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:59:43 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
Message-ID: <CF432F23.A9D4%carsonhh@gmail.com>

Yes.  It will eventually perform an AED-like calculation between multiple
predictors (e.g., if you use 3 predictors, then you require support by
at least 2 predictors across all exons to get a value of 0.33).  A value
of 0 would be perfect concordance across all 3 predictors.

--Carson


On 3/10/14, 7:57 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Hi Carson and Daniel,
>
>That sounds more logical to me.  Then it would be appropriate to change
>the comment of keep_preds in the generated config files.
>
>Would it make sense to make keep_preds a non-binary value to evaluate the
>concordance between ab initio models obtained from different predictors?
>That would assume that it is less likely to be a false positive when two
>or more predictors suggest the same unsupported model?
>
>Mikael


From mikael.durling at slu.se  Mon Mar 10 10:08:16 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 15:08:16 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432F23.A9D4%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>

Ok. But that is not implemented now, as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models?

Mikael

On 10 Mar 2014, at 16:59, Carson Holt <carsonhh at gmail.com> wrote:

> Yes.  It will eventually perform an AED-like calculation between multiple
> predictors (e.g., if you use 3 predictors, then you require support by
> at least 2 predictors across all exons to get a value of 0.33).  A value
> of 0 would be perfect concordance across all 3 predictors.
> 
> --Carson


From carsonhh at gmail.com  Mon Mar 10 11:16:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:16:59 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
Message-ID: <CF4331A9.A9E0%carsonhh@gmail.com>

There is a value called abAED being calculated, which somewhat captures
the concordance among the predictors.  It is not currently printed in the
GFF3, but it is used to identify the best non-overlapping ab initio
prediction to put in the non-overlapping fasta file.  There are a couple of
things I still need to do with it, though.  It's not yet normalized to
take into account the absence of a predictor in the cluster of overlapping
predictions. For example, if I have 2 predictors and 2 make perfectly
matching calls and 1 makes no call, they get a score of 0 because I have
perfect concordance between what's there, but I really should make it 0.33
because the absence of the third predictor is meaningful.  The
unnormalized concordance value is fine for deciding which overlapping
model to keep in the file, but not for global comparison.

--Carson
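
The normalization described above can be sketched as follows. This only illustrates the arithmetic and is not MAKER's implementation: the function name and the rescaling formula are assumptions, chosen because they reproduce the one worked case given in this thread (three predictors, per the follow-up correction, with two in perfect agreement and one making no call, giving 0.33 rather than 0).

```python
def normalized_concordance(unnormalized: float, n_present: int, n_total: int) -> float:
    """Rescale a 0-1 concordance distance to penalize silent predictors.

    unnormalized: distance among the predictors that made a call
                  (0 = perfect agreement, 1 = no agreement).
    n_present:    predictors with a call in the cluster.
    n_total:      predictors that were run.

    Assumed (hypothetical) formula: the agreement among the predictors
    present is scaled by the fraction of predictors present, then
    converted back to a distance.
    """
    if not 0 < n_present <= n_total:
        raise ValueError("need 0 < n_present <= n_total")
    agreement = (1.0 - unnormalized) * n_present / n_total
    return 1.0 - agreement


# Two of three predictors agree perfectly, the third makes no call.
print(round(normalized_concordance(0.0, 2, 3), 2))  # 0.33
# All three predictors agree perfectly.
print(round(normalized_concordance(0.0, 3, 3), 2))  # 0.0
```

Other rescalings would fit the same single example, so treat the formula as one possibility rather than the planned behaviour.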


On 3/10/14, 8:08 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Ok. But that is not implemented now, as far as I can tell from the source,
>right? Or is it reflected in the AED for the unsupported models?
>
>Mikael


From carsonhh at gmail.com  Mon Mar 10 11:18:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:18:14 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF4331A9.A9E0%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
	<CF4331A9.A9E0%carsonhh@gmail.com>
Message-ID: <CF4333C1.AA06%carsonhh@gmail.com>

Sorry, I meant to say "3 predictors and 2 make perfectly
matching calls and 1 makes no call."


On 3/10/14, 9:16 AM, "Carson Holt" <carsonhh at gmail.com> wrote:

>There is a value called abAED being calculated, which somewhat captures
>the concordance among the predictors.  It is not currently printed in the
>GFF3, but it is used to identify the best non-overlapping ab initio
>prediction to put in the non-overlapping fasta file.  There are a couple of
>things I still need to do with it, though.  It's not yet normalized to
>take into account the absence of a predictor in the cluster of overlapping
>predictions. For example, if I have 2 predictors and 2 make perfectly
>matching calls and 1 makes no call, they get a score of 0 because I have
>perfect concordance between what's there, but I really should make it 0.33
>because the absence of the third predictor is meaningful.  The
>unnormalized concordance value is fine for deciding which overlapping
>model to keep in the file, but not for global comparison.
>
>--Carson


From carsonhh at gmail.com  Mon Mar 10 11:25:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:25:50 -0700
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <CF4330EC.A9DA%carsonhh@gmail.com>

I don't know about Michael's script, but I've always used Eval.  It
produces sensitivity/specificity metrics.  It assumes the first models are
100% correct, and then tells you the sensitivity/specificity value for the
second models.
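
As a rough illustration of what such a comparison computes, the sketch below treats one set of exons as the reference and scores a second set against it. It is not Eval's code; the function name and the exon tuples are made up for the example, and it uses the gene-prediction convention in which "specificity" is TP / (TP + FP), which other fields call precision.

```python
def sensitivity_specificity(reference: set, prediction: set) -> tuple:
    """Compare two sets of features, treating `reference` as correct.

    Features are hashable items such as (seqid, start, end, strand)
    exon tuples. Gene-prediction convention:
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    tp = len(reference & prediction)   # features found in both sets
    fn = len(reference - prediction)   # features lost in the second set
    fp = len(prediction - reference)   # features gained in the second set
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


# Hypothetical exons: one lost and one gained between versions.
old = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 600, "+")}
new = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 700, 800, "+")}
print(sensitivity_specificity(old, new))
```

With one exon lost and one gained, both values drop below 1, although neither annotation is known to be the better one.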

It is therefore not a quality metric.  Instead you should view it as a change
metric. Lower sensitivity tells you that models/exons have been lost between
versions, and lower specificity tells you models/exons have been gained.
There will also be a lot of generic statistics on exon/intron distribution
and UTR length.  Then the AED values from the MAKER run can be used
independently to evaluate how well models match the evidence.
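
One simple way to compare the AED values from two runs is a cumulative distribution: for each threshold, the fraction of models with an AED at or below it. The sketch below is a minimal illustration; the function name, the bin width, and the example values are assumptions, and collecting the AED values from the MAKER output is left out.

```python
def aed_cdf(aed_values, step=0.1):
    """Return (threshold, fraction of models with AED <= threshold) pairs.

    aed_values: AED scores in the range 0-1, one per gene model.
    step:       spacing between thresholds (assumed default of 0.1).
    """
    if not aed_values:
        raise ValueError("no AED values given")
    n_bins = round(1 / step)
    total = len(aed_values)
    cdf = []
    for i in range(n_bins + 1):
        # Round to avoid floating-point drift in the threshold values.
        threshold = round(i * step, 10)
        count = sum(1 for a in aed_values if a <= threshold)
        cdf.append((threshold, count / total))
    return cdf


# Hypothetical AED values for a handful of models.
for threshold, fraction in aed_cdf([0.0, 0.05, 0.2, 0.5, 1.0]):
    print(threshold, fraction)
```

A run whose curve rises faster has more models that agree closely with the evidence.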

--Carson


From:  "Robert King (RRes-Roth)" <robert.king at rothamsted.ac.uk>
Date:  Monday, March 10, 2014 at 5:17 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] annotation comparison aed plots

Dear Maker Developers,
 
I've updated a reference that had errors and was a little incomplete and am
now trying to produce an annotation for it. Please note the reference has not
changed dramatically. I've produced two annotations using as evidence:
 
Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
 
Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)
 
I'm not sure if it was a smart move to use the prior annotation reference
transcripts?
 
I want to compare these two annotations and have produced AED scores. How do
I generate summary stats/figures to compare annotations? You mentioned last
year in a post that Mike Campbell has a script to produce these; do you know if
he will post it? I've got the Eval program and converted to GTF format using
the provided script, and am just waiting on some Perl modules to be installed by
our administrator to test it and the "Evaluator" and "compare" programs too. What
do they do?
 
Best Wishes
Rob


From michael.s.campbell1 at gmail.com  Mon Mar 10 10:50:53 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:50:53 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
	<CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
Message-ID: <CAAi6vWUSY6UgyyXAJ5=-aUA_39FBwFREVX3xmeHSZaE264AKGw@mail.gmail.com>

One more point. The sensitivity, specificity, and accuracy produced by the
compare_annotations_3.2.pl script are gene-level, and overlap between
annotation sets is defined very liberally: at least one nucleotide of
exon overlap.
Mike
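
The liberal overlap rule can be sketched as below. This is an illustration only, not code from compare_annotations_3.2.pl; the function names are hypothetical, and it assumes both genes are on the same sequence and strand with 1-based inclusive coordinates as in GFF3.

```python
def exons_overlap(exon_a, exon_b):
    """True if two (start, end) exons share at least one nucleotide.

    Coordinates are 1-based and inclusive, as in GFF3.
    """
    return exon_a[0] <= exon_b[1] and exon_b[0] <= exon_a[1]


def genes_overlap(gene_a, gene_b):
    """True if any exon of gene_a overlaps any exon of gene_b.

    Each gene is a list of (start, end) exons. Assumption: both genes
    lie on the same sequence and strand; that check is not shown here.
    """
    return any(exons_overlap(a, b) for a in gene_a for b in gene_b)


gene1 = [(100, 200), (300, 400)]
print(genes_overlap(gene1, [(200, 250)]))  # shares position 200 -> True
print(genes_overlap(gene1, [(201, 299)]))  # lies within the intron -> False
```

Because a single shared base counts as a match, gene-level numbers from such a rule say little about how well exon boundaries agree.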


On Mon, Mar 10, 2014 at 9:47 AM, Michael Campbell <
michael.s.campbell1 at gmail.com> wrote:

> Hi Robert,
>
> Here are the scripts that were mentioned before.
>
> The AED_cdf_generator.pl script is for making cumulative distribution
> function plots based on annotation edit distance. This script is quite
> simple and straightforward in its internals.
>
> The compare_annotations_3.2.pl script is for generating summary stats for
> annotations and will compare two annotations of the same assembly.
>
> You can run either script without arguments to get a usage statement.
>
> Thanks,
> Mike


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From cjfields at illinois.edu  Mon Mar 10 10:52:50 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 10 Mar 2014 15:52:50 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CEB024AC-5E08-4827-9EC4-17D09F06E1FA@illinois.edu>

I have been running MAKER 2.31 using Augustus and SNAP on an avian genome.  Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides.  However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions.  Leaving out SNAP altogether however leads to a marked decrease in # models overall, which is worse.  GeneMark had a very similar problem (high # false positives) and thus no marked improvement, either when using with both Augustus and SNAP or with Augustus alone.

I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked with in the past.  This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first.  

Finally, how hard would it be to incorporate alternative callers into MAKER?  For instance, would it be possible to add these like a 'plugin'?  

chris


From carsonhh at gmail.com  Mon Mar 10 12:05:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 10:05:24 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CF433C40.AA26%carsonhh@gmail.com>

Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).
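
The convert-to-GFF3 route can be sketched roughly as follows.  This is
only an illustration: the input record layout, the function name and the
default source tag are made up for the example, and the match/match_part
feature layout is an assumption about what the external-prediction GFF3
input expects, so check it against the MAKER documentation before use.

```python
def predictions_to_gff3(genes, source="geneid"):
    """Turn simple prediction records into GFF3 text.

    genes: list of dicts with keys 'id', 'seqid', 'strand' and
    'exons' (a list of 1-based inclusive (start, end) tuples).
    The match/match_part layout is an assumption, not a confirmed
    MAKER requirement.
    """
    lines = ["##gff-version 3"]
    for gene in genes:
        exons = sorted(gene["exons"])
        start, end = exons[0][0], exons[-1][1]
        # Parent feature spanning the whole prediction
        lines.append("\t".join([
            gene["seqid"], source, "match", str(start), str(end),
            ".", gene["strand"], ".",
            "ID={0};Name={0}".format(gene["id"]),
        ]))
        # One child feature per exon
        for i, (s, e) in enumerate(exons, 1):
            lines.append("\t".join([
                gene["seqid"], source, "match_part", str(s), str(e),
                ".", gene["strand"], ".",
                "ID={0}:exon{1};Parent={0}".format(gene["id"], i),
            ]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [{"id": "g1", "seqid": "scaffold1", "strand": "+",
             "exons": [(300, 400), (100, 200)]}]
    print(predictions_to_gff3(demo), end="")
```

The resulting file would then be given to MAKER as external predictions
in the control file.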

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a 'plugin'?
>
>chris


From michael.s.campbell1 at gmail.com  Mon Mar 10 10:47:50 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:47:50 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
Message-ID: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>

Hi Robert,

Here are the scripts that were mentioned before.

The AED_cdf_generator.pl script is for making cumulative distribution
function plots based on annotation edit distance. This script is quite
simple and straightforward in its internals.

The compare_annotations_3.2.pl script is for generating summary stats for
annotations and will compare two annotations of the same assembly.

You can run either script without arguments to get a usage statement.
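
For readers without the attachments, the idea behind an AED cumulative
distribution can be sketched as below.  This is not the attached Perl
script, just a minimal illustration that assumes AED values are stored
as an _AED= attribute on mRNA lines of a MAKER GFF3; the function names
are made up for the example.

```python
import re

# Matches the _AED attribute only (not _eAED) in GFF3 column 9
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def read_aeds(gff3_lines):
    """Collect the AED value of every mRNA feature."""
    aeds = []
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_PATTERN.search(cols[8])
        if match:
            aeds.append(float(match.group(1)))
    return aeds


def aed_cdf(aeds, step=0.1):
    """Return (threshold, fraction of transcripts with AED <= threshold)."""
    total = len(aeds)
    bins = int(round(1 / step))
    points = []
    for i in range(bins + 1):
        threshold = round(i * step, 10)
        below = sum(1 for a in aeds if a <= threshold)
        points.append((threshold, below / total if total else 0.0))
    return points


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for threshold, fraction in aed_cdf(read_aeds(handle)):
            print("{:.2f}\t{:.4f}".format(threshold, fraction))
```

Running it on each annotation and plotting the two curves together gives
a quick visual comparison; the curve that rises faster has more
transcripts with low AED.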

Thanks,
Mike


On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) <
robert.king at rothamsted.ac.uk> wrote:

>  Dear Maker Developers,
>
>
>
> I've updated a reference that had errors and was a little incomplete
> and am now trying to produce an annotation for it. Please note the
> reference has not changed dramatically. I've produced two annotations
> using as evidence:
>
>
>
> Annotation 1:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
>
>
> Annotation 2:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
> mRNA trinity assembly pasafly of different strain (only RNA-seq available)
>
>
>
> I'm not sure if it was a smart move to use the prior annotation reference
> transcripts?
>
>
>
> I want to compare these two annotations and have produced AED scores. How
> do I generate summary stats/figures to compare annotations. You mentioned
> last year in a post Mike Campbell has a script to produce these, do you
> know if he will post it? I've got the Eval program and converted to gtf
> format using the provided script, just waiting on some perl modules to be
> installed by admin to test it. I'm waiting on some perl modules to be
> installed by our administrator to test out the "Evaluator" and "compare"
> programs too, what do they do?
>
>
>
> Best Wishes
>
> Rob
>
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl
Type: text/x-perl-script
Size: 2579 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl
Type: text/x-perl-script
Size: 29154 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment-0001.bin>

From sajeet at gmail.com  Mon Mar 10 13:31:40 2014
From: sajeet at gmail.com (Sajeet Haridas)
Date: Mon, 10 Mar 2014 11:31:40 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF433C40.AA26%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
Message-ID: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>

One of the problems I have found with genemark is that it does not
understand a soft-masked genome. Hence, the self training is incorrect. I
have found marked improvement to genemark's prediction by running the
training on a hard masked genome.
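
Converting a soft-masked FASTA to a hard-masked one is simple enough to
do in a few lines.  A minimal sketch, assuming soft masking is encoded
as lowercase bases (the function name is just for illustration):

```python
def hard_mask_fasta(lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by N."""
    for line in lines:
        if line.startswith(">"):
            # Header lines are passed through untouched
            yield line
        else:
            yield "".join("N" if ch.islower() else ch for ch in line)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(hard_mask_fasta(handle))
```

The hard-masked copy would be used for the GeneMark self-training only;
the soft-masked genome can still be given to everything else.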


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:

> Adding a new predictor can take some time.  It obviously requires some
> coding.  It's usually not too hard just to convert results to GFF3 and
> then pass it in.  Integrated support is really only beneficial for
> predictors that can take "hints" from evidence alignments (for example we
> are working on EVM integration right now -
> http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
> problems just drop them.  GeneMark really doesn't work very good on
> genomes with complex intron/exon structure (and I really wouldn't use it
> for anything but fungi).
>
> Make sure you are also giving sufficient protein evidence.  Perhaps all
> proteins from chicken and pigeon for example.  Then you shouldn't find
> loss of any true genes if just using Augustus.  Also try not to use gene
> count as an indicator of performance.  The value is very deceptive,
> especially if the genome assembly is fragmented.
>
> Thanks,
> Carson
>
>
>
> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu>
> wrote:
>
> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
> >genome.  Augustus gives pretty decent gene model predictions based on a
> >custom model we have and the hints MAKER provides.  However, SNAP seems
> >to throw out a ton of false positives; in many cases this appears to
> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
> >to a marked decrease in # models overall, which is worse.  GeneMark had a
> >very similar problem (high # false positives) and thus no marked
> >improvement, either when using with both Augustus and SNAP or with
> >Augustus alone.
> >
> >I have been exploring using geneid
> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
> >feedback on another project I worked with in the past.  This would be
> >fed into MAKER using external GFF, but I wanted to see if anyone has
> >tried geneid with MAKER first.
> >
> >Finally, how hard would it be to incorporate alternative callers into
> >MAKER?  For instance, would it be possible to add these like a 'plugin'?
> >
> >chris
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e3f33e33/attachment.html>

From carsonhh at gmail.com  Mon Mar 10 23:13:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 22:13:43 -0600
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.
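
Before changing any settings it can help to measure how long the introns
in the fused models actually are.  A rough sketch, assuming the exon
coordinates for each transcript have already been pulled out of the
Augustus output; the function names and the cutoff are illustrative
only and are not MAKER or Augustus parameters:

```python
def intron_lengths(exons):
    """Intron lengths implied by 1-based inclusive (start, end) exons."""
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def flag_long_introns(transcripts, max_intron):
    """Map transcript id -> longest intron, for transcripts over the cutoff."""
    flagged = {}
    for transcript_id, exons in transcripts.items():
        lengths = intron_lengths(exons)
        if lengths and max(lengths) > max_intron:
            flagged[transcript_id] = max(lengths)
    return flagged


if __name__ == "__main__":
    demo = {
        "t1": [(1, 100), (201, 300)],      # one 100 bp intron
        "t2": [(1, 100), (10101, 10200)],  # one 10000 bp intron
    }
    print(flag_long_introns(demo, max_intron=5000))
```

Models that show up in the flagged set are the candidates for false
fusions worth inspecting by eye.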

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com] 
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged, or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that someone 
>> had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
> 
> 


From darasappan at gmail.com  Mon Mar 10 15:14:03 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 10 Mar 2014 15:14:03 -0500
Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta
	files missing
Message-ID: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>

Hello,

I've been running maker with different assembly files, reference files  
etc  and I check the output by:

1. concatenating the gff files
2. concatenating the *transcripts.fasta files
3. concatenating the *proteins.fasta files

I'm noticing that when I ran maker twice with same parameters, the  
second time around, many of the output subdirectories  do not have a  
*transcripts.fasta or *proteins.fasta file in it.
There are 251 subdirectories and only 97 of them have all 3 output  
files.  Maker log looks ok to me, but I've attached it here as well.

What could be the reason for this?

Thanks
dhivya

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.o1813247.gz
Type: application/x-gzip
Size: 13857217 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140310/34f3a118/attachment.gz>
-------------- next part --------------


From sbrubaker at solazyme.com  Tue Mar 11 12:06:57 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Tue, 11 Mar 2014 17:06:57 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
	<99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com>

Ok thank you.

-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com] 
Sent: Monday, March 10, 2014 9:14 PM
To: Shane Brubaker
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Long introns from Augustus

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged, or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that 
>> someone had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 

From carson.holt at genetics.utah.edu  Thu Mar 13 11:00:06 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 13 Mar 2014 16:00:06 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF47300B.AB4F%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
Message-ID: <CF4731CC.AB5E%carson.holt@genetics.utah.edu>

Just resending this to the correct maker-devel address.  When replying,
please do not CC the incorrect maker-devel-bounce address.

Thanks,
Carson


On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>FGENESH is not a heavily used tool, so depending on which version it is
>(either too old or too new), output might be slightly different which
>could cause incorrect parsing. Could you tar up your maker.output folder,
>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>(send me your user/guest ID after you upload).
>
>For the BLAST error, use BLAST+ instead.  You are using blastall which is
>the old legacy version of NCBI BLAST.  You can do this by setting the
>blast type in maker_bopts.ctl and the location of executables in
>maker_exe.ctl.
>
>Thanks,
>Carson
>
>
>
>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear Maker users
>>
>>
>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>word SCALAR followed by a bracketed address, like this: (0x22de7020),
>>into the nucleotide sequence of some of the genes. This seems to
>>be related to transcripts predicted by fgenesh_masked.
>>
>>Here is an example for one of the genes
>>
>>
>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>TTTCGACAAGC
>>
>>The same genome sequence was used for the first round of maker (2.10)
>>without such problem. I checked the sequence for the scaffold related to
>>one of the affected transcripts and there was no error in the sequence.
>>I am not sure what is causing this. The only error that I could spot in
>>the output error file is the following
>>
>>
>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>contexts/frames of query sequences.
>>
>>
>>
>>Your help is appreciated
>>
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>
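
To find out how many transcripts are affected, the FASTA output can be
scanned for strings of this kind.  The sketch below is only a
diagnostic, not a fix; it assumes the stray text looks like a
stringified Perl reference such as SCALAR(0x...) and that it may be
split across wrapped sequence lines, as in the example above.  The
function names are made up for the illustration.

```python
import re

# Stringified Perl references, e.g. SCALAR(0x23418b90)
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE)\(0x[0-9a-fA-F]+\)")
# Anything that is not an IUPAC nucleotide code
NON_NUC = re.compile(r"[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv]")


def read_fasta(lines):
    """Yield (record id, full sequence) with wrapped lines joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_corrupt_records(lines):
    """Map record id -> (Perl references found, count of other bad chars)."""
    report = {}
    for name, seq in read_fasta(lines):
        refs = PERL_REF.findall(seq)
        leftover = NON_NUC.findall(PERL_REF.sub("", seq))
        if refs or leftover:
            report[name] = (refs, len(leftover))
    return report


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for name, (refs, bad) in find_corrupt_records(handle).items():
            print(name, ",".join(refs), bad, sep="\t")
```

Joining the wrapped lines before searching matters here, because a
reference that straddles a line break would otherwise be missed.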


From carsonhh at gmail.com  Thu Mar 13 11:14:54 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:14:54 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
Message-ID: <CF4733ED.AB63%carsonhh@gmail.com>

Note that protein/transcript FASTA files are only created when there are gene
models to output to those files (so their absence means there were no gene
models for that contig). Most sequences without protein/transcript FASTA files
in your sample are very short and thus don't contain anything.  What is left
either has no est2genome results or has est2genome alignments without a
sufficient open reading frame to be turned into a gene model (false merging of
regions by Trinity can cause this, so make sure you use the jaccard clip
option, --jaccard_clip, when assembling reads with Trinity to avoid this).

You are using only the est2genome=1 option.  This will result in a limited
set of genes that can be used for training SNAP/Augustus (so not getting
results on all contigs is expected).  You really won't get much as far as
results until you have one of the ab initio predictors turned on.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, March 11, 2014 at 8:52 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Alright done. My username is daras

Thanks
Dhivya

On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:

> Input and compressed file of output.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Monday, March 10, 2014 at 2:09 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Do you mean the whole maker output?
> 
> Thanks
> dhivya
> 
> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
> 
>> Could you upload everything here -->
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>> 
>> Then send us the link generated or your user ID.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, March 10, 2014 at 1:50 PM
>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson and Daniel,
>> 
>> I'm sending this across to you separately since maker list is blocking my
>> email due to attachment size.
>> 
>> As always, thanks for any guidance you can provide.
>> Dhivya
>> 
>> 
>> Begin forwarded message:
>> 
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: March 10, 2014 3:14:03 PM CDT
>>> To: maker-devel at yandell-lab.org
>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing
>>> 
>>>  
>>> Hello,
>>> 
>>> I've been running maker with different assembly files, reference files etc
>>> and I check the output by:
>>> 
>>> 1. concatenating the gff files
>>> 2. concatenating the *transcripts.fasta files
>>> 3. concatenating the *proteins.fasta files
>>> 
>>> I'm noticing that when I ran maker twice with same parameters, the second
>>> time around, many of the output subdirectories  do not have a
>>> *transcripts.fasta or *proteins.fasta file in it.
>>> There are 251 subdirectories and only 97 of them have all 3 output files.
>>> Maker log looks ok to me, but I've attached it here as well.
>>> 
>>> What could be the reason for this?
>>> 
>>> Thanks
>>> dhivya
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140313/1484b1b6/attachment.html>

From carsonhh at gmail.com  Thu Mar 13 11:55:40 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:55:40 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
Message-ID: <CF473DBA.AB9F%carsonhh@gmail.com>

The second time, it should have just started where it left off, so it would
run faster (because the processing from the previous job counted towards the
second one).  The archived output you sent me had 21,183 proteins and
transcripts.  If you are using fasta_merge to collect them, just make
sure the datastore.index file is not truncated or corrupt, otherwise it won't
collect all the FASTAs from every contig.  You can rebuild the
datastore.index using the -dsindex flag with MAKER, if you want to check
that.  Also you can have maker just regenerate results without rerunning
BLAST etc., by using the -a flag if you want to just recalculate all results
quickly (rebuilds all FASTA and GFF3 without redoing most analysis).
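
A quick way to check the index is to tally the last status recorded for
each contig.  A minimal sketch, assuming the index is a tab-separated
file with the contig name, its output directory and a status such as
STARTED or FINISHED on each line (that layout is an assumption, so
compare it with your own file first):

```python
from collections import Counter


def summarize_datastore_index(lines):
    """Return (status counts, contigs whose last status is not FINISHED)."""
    last_status = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            # Skip blank or truncated lines
            continue
        # Later lines overwrite earlier ones for the same contig
        last_status[cols[0]] = cols[2]
    counts = Counter(last_status.values())
    unfinished = sorted(
        name for name, status in last_status.items() if status != "FINISHED"
    )
    return counts, unfinished


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        counts, unfinished = summarize_datastore_index(handle)
    for status, number in sorted(counts.items()):
        print(status, number, sep="\t")
    print("not finished:", len(unfinished))
```

If the number of contigs listed is far below the number of sequences in
the assembly, the index is likely incomplete and worth rebuilding.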

--Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 10:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Thanks Carson for the response.  I understand that est2genome=1 does not use
any ab initio gene predictions, but simply identifies ests based on
alignment.  I'm a little confused because I ran maker on my assembly before,
using the same parameters ( including est2genome=1).  I got a very good
result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds were
combined into superscaffolds. So I reran maker on this assembly.   Same
parameters, same transcriptome and proteins files.  Now, I see such
drastically different results:  Only 500+ genes and transcripts.  My
scaffolds are now bigger than before, so I'm not sure how this is happening.
These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because
perhaps it is due to some sort of error on my part):  I ran maker on the
improved assembly the first time and maker did not complete in the 48 hours
I allocated.  But I had  19,000+ transcripts in the unfinished output.  When
I reran maker, just changing the time allocated, it completed much faster,
but is giving much fewer transcripts and proteins as output.  Could
something like this happen? If not, then I'm guessing I must have changed
something although I'm pretty sure that I did not change anything other than
the time allocated. I've attached the transcripts and proteins files from the
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya


On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:

> Note protein/transcript FASTAs are only created when there are gene models to
> output to those files (so their absence means there were no gene models for
> that contig). Most sequences without protein/transcript FASTAs in your sample
> are very short and thus don't contain anything.  What is left either have no
> est2genome results or the est2genome alignments do not have sufficient open
> reading frame to be turned into a gene model (false merging of regions by
> trinity can cause this, so make sure you use the jaccard index option when
> assembling reads with trinity to avoid this).
> 
> You are using only the est2genome=1 option.  This will result in a limited set
> of genes that can be used for training SNAP/Augustus (so not getting results
> on all contigs is expected).  You really won't get much as far as results
> until you have one of the ab initio predictors turned on.
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 11, 2014 at 8:52 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Alright done. My username is daras
> 
> Thanks
> Dhivya
> 
> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
> 
>> Input and compressed file of output.
>> 
>> Thanks,
>> Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, March 10, 2014 at 2:09 PM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Daniel Ence <dence at genetics.utah.edu>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson,
>> 
>> Do you mean the whole maker output?
>> 
>> Thanks
>> dhivya
>> 
>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>> 
>>> Could you upload everything here -->
>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>> 
>>> Then send us the link generated or your user ID.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> 
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Monday, March 10, 2014 at 1:50 PM
>>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>>> missing
>>> 
>>> Hi Carson and Daniel,
>>> 
>>> I'm sending this across to you separately since maker list is blocking my
>>> email due to attachment size.
>>> 
>>> As always, thanks for any guidance you can provide.
>>> Dhivya
>>> 
>>> 
>>> Begin forwarded message:
>>> 
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>> To: maker-devel at yandell-lab.org
>>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing
>>>> 
>>>>  
>>>> Hello,
>>>> 
>>>> I've been running maker with different assembly files, reference files etc
>>>> and I check the output by:
>>>> 
>>>> 1. concatenating the gff files
>>>> 2. concatenating the *transcripts.fasta files
>>>> 3. concatenating the *proteins.fasta files
>>>> 
>>>> I'm noticing that when I ran maker twice with same parameters, the second
>>>> time around, many of the output subdirectories  do not have a
>>>> *transcripts.fasta or *proteins.fasta file in it.
>>>> There are 251 subdirectories and only 97 of them have all 3 output files.
>>>> Maker log looks ok to me, but I've attached it here as well.
>>>> 
>>>> What could be the reason for this?
>>>> 
>>>> Thanks
>>>> dhivya
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140313/a1a879a2/attachment.html>

From darasappan at gmail.com  Thu Mar 13 11:47:25 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 13 Mar 2014 11:47:25 -0500
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <CF4733ED.AB63%carsonhh@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>

Thanks Carson for the response.  I understand that est2genome=1 does  
not use any ab initio gene predictions, but simply identifies ests  
based on alignment.  I'm a little confused because I ran maker on my  
assembly before, using the same parameters ( including est2genome=1).   
I got a very good result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds  
were combined into superscaffolds. So I reran maker on this  
assembly.   Same parameters, same transcriptome and proteins files.   
Now, I see such drastically different results:  Only 500+ genes and  
transcripts.  My scaffolds are now bigger than before, so I'm not sure  
how this is happening.   These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because  
perhaps it is due to some sort of error on my part):  I ran maker on  
the improved assembly the first time and maker did not complete in the  
48 hours I allocated.  But I had  19,000+ transcripts in the  
unfinished output.  When I reran maker, just changing the time  
allocated, it completed much faster, but is giving much fewer  
transcripts and proteins as output.  Could something like this happen?  
If not, then I'm guessing I must have changed something although I'm  
pretty sure that I did not change anything other than the time  
allocated. I've attached the transcripts and proteins files from the  
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya


On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:

> Note protein/transcript fastas are only created when there are gene  
> models to output to those files (so their absence means there were  
> no gene models for that contig). Most sequences without protein/ 
> transcript fastas in your sample are very short and thus don't  
> contain anything.  What is left either have no est2genome results or  
> the est2genome alignments do not have sufficient open reading frame  
> to be turned into a gene model (false merging of regions by trinity  
> can cause this, so make sure you use the jaccard index option when  
> assembling reads with trinity to avoid this).
>
> You are using only the est2genome=1 option.  This will result in a  
> limited set of genes that can be used for training SNAP/Augustus (so  
> not getting results on all contigs is expected).  You really won't  
> get much as far as results until you have one of the ab initio  
> predictors turned on.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Tuesday, March 11, 2014 at 8:52 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>
> Subject: Re: maker output- transcripts.fasta and proteins.fasta  
> files missing
>
> Alright done. My username is daras
>
> Thanks
> Dhivya
>
> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
>
>> Input and compressed file of output.
>>
>> Thanks,
>> Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, March 10, 2014 at 2:09 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Daniel Ence <dence at genetics.utah.edu>
>> Subject: Re: maker output- transcripts.fasta and proteins.fasta  
>> files missing
>>
>> Hi Carson,
>>
>> Do you mean the whole maker output?
>>
>> Thanks
>> dhivya
>>
>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>>
>>> Could you upload everything here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> Then send us the link generated or your user ID.
>>>
>>> Thanks,
>>> Carson
>>>
>>>
>>>
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: Monday, March 10, 2014 at 1:50 PM
>>> To: Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta  
>>> files missing
>>>
>>> Hi Carson and Daniel,
>>>
>>> I'm sending this across to you separately since maker list is  
>>> blocking my email due to attachment size.
>>>
>>> As always, thanks for any guidance you can provide.
>>> Dhivya
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>> To: maker-devel at yandell-lab.org
>>>> Subject: maker output- transcripts.fasta and proteins.fasta files  
>>>> missing
>>>>
>>>> Hello,
>>>>
>>>> I've been running maker with different assembly files, reference  
>>>> files etc  and I check the output by:
>>>>
>>>> 1. concatenating the gff files
>>>> 2. concatenating the *transcripts.fasta files
>>>> 3. concatenating the *proteins.fasta files
>>>>
>>>> I'm noticing that when I ran maker twice with same parameters,  
>>>> the second time around, many of the output subdirectories  do not  
>>>> have a *transcripts.fasta or *proteins.fasta file in it.
>>>> There are 251 subdirectories and only 97 of them have all 3  
>>>> output files.  Maker log looks ok to me, but I've attached it  
>>>> here as well.
>>>>
>>>> What could be the reason for this?
>>>>
>>>> Thanks
>>>> dhivya
>>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: transcripts.cat.fasta.old.gz
Type: application/x-gzip
Size: 7927581 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: proteins.cat.fasta.old.gz
Type: application/x-gzip
Size: 3668381 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment-0001.gz>

From carsonhh at gmail.com  Thu Mar 13 13:53:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 12:53:05 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
	<CF473DBA.AB9F%carsonhh@gmail.com>
	<672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com>
	<CF474291.ABC0%carsonhh@gmail.com>
	<CF4744C6.ABC9%carsonhh@gmail.com>
	<5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com>
	<CF474BE5.ABDA%carsonhh@gmail.com>
	<C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
Message-ID: <CF4759BA.ABE2%carsonhh@gmail.com>

For future reference, I suggest using the .../maker/bin/fasta_merge tool to
merge based on the datastore.index rather than other command line based
methods.  It will handle the multiple fasta types that are produced in the
results, and will validate with the datastore.index file.

Example:
fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log

The same is also true when merging gff3 files.
gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log
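
As a rough cross-check on the merged output, the transcript count per contig can also be tallied from the index itself. Below is a minimal Python sketch, not part of MAKER. It assumes (this is not confirmed in the thread) that the master datastore index is tab-separated, with the contig name, its output directory relative to the index file, and a status such as FINISHED. The helper name count_transcripts is hypothetical.

```python
import glob
import os


def count_transcripts(index_path):
    """Return {contig: number of transcript records} for FINISHED contigs.

    Assumed index layout (an assumption, not verified here):
    contig_name <TAB> relative_output_dir <TAB> status
    """
    base = os.path.dirname(os.path.abspath(index_path))
    counts = {}
    with open(index_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3 or fields[2] != "FINISHED":
                continue
            contig, rel_dir = fields[0], fields[1]
            # Escape the directory part so odd scaffold names are not
            # treated as glob patterns; only the file name is a wildcard.
            out_dir = glob.escape(os.path.join(base, rel_dir))
            total = 0
            for fasta in glob.glob(os.path.join(out_dir, "*transcripts.fasta")):
                with open(fasta) as fh:
                    total += sum(1 for row in fh if row.startswith(">"))
            counts[contig] = total
    return counts
```

Because the directories come from the index rather than from a shell wildcard on the directory name, contigs are counted whatever their names start with, and contigs without gene models simply show up with a count of 0.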

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 12:48 PM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

ah  I forgot that some were called superscaffolds.  That is a difference
between the old and new assembly. This was definitely the issue. Thanks and
sorry for the mix up.

Dhivya
On Mar 13, 2014, at 12:51 PM, Carson Holt wrote:

> Note that your command does not capture everything because not all scaffolds
> start with the name "scaffold".
> 
> This works though ->
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, March 13, 2014 at 11:34 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Am I looking in the wrong place for my fasta files?  I looked here:
> 
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc
> -l
> 
> I see only 97 such files- so 97 contigs with transcripts.fasta files?
> 
> When I count the number of sequences in all these files, I get 514 sequences.
> 
> grep -c '^>' 
> opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':'
> -f 2|awk '{total+=$0}END{print total}'
> 
> Could you tell how and where you are getting the 21,183 transcripts?
> 
> thanks
> dhivya
> 
> On Mar 13, 2014, at 12:21 PM, Carson Holt wrote:
> 
>> This is what I see in your uploaded data.  There are 21,183 transcripts from
>> 201 contigs.  Then there are 707 contigs with no gene models.
>> 
>> --Carson
>> 
>> 
>> From:  Carson Holt <carsonhh at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:11 AM
>> To:  dhivya arasappan <darasappan at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> "as you saw from the output I uploaded before, the output certainly was much
>> less than 20,000 transcripts"
>> 
>> Actually there were 21,183 in the output you uploaded.  I saw no loss of
>> entries.
>> 
>> --Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:09 AM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson,
>> 
>> The datastore.index file looks fine- it has a started and finished status for
>> my 980 scaffolds.  I reran with increased time twice. Second time around, I
>> actually deleted the entire output directory to make sure it runs all over
>> again.  It still seemed to complete within a day. As you saw from the output
>> I uploaded before, the output certainly was much less than 20,000
>> transcripts. Given that I was seeing great results for an older version of my
>> assembly, I'm puzzled as to why my results are worse this time around. Any
>> suggestions of what to check or what I can do to see improved results would
>> be really helpful.
>> 
>> I do know that I went from ~4% gaps to ~6% gaps in my new assembly- other
>> than that, it's better in every way. Could this cause such a dramatic
>> difference in results?
>> 
>> Thanks
>> dhivya
>> 
>> On Mar 13, 2014, at 11:55 AM, Carson Holt wrote:
>> 
>>> The second time, it should have just started where it left off, so it would
>>> run faster (because the processing from the previous job counted towards the
>>> second one).  The archived output you sent me had 21,183 proteins and
>>> transcripts.  If you are using fasta_merge to collect them, just make
>>> sure the datastore.index file is not truncated or corrupt otherwise it won't
>>> collect all the fastas from every contig.  You can rebuild the
>>> datastore.index using the -dsindex flag with MAKER, if you want to check
>>> that.  Also you can have maker just regenerate results without rerunning
>>> BLAST etc., by using the -a flag if you want to just recalculate all results
>>> quickly (rebuilds all FASTA and GFF3 without redoing most analysis).
>>> 
>>> --Carson
>>> 
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Thursday, March 13, 2014 at 10:47 AM
>>> To:  Carson Holt <carsonhh at gmail.com>
>>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>>> missing
>>> 
>>> Thanks Carson for the response.  I understand that est2genome=1 does not use
>>> any ab initio gene predictions, but simply identifies ests based on
>>> alignment.  I'm a little confused because I ran maker on my assembly before,
>>> using the same parameters ( including est2genome=1).  I got a very good
>>> result with > 20,000 transcripts and proteins.
>>> 
>>> Then I  was able to get an improved assembly, where many scaffolds were
>>> combined into superscaffolds. So I reran maker on this assembly.   Same
>>> parameters, same transcriptome and proteins files.  Now, I see such
>>> drastically different results:  Only 500+ genes and transcripts.  My
>>> scaffolds are now bigger than before, so I'm not sure how this is happening.
>>> These were the results I sent you.
>>> 
>>> Another odd thing I noticed (and I am hesitant to report this because
>>> perhaps it is due to some sort of error on my part):  I ran maker on the
>>> improved assembly the first time and maker did not complete in the 48 hours
>>> I allocated.  But I had  19,000+ transcripts in the unfinished output.  When
>>> I reran maker, just changing the time allocated, it completed much faster,
>>> but is giving much fewer transcripts and proteins as output.  Could
>>> something like this happen? If not, then I'm guessing I must have changed
>>> something although I'm pretty sure that I did not change anything other than
>>> the time allocated. I've attached the transcripts and proteins files from the
>>> first time I ran maker on my improved assembly.
>>> 
>>> Thanks again for your help
>>> Dhivya
>>> 
>>> 
>>> 
>>> On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:
>>> 
>>>> Note protein/transcript fastas are only created when there are gene models
>>>> to output to those files (so their absence means there were no gene models
>>>> for that contig). Most sequences without protein/transcript fastas in your
>>>> sample are very short and thus don't contain anything.  What is left either
>>>> have no est2genome results or the est2genome alignments do not have
>>>> sufficient open reading frame to be turned into a gene model (false merging
>>>> of regions by trinity can cause this, so make sure you use the jaccard
>>>> index option when assembling reads with trinity to avoid this).
>>>> 
>>>> You are using only the est2genome=1 option.  This will result in a limited
>>>> set of genes that can be used for training SNAP/Augustus (so not getting
>>>> results on all contigs is expected).  You really won't get much as far as
>>>> results until you have one of the ab initio predictors turned on.
>>>> 
>>>> Thanks,
>>>> Carson
>>>> 
>>>> 
>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>> Date:  Tuesday, March 11, 2014 at 8:52 AM
>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>> Cc:  Daniel Ence <dence at genetics.utah.edu>
>>>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>>>> missing
>>>> 
>>>> Alright done. My username is daras
>>>> 
>>>> Thanks
>>>> Dhivya
>>>> 
>>>> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
>>>> 
>>>>> Input and compressed file of output.
>>>>> 
>>>>> Thanks,
>>>>> Carson
>>>>> 
>>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>>> Date:  Monday, March 10, 2014 at 2:09 PM
>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>> Cc:  Daniel Ence <dence at genetics.utah.edu>
>>>>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>>>>> missing
>>>>> 
>>>>> Hi Carson,
>>>>> 
>>>>> Do you mean the whole maker output?
>>>>> 
>>>>> Thanks
>>>>> dhivya
>>>>> 
>>>>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>>>>> 
>>>>>> Could you upload everything here ->
>>>>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>>> 
>>>>>> Then send us the link generated or your user ID.
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>>>> Date:  Monday, March 10, 2014 at 1:50 PM
>>>>>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence
>>>>>> <dence at genetics.utah.edu>
>>>>>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>>>>>> missing
>>>>>> 
>>>>>> Hi Carson and Daniel,
>>>>>> 
>>>>>> I'm sending this across to you separately since maker list is blocking my
>>>>>> email due to attachment size.
>>>>>> 
>>>>>> As always, thanks for any guidance you can provide.
>>>>>> Dhivya
>>>>>> 
>>>>>> 
>>>>>> Begin forwarded message:
>>>>>> 
>>>>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>>>>> To: maker-devel at yandell-lab.org
>>>>>>> Subject: maker output- transcripts.fasta and proteins.fasta files
>>>>>>> missing
>>>>>>> 
>>>>>>>  
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I've been running maker with different assembly files, reference files
>>>>>>> etc  and I check the output by:
>>>>>>> 
>>>>>>> 1. concatenating the gff files
>>>>>>> 2. concatenating the *transcripts.fasta files
>>>>>>> 3. concatenating the *proteins.fasta files
>>>>>>> 
>>>>>>> I'm noticing that when I ran maker twice with same parameters, the
>>>>>>> second time around, many of the output subdirectories  do not have a
>>>>>>> *transcripts.fasta or *proteins.fasta file in it.
>>>>>>> There are 251 subdirectories and only 97 of them have all 3 output
>>>>>>> files.  Maker log looks ok to me, but I've attached it here as well.
>>>>>>> 
>>>>>>> What could be the reason for this?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> dhivya
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 



From cjfields at illinois.edu  Thu Mar 13 16:04:23 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 13 Mar 2014 21:04:23 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
Message-ID: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome.
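
Hard masking here means replacing the soft-masked bases with N. Below is a minimal Python sketch of that conversion. It assumes the assembly follows the usual convention of lowercase letters for soft-masked sequence, and hard_mask is a hypothetical helper rather than anything shipped with MAKER or GeneMark. Whether training improves afterwards will depend on the genome.

```python
def hard_mask(lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by N.

    Header lines (starting with '>') are passed through unchanged.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            # Uppercase bases and existing N runs are kept as they are.
            yield "".join("N" if base.islower() else base for base in line)


if __name__ == "__main__":
    import sys

    # Usage sketch: python hard_mask.py < softmasked.fa > hardmasked.fa
    for masked in hard_mask(sys.stdin):
        print(masked)
```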


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





From jfierst at uoregon.edu  Fri Mar 14 11:06:26 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 09:06:26 -0700
Subject: [maker-devel] associating gene names between related strains
Message-ID: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate
the gene naming so that orthologs between species can be associated by
name. I have been playing around a little with the est_forward option but
can't figure out a good system/workflow that preserves names but still uses
the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
-Janna

From dence at genetics.utah.edu  Fri Mar 14 12:32:02 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Mar 2014 17:32:02 +0000
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>

Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna

From jfierst at uoregon.edu  Fri Mar 14 13:01:16 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 11:01:16 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CAGoyuracbDO5pcWU7wThnnnGbfoKo2xEn+trPPUaUJx9t+8_Lg@mail.gmail.com>

I will try it today. Thanks for the quick reply!


On Fri, Mar 14, 2014 at 10:32 AM, Daniel Ence <dence at genetics.utah.edu> wrote:

>  Hi Janna, So do you have one strain that you want to use as the
> reference for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
>  Let me know if that's not what you're looking for,
> Daniel
>
>  Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>   ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
>   Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
>

From carsonhh at gmail.com  Fri Mar 14 13:02:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:02:48 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CF489F0B.AC19%carsonhh@gmail.com>

maker_map_ids does a translation (i.e. change gene-A to smug1), so you need
to know which genes you want to translate names to (two column input file,
column 1 -> original ID, column 2 -> new ID).  I'm not sure EST forward is
the best way to do this, although I do think maker_map_ids is the tool to
use in the end.  The question is how to make a list of IDs to translate as
the input to maker_map_ids?

I would actually just use BLASTP against the reference strain, and then do
reciprocal best BLAST hits.  To do this you BLAST your reference proteins
against your maker proteins.  Then do the opposite, BLAST your  maker
proteins against your reference proteins.  If they are both each other's best
hit, then they are orthologous, and you can safely make a two column entry
for the maker_map_ids input (i.e. maker-gene-1 translates into smug1).
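
The reciprocal-best-hit step could be sketched in Python as follows. This assumes both BLASTP runs were saved in 12-column tabular format (query ID in column 1, subject ID in column 2, bit score in column 12) and that ranking hits by bit score is acceptable. The function names and the example IDs are hypothetical.

```python
def best_hits(rows):
    """Map each query ID to its highest-scoring subject ID (by bit score)."""
    best = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, score = fields[0], fields[1], float(fields[11])
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_pairs(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, reference_id) pairs that are mutual best hits."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted(
        (maker_id, ref_id)
        for maker_id, ref_id in forward.items()
        if reverse.get(ref_id) == maker_id
    )


def write_id_map(pairs, path):
    """Write the two-column file: original ID, then new ID."""
    with open(path, "w") as out:
        for maker_id, ref_id in pairs:
            out.write(f"{maker_id}\t{ref_id}\n")
```

Genes without a mutual best hit are left out of the map and so keep their MAKER-assigned names.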

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Friday, March 14, 2014 at 11:32 AM
To:  Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] associating gene names between related strains

Hi Janna, So do you have one strain that you want to use as the reference
for all the others? There's a script that comes with MAKER called
maker_map_ids that lets you use a common prefix or suffix for entries in a
fasta file from one strain and then use est_forward to use that ID in the
gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna
Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate the
gene naming so that orthologs between species can be associated by name. I
have been playing around a little with the est_forward option but can't
figure out a good system/workflow that preserves names but still uses the
strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna


From carsonhh at gmail.com  Fri Mar 14 13:43:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:43:41 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com>
	<9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
Message-ID: <CF48A7BD.AC29%carsonhh@gmail.com>

I'm glad you were able to fix it.  I'll check to see why it was failing as
well.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Friday, March 14, 2014 at 10:16 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: Error when running maker2zff script

Kindly ignore my previous question. I was able to manipulate the scaffold
names in the gff file to get maker2zff to work.
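
For anyone hitting the same error, one way to normalize such names might look like the Python sketch below. It is an assumption about what kind of renaming helps, not a record of the exact edit made here. It decodes %2F-style escapes and replaces "/" with "_" in the sequence ID column, in ##sequence-region lines, and in FASTA headers after ##FASTA. Attribute values that embed the scaffold name are left untouched, and rename_seqids is a hypothetical helper.

```python
from urllib.parse import unquote


def safe_name(seqid):
    """Decode percent-escapes (e.g. %2F), then replace '/' with '_'."""
    return unquote(seqid).replace("/", "_")


def rename_seqids(lines):
    """Yield GFF3 lines (without newlines) with normalized sequence IDs."""
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            # FASTA header: rename only the ID, keep any description.
            name, sep, rest = line[1:].partition(" ")
            line = ">" + safe_name(name) + sep + rest
        elif line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) >= 2:
                parts[1] = safe_name(parts[1])
            line = " ".join(parts)
        elif not in_fasta and line and not line.startswith("#"):
            # Feature line: the sequence ID is the first tab-separated column.
            cols = line.split("\t")
            cols[0] = safe_name(cols[0])
            line = "\t".join(cols)
        yield line
```

Renaming every place the ID appears in one pass keeps the feature lines and the embedded FASTA section consistent with each other.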

Thanks
dhivya

On Mar 14, 2014, at 10:55 AM, dhivya arasappan <darasappan at gmail.com> wrote:

> My message got flagged by the maker list again, so I'm forwarding this
> separately to you.  Is there a better way to send biggish files?
> 
> 
> Thank you
> Dhivya
> 
> 
> 
> Begin forwarded message:
> 
>> From: dhivya arasappan <darasappan at gmail.com>
>> Subject: Error when running maker2zff script
>> Date: March 13, 2014 at 8:35:27 PM CDT
>> To: Carson Holt <carsonhh at gmail.com>, maker-devel at yandell-lab.org
>> 
>> Hi Carson,
>> 
>> I used gff3_merge to create my gff file from maker output. I've attached it
>> here. But when I run maker2zff on it, I get the following error:
>> 
>> Can't use an undefined value as an ARRAY reference at
>> /opt/apps/maker/2.30/bin/maker2zff line 177, <GFF> line 7294251.
>> 
>> It produces an incomplete output file and it looks like it may be running
>> into problems when it encounters scaffold3%2F0.  I'm wondering if it's having
>> problems with my scaffold names. There seem to be some inconsistencies
>> because it's referred to as  scaffold3%F0 and scaffold3/0 in the gff file.
>> It goes through other scaffolds like SCAFFOLD3_873, SCAFFOLD3_95 etc just
>> fine.   I did try replacing the scaffold names in the gff file, but still get
>> the same error.   Any ideas?
>> 
>> Substitution command I used, for your reference:  sed 's/3\%2F/3_/g' gfffile|
>> sed 's/\//\_/'  > mod.gfffile
>> 
>> Thanks
>> Dhivya
>> 
> <head.gff.gz>
> 


From carsonhh at gmail.com  Fri Mar 14 14:25:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 13:25:58 -0600
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
Message-ID: <CF48B2BC.AC3E%carsonhh@gmail.com>

We can look into it.

-Carson

From:  "Fields, Christopher J" <cjfields at illinois.edu>
Date:  Thursday, March 13, 2014 at 3:04 PM
To:  Sajeet Haridas <sajeet at gmail.com>
Cc:  Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see
if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints
such as BLAST HSPs (as well as other information), in the form of a GFF
"homology" file.  I assume it could take protein2genome/est2genome as well
through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

> One of the problems I have found with genemark is that it does not understand
> a soft-masked genome. Hence, the self training is incorrect. I have found
> marked improvement to genemark's prediction by running the training on a hard
> masked genome.
> 
> 
> On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Adding a new predictor can take some time.  It obviously requires some
>> coding.  It's usually not too hard just to convert results to GFF3 and
>> then pass it in.  Integrated support is really only beneficial for
>> predictors that can take "hints" from evidence alignments (for example we
>> are working on EVM integration right now -
>> http://evidencemodeler.sourceforge.net ).  If SNAP and GeneMark give
>> problems just drop them.  GeneMark really doesn't work very well on
>> genomes with complex intron/exon structure (and I really wouldn't use it
>> for anything but fungi).
>> 
>> Make sure you are also giving sufficient protein evidence.  Perhaps all
>> proteins from chicken and pigeon for example.  Then you shouldn't find
>> loss of any true genes if just using Augustus.  Also try not to use gene
>> count as an indicator of performance.  The value is very deceptive,
>> especially if the genome assembly is fragmented.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
>> 
>>> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
>>> >genome.  Augustus gives pretty decent gene model predictions based on a
>>> >custom model we have and the hints MAKER provides.  However, SNAP seems
>>> >to throw out a ton of false positives; in many cases this appears to
>>> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>>> >to a marked decrease in # models overall, which is worse.  GeneMark had a
>>> >very similar problem (high # false positives) and thus no marked
>>> >improvement, either when using with both Augustus and SNAP or with
>>> >Augustus alone.
>>> >
>>> >I have been exploring using geneid
>>> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
>>> >feedback on another project I worked with in the past.  This would be
>>> >fed into MAKER using external GFF, but I wanted to see if anyone has
>>> >tried geneid with MAKER first.
>>> >
>>> >Finally, how hard would it be to incorporate alternative callers into
>>> >MAKER?  For instance, would it be possible to add these like a "plugin"?
>>> >
>>> >chris
>> 
>> 
>> 
> 


From cjfields at illinois.edu  Fri Mar 14 21:22:55 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sat, 15 Mar 2014 02:22:55 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF48B2BC.AC3E%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
	<CF48B2BC.AC3E%carsonhh@gmail.com>
Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu>

Not an issue at the moment; I'll likely supply these via GFF for now.  If needed I can work off an svn checkout and send along a patch should I ever manage to eke out time to work on it.

chris

On Mar 14, 2014, at 2:25 PM, Carson Holt <carsonhh at gmail.com> wrote:

We can look into it.

-Carson

From: "Fields, Christopher J" <cjfields at illinois.edu>
Date: Thursday, March 13, 2014 at 3:04 PM
To: Sajeet Haridas <sajeet at gmail.com>
Cc: Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome.
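
For readers who only have a soft-masked assembly, the hard-masked copy
mentioned above can be produced by turning every lowercase base into N.
The following is a minimal Python sketch, not part of MAKER or GeneMark;
the function names are made up for illustration, and it assumes that
soft-masking is encoded as lowercase letters in the FASTA sequence lines.

```python
def hard_mask(sequence: str) -> str:
    """Replace every lowercase (soft-masked) base with 'N'."""
    return "".join("N" if base.islower() else base for base in sequence)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked; headers untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else hard_mask(line)


if __name__ == "__main__":
    demo = [">scaffold_1 demo", "ACGTacgtACGT", "nnnnACGT"]
    for out in hard_mask_fasta(demo):
        print(out)
```

The same generator can be fed an open file handle instead of the demo
list, writing each yielded line to a new FASTA file used only for the
self-training step.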


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris




From carson.holt at genetics.utah.edu  Mon Mar 17 14:45:15 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Mar 2014 19:45:15 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>

I have attached 4 files for you to place in the .../maker/Widgets/
directory.

The *blast.pm files will suppress the BLAST+ failures you are getting
(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same
effect).  BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore.
In the latest release NCBI redid all their warnings and error codes so it
spits out a lot of garbage and fails with different messages than it did
before.  For example BLAST now warns you every time it encounters a FASTA
header with a comment (virtually every FASTA entry in existence falls in
this category), so your screen will be awash with meaningless warning
messages.

The fgenesh.pm file will fix the other failure, which only occurs if you
use fgenesh simultaneously with the correct_est_fusion=1 option.  No other
predictors are affected.
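
To see which transcripts from the earlier run were affected, a scan along
these lines may help.  It is a minimal Python sketch rather than anything
shipped with MAKER: it joins the wrapped lines of each FASTA record and
reports any character outside the IUPAC nucleotide alphabet, which is
enough to catch debris such as SCALAR(0x...).  The function name is
hypothetical.

```python
import re

# Anything outside the IUPAC nucleotide alphabet is treated as suspicious.
NON_IUPAC = re.compile(r"[^ACGTUNRYSWKMBDHV]", re.IGNORECASE)


def find_bad_records(lines):
    """Map record ID -> sorted list of non-IUPAC characters found in it."""
    bad = {}
    name, chunks = None, []

    def flush():
        # Join wrapped lines first, so debris split across lines is still seen.
        if name is not None:
            hits = set(NON_IUPAC.findall("".join(chunks)))
            if hits:
                bad[name] = sorted(hits)

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    flush()
    return bad


if __name__ == "__main__":
    demo = [
        ">clean-mRNA-1 transcript",
        "ATGCGTTACTCC",
        ">broken-mRNA-1 transcript",
        "GGCTCTGSCALAR(0x23",
        "418b90)GCTTTGG",
    ]
    for record, chars in find_bad_records(demo).items():
        print(record, "".join(chars))
```

Passing an open handle on the transcripts FASTA file in place of the demo
list gives the IDs of the records that need to be regenerated after the
fgenesh.pm fix.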

Thanks,
Carson


On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear  Carson
>
>Sorry for the late reply. I was away for a couple of days. I have uploaded
>the output files plus control and error output to the FTP site that you
>provided.
>The user ID is borhanh
>
>I used blast+ for this run.
>
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>
>>Just resending this to the correct maker-devel address.  Please when
>>replying, do not CC the incorrect maker-devel-bounce address.
>>
>>Thanks,
>>Carson
>>
>>
>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>>
>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>(either too old or too new), output might be slightly different which
>>>could cause incorrect parsing. Could you tar up your maker.output
>>>folder,
>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>(send me either your user/guest ID after you upload).
>>>
>>>For the BLAST error, use BLAST+ instead.  You are using blastall which
>>>is
>>>the old legacy version of NCBI BLAST.  You can do this by setting the
>>>blast type in maker_bopts.ctl and the location of executables in
>>>maker_exe.ctl.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>
>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>wrote:
>>>
>>>>Dear Maker users
>>>>
>>>>
>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>word SCALAR followed by a hex value in parentheses, like this:
>>>>(0x22de7020), into the nucleotide sequence of some of the genes. This
>>>>seems to be related to transcripts predicted by fgenesh_masked.
>>>>
>>>>
>>>>Here is an example for one of the genes
>>>>
>>>>
>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>TTTCGACAAGC
>>>>
>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>without such problem. I checked the sequence for the scaffold related
>>>>to
>>>>one of the affected transcripts and there was no error in the sequence.
>>>>I am not sure what is causing this. The only error that I could spot in
>>>>the output error file is the following
>>>>
>>>>
>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>contexts/frames of query sequences.
>>>>
>>>>
>>>>
>>>>Your help is appreciated
>>>>
>>>>
>>>>
>>>>HB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastn.pm
Type: text/x-perl-script
Size: 8112 bytes
Desc: blastn.pm
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastx.pm
Type: text/x-perl-script
Size: 8218 bytes
Desc: blastx.pm
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh.pm
Type: text/x-perl-script
Size: 19744 bytes
Desc: fgenesh.pm
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tblastx.pm
Type: text/x-perl-script
Size: 9113 bytes
Desc: tblastx.pm
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0003.bin>

From carsonhh at gmail.com  Mon Mar 17 16:14:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Mar 2014 15:14:42 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
Message-ID: <CF4CBEAF.ADA3%carsonhh@gmail.com>

Just an update on this.  I've fixed the maker2zff script to handle the
issues seen.  Looking at this actually brought to light another issue.
There is inconsistent escape character specification for GFF3 in column 1
(the seqid), column 9 (the attributes ID and Target_ID), as well as
the FASTA ID for internal sequence.  We're updating the GFF3 spec to
clarify this so that everywhere you see the same ID getting treated the
same way for character escaping.
 
To be safe though, only use these characters in your contig IDs for the
assembly when using any tool that reads or outputs GFF3 -->
a-zA-Z0-9.:^*$@!+_?-|

Any character not in that set has a high chance of breaking some
downstream tool.  For now just assume that the strict interpretation from
the GFF3 spec for column 1 must be used on all IDs everywhere (see below).

>>Column 1: "seqid"
>>The ID of the landmark used to establish the coordinate system for the
>>current feature.
>>IDs may contain any characters, but must escape any characters not in
>>the set [a-zA-Z0-9.:^*$@!+_?-|].
>>In particular, IDs may not contain unescaped whitespace and must not
>>begin with an unescaped ">".
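
A check along these lines can flag problem IDs before a run.  This is a
minimal Python sketch, not MAKER code: the function names are
hypothetical, and replacing unsafe characters with "_" is just one
possible renaming policy.  Note that the set is spelled out with the
hyphen last, because writing "?-|" inside a regular expression character
class would be read as a range rather than three literal characters.

```python
import re
from urllib.parse import unquote

# Characters the GFF3 spec allows unescaped in column 1 (hyphen placed last).
SAFE_SEQID = re.compile(r"[A-Za-z0-9.:^*$@!+_?|-]+")
UNSAFE_CHAR = re.compile(r"[^A-Za-z0-9.:^*$@!+_?|-]")


def is_safe_seqid(seqid: str) -> bool:
    """True if the ID uses only characters that need no escaping."""
    return SAFE_SEQID.fullmatch(seqid) is not None


def sanitize_seqid(seqid: str) -> str:
    """Decode %XX escapes, then replace each unsafe character with '_'."""
    return UNSAFE_CHAR.sub("_", unquote(seqid))


if __name__ == "__main__":
    for name in ["SCAFFOLD3_873", "scaffold3/0", "scaffold3%2F0"]:
        print(name, is_safe_seqid(name), sanitize_seqid(name))
```

Both the escaped and unescaped spellings of the same scaffold map to one
name this way.  The renaming has to be applied to the assembly FASTA
before annotation (or consistently to every ID column afterwards), and
two distinct IDs could collide after replacement, so it is worth checking
that the sanitized names are still unique.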


Thanks,
Carson


On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:

>Hi Carson,
>
>I used gff3_merge to create my gff file from maker output. I've
>attached it here. But when I run maker2zff on it, I get the following
>error:
>
>Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
>2.30/bin/maker2zff line 177, <GFF> line 7294251.
>
>It produces an incomplete output file and it looks like it may be
>running into problems when it encounters scaffold3%2F0.  I'm wondering
>if it's having problems with my scaffold names. There seem to be some
>inconsistencies because it's referred to as  scaffold3%F0 and
>scaffold3/0 in the gff file.  It goes through other scaffolds like
>SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
>scaffold names in the gff file, but still get the same error.   Any
>ideas?
>
>Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
>gfffile| sed 's/\//\_/'  > mod.gfffile
>
>Thanks
>Dhivya
>


From darasappan at gmail.com  Mon Mar 17 16:20:18 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 17 Mar 2014 16:20:18 -0500
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <CF4CBEAF.ADA3%carsonhh@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<CF4CBEAF.ADA3%carsonhh@gmail.com>
Message-ID: <CAGWaY_61EFs28=2dThqjgnkeisCXjad7JM72ews-fkTn0v7FCA@mail.gmail.com>

Awesome! Thanks Carson.

Dhivya


On Mon, Mar 17, 2014 at 4:14 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Just an update on this.  I've fixed the maker2zff script to handle the
> issues seen.  Looking at this actually brought to light another issue.
> There is inconsistent escape character specification for GFF3 in column 1
> (the seqid), column 9 (the attributes ID and Target_ID), as well as
> the FASTA ID for internal sequence.  We're updating the GFF3 spec to
> clarify this so that everywhere you see the same ID getting treated the
> same way for character escaping.
>
> To be safe though, only use these characters in your contig IDs for the
> assembly when using any tool that reads or outputs GFF3 -->
> a-zA-Z0-9.:^*$@!+_?-|
>
> Any character not in that set has a high chance of breaking some
> downstream tool.  For now just assume the strict interpretation from the
> GFF3 spec for column 1, must be used on all IDs everywhere (see below).
>
> >>Column 1: "seqid"
> >>The ID of the landmark used to establish the coordinate system for the
> >>current feature.
> >>IDs may contain any characters, but must escape any characters not in
> >>the set [a-zA-Z0-9.:^*$@!+_?-|].
> >>In particular, IDs may not contain unescaped whitespace and must not
> >>begin with an unescaped ">".
>
>
> Thanks,
> Carson
>
>
>
> On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:
>
> >Hi Carson,
> >
> >I used gff3_merge to create my gff file from maker output. I've
> >attached it here. But when I run maker2zff on it, I get the following
> >error:
> >
> >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
> >2.30/bin/maker2zff line 177, <GFF> line 7294251.
> >
> >It produces an incomplete output file and it looks like it may be
> >running into problems when it encounters scaffold3%2F0.  I'm wondering
> >if it's having problems with my scaffold names. There seem to be some
> >inconsistencies because it's referred to as  scaffold3%F0 and
> >scaffold3/0 in the gff file.  It goes through other scaffolds like
> >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
> >scaffold names in the gff file, but still get the same error.   Any
> >ideas?
> >
> >Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
> >gfffile| sed 's/\//\_/'  > mod.gfffile
> >
> >Thanks
> >Dhivya
> >
>
>
>

From marc.hoeppner at bils.se  Tue Mar 18 06:43:43 2014
From: marc.hoeppner at bils.se (Marc Höppner)
Date: Tue, 18 Mar 2014 12:43:43 +0100
Subject: [maker-devel] Maker changes 2.30-2.31
Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>

Hi,

I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a change log somewhere to get some information on what, if anything, was changed between 2.30 and 2.31?

There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker - but I'd like to make sure, if possible. Both versions use the exact same external binaries etc, and were run on the same data. 2.30 is running along happily, 2.31 however has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run.

I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently.

Cheers,

Marc


Marc P. Hoeppner, PhD
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se


From carsonhh at gmail.com  Tue Mar 18 10:07:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 09:07:07 -0600
Subject: [maker-devel] Maker changes 2.30-2.31
In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
Message-ID: <CF4DBC09.ADE0%carsonhh@gmail.com>

Attached.  Also make sure you are using the tar ball from the lab website
and not the prerelease from the subversion repository.

Thanks,
Carson


On 3/18/14, 5:43 AM, "Marc Höppner" <marc.hoeppner at bils.se> wrote:

>Hi,
>
>I have observed a few oddities with our installation of maker 2.31 and
>was therefore wondering if there is a change log somewhere to get some
>information on what, if anything, was changed between 2.30 and 2.31?
>
>There is of course a good chance that the issues I am seeing (pipeline
>locking up) are related to our setup and not necessarily Maker - but I'd
>like to make sure, if possible. Both versions use the exact same external
>binaries etc, and were run on the same data. 2.30 is running along
>happily, 2.31 however has randomly locked up. I should perhaps also say
>that I am running on SL 6.2 and am using mpich2 for the MPI run.
>
>I haven't done any more systematic testing so far, but will probably do
>so if there is no "obvious" reason why Maker 2.31 should behave
>differently.
>
>Cheers,
>
>Marc
>
>
>
>
>Marc P. Hoeppner, PhD
>Department for Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoeppner at bils.se
>
>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: svn_log.txt
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140318/65c17518/attachment.txt>

From fbarreto at ucsd.edu  Tue Mar 18 11:08:47 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:08:47 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
Message-ID: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would
like to check my approach before continuing with longer runs.

>From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot (based
on query-subject coverage, bit score, etc).  I then used only these 2000
ESTs in a first Maker run using est2genome=1.  The output returned 1500
models (with the 500 "missing" models probably a result of single-exon
issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another
Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file.  The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).
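
The three runs described above differ only in a few maker_opts.ctl
settings, which can be summarised as follows.  This is a minimal Python
sketch for keeping track of the per-round overrides; est, protein,
est2genome and snaphmm are the control file option names, while every
file name and the helper function are placeholders made up for
illustration.

```python
# Per-round overrides for maker_opts.ctl; all file names are hypothetical.
ROUNDS = [
    ("round1_est2genome", {
        "est": "high_quality_ests.fasta",   # the ~2000 curated ESTs
        "est2genome": "1",                  # build models directly from ESTs
        "snaphmm": "",                      # no SNAP model yet
    }),
    ("round2_first_hmm", {
        "est": "all_ests.fasta",            # full EST set, as evidence only
        "protein": "proteins.fasta",
        "est2genome": "0",
        "snaphmm": "snap_round1.hmm",       # trained on round 1 output
    }),
    ("round3_final", {
        "est": "all_ests.fasta",
        "protein": "proteins.fasta",
        "est2genome": "0",
        "snaphmm": "snap_round2.hmm",       # retrained on round 2 output
    }),
]


def render_overrides(options):
    """Format a dict of options as key=value lines, one per option."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


if __name__ == "__main__":
    for label, options in ROUNDS:
        print(f"# {label}")
        print(render_overrides(options))
        print()
```

Each printed block lists the lines to change in that round's copy of
maker_opts.ctl; everything else in the control files stays the same
between rounds.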

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From carsonhh at gmail.com  Tue Mar 18 11:14:29 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 10:14:29 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <CF4DCCE1.ADEE%carsonhh@gmail.com>

That sounds good.  1,500 initial models should be more than sufficient for
the first round of training.

-Carson


From:  Felipe Barreto <fbarreto at ucsd.edu>
Date:  Tuesday, March 18, 2014 at 10:08 AM
To:  MAKER group <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Size of initial EST training set for SNAP

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would like
to check my approach before continuing with longer runs.

>From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot (based
on query-subject coverage, bit score, etc).  I then used only these 2000
ESTs in a first Maker run using est2genome=1.  The output returned 1500
models (with the 500 "missing" models probably a result of single-exon
issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another
Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file.  The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093 

From dence at genetics.utah.edu  Tue Mar 18 11:16:20 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 18 Mar 2014 16:16:20 +0000
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6E483@mxb2.hg.genetics.utah.edu>

Hi Felipe,

I think 1500 models sounds like a good size set with which to train SNAP. I think that SNAP expects ~1000 models for training.

The only other comment on the approach is perhaps that using only one ab-initio predictor is a little bit risky. Using multiple predictors would allow MAKER to select from among their different models for the one that best fits the evidence.

Good luck and let us know if there's anything we can help with!

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Felipe Barreto [fbarreto at ucsd.edu]
Sent: Tuesday, March 18, 2014 10:08 AM
To: MAKER group
Subject: [maker-devel] Size of initial EST training set for SNAP

Hi, all,

I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb).  I started by training SNAP, but would like to check my approach before continuing with longer runs.

>From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc).  I then used only these 2000 ESTs in a first Maker run using est2genome=1.  The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file.  The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course).

Does this sound like a reasonable approach?  Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From barry.utah at gmail.com  Tue Mar 18 11:26:45 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 18 Mar 2014 10:26:45 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>

Hi Felipe,

I think that plan sounds quite reasonable.  To address your primary concern, most gene prediction tools recommend a minimum of a few hundred gene models to train on.  Since you're an order of magnitude above that, I think you're in good shape.  Having said that, if you have concerns about biases in your training set, you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high-confidence genes that your set is missing.

Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from, I would suggest that you also consider including at least one other gene predictor.  Augustus works well on a wide variety of genomes, and while it is more difficult to train than SNAP, it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome.  This is one of the advantages of MAKER supervision: while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence, and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictors within MAKER.

Thanks,

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From fbarreto at ucsd.edu  Tue Mar 18 11:59:39 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:59:39 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
	<02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
Message-ID: <CAOi0ENYUcJFJsg0nDj3-9if0E96N+UY=vPyJkfH0T4xvFYOQ3w@mail.gmail.com>

Thanks, guys, for the swift and informative response!  I will try to train
Augustus again, but at the very least, will include it with an arthropod
HMM in my final run (in addition to my custom SNAP HMM).

Cheers,

Felipe


-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From darasappan at gmail.com  Tue Mar 18 14:27:11 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 14:27:11 -0500
Subject: [maker-devel] maker snap output files
Message-ID: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>

Hello,

I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information on the tutorial about interpreting these different files. I'm hoping one of you can help.

*maker.proteins.fasta
*maker.snap_masked.proteins.fasta
*maker.non_overlapping_ab_initio.proteins.fasta

What is the difference among these? They all have different numbers of sequences.

Similarly, with transcripts:

maker.non_overlapping_ab_initio.transcripts.fasta
maker.snap_masked.transcripts.fasta
maker.transcripts.fasta

Thanks
Dhivya



From carsonhh at gmail.com  Tue Mar 18 14:34:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 13:34:05 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
Message-ID: <CF4DFA69.AE2E%carsonhh@gmail.com>

maker.proteins.fasta - these are the final filtered and modified protein
models (this is what you want)
maker.snap_masked.proteins.fasta - these are the raw unfiltered SNAP ab
initio predictions (for reference purposes)
maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
rejected models that do not overlap the maker.proteins.fasta entries. If you
think you are missing a gene, look for it here.  Sometimes people use
InterProScan (very slow) to analyze this file for false negatives.


These files are also described in the README distributed with MAKER in the
"MAKER OUTPUT" section.
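
Since the question was about the three files holding different numbers of sequences, a quick way to compare them is to count the FASTA header lines in each. The following is only a minimal sketch: the function name and the in-memory example records are made up for illustration, and in practice you would open the real files produced by fasta_merge instead.

```python
import io


def count_fasta_records(handle):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Tiny in-memory stand-in for a file such as *maker.proteins.fasta.
# The record names below are invented for this example.
example = io.StringIO(">gene1-mRNA-1\nMKT\nLLA\n>gene2-mRNA-1\nMSS\n")
n_records = count_fasta_records(example)
print(n_records)
```

To use it on a real file, pass an open file handle, for example `with open(path) as fh: count_fasta_records(fh)`.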

Thanks,
Carson


From darasappan at gmail.com  Tue Mar 18 15:05:39 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 15:05:39 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4DFA69.AE2E%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>

Thanks Carson.

Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually lower than the number of proteins in my pre-snap maker results?  I assumed that annotation through alignment plus annotation through prediction would yield more annotations.

The unfiltered proteins file has more proteins though.

Thanks
Dhivya



From carsonhh at gmail.com  Tue Mar 18 15:09:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 14:09:01 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID: <CF4E0363.AE3D%carsonhh@gmail.com>

There can also be hint based predictions.  They may be similar in size, but
there is no rule.  Generally maker.snap_masked.proteins.fasta will be
larger, as gene predictors tend to over predict (as much as 10 fold).  You
should always review your annotations in something like Apollo, to see how
the models compare to the evidence.  Just counts don't really mean anything.

Thanks,
Carson


From chrisbioinfo at gmail.com  Wed Mar 19 06:09:57 2014
From: chrisbioinfo at gmail.com (Chris Bioinfo)
Date: Wed, 19 Mar 2014 12:09:57 +0100
Subject: [maker-devel] Annotation with maker2
Message-ID: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>

Hello,

I'm installing and using maker2 for the first time, and I get an error when
running it.

I am certainly missing something, but I don't know what.

I compiled maker with no error messages, and I have all these directories
after compilation:
bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src

Nevertheless, when I try maker2 on the test data (dpp_contig.fasta), I get
this error:

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
doing repeat masking
DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm

Can't call method "do" on an undefined value at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500
...

ideas?

Best,

Christelle

From carsonhh at gmail.com  Wed Mar 19 08:01:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Mar 2014 07:01:35 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
Message-ID: <CF4EF035.AE6F%carsonhh@gmail.com>

Your problem is one of the following: you need to reinstall the DBD::SQLite
module; you are running in a directory you don't have permissions for; you
set your TMPDIR environment variable or the TMP value in maker_opts.ctl to an
NFS mounted or memory mounted directory; or you are using a self compiled
version of Perl (i.e. not /usr/bin/perl) that has issues (probably with DB
or SQLite modules).  You can also completely delete the output directory
and start again to see if it was just a random error.  You should look at
each of those first.  If none of those seems to be the issue, you can run
MAKER with the --debug command line flag and send the output to me.
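
Two of the causes above (directory permissions, and a temporary or output directory on a filesystem where SQLite misbehaves) can be probed independently of MAKER. The sketch below is an assumption-laden illustration, not part of MAKER: it uses Python's built-in sqlite3 module rather than Perl's DBD::SQLite, and the probe file name is hypothetical. It only tells you whether an SQLite database can be created and written in a given directory; it says nothing about the state of your Perl modules.

```python
import os
import sqlite3
import tempfile


def can_create_sqlite_db(directory):
    """Return True if an SQLite database can be created and written in `directory`."""
    if not os.path.isdir(directory) or not os.access(directory, os.W_OK):
        return False
    # Hypothetical probe file name, removed again before returning.
    path = os.path.join(directory, "maker_sqlite_probe.db")
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE probe (id INTEGER)")
            conn.commit()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False
    finally:
        if os.path.exists(path):
            os.remove(path)


# Demonstration on a scratch directory; point it at your MAKER output
# directory or TMP location to test those instead.
with tempfile.TemporaryDirectory() as scratch:
    writable_ok = can_create_sqlite_db(scratch)
missing_ok = can_create_sqlite_db(os.path.join(scratch, "does_not_exist"))
print(writable_ok, missing_ok)
```

If this check fails in the directory where MAKER writes its output, the permissions or filesystem explanations become more likely than a broken Perl module.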

Thanks,
Carson



From rbharris at uw.edu  Wed Mar 19 20:19:27 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Wed, 19 Mar 2014 18:19:27 -0700
Subject: [maker-devel] tradeoff between run time & file number
Message-ID: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once, using the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time, probably because it has to go through BLAST
again. Is there any way of getting around this? I expect I may have to train
SNAP and rerun maker multiple times, and it is taking about 3 weeks to get
through my dataset. Is there a way to prune down my original dataset based
on maker's output?

Thanks,
Rebecca

From dence at genetics.utah.edu  Thu Mar 20 00:43:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Mar 2014 05:43:11 +0000
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>

Hi Rebecca,

As far as pruning down the dataset goes, I think the biggest gains will come from trimming the number of scaffolds that you annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter than 5 kbp or 10 kbp won't contribute much to the gene counts in the end.
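
The N50 check and the length-based trimming described above can be sketched in a few lines. This is a minimal illustration with a hand-rolled FASTA reader and made-up toy sequences; the function names and the cutoff used in the example are assumptions, and for a real assembly you would set the minimum length to something like 5000 or 10000 and read from your scaffold file.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def filter_by_length(records, min_len):
    """Keep only records whose sequence is at least `min_len` bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Toy data standing in for a scaffold file; names and sizes are invented.
example = io.StringIO(">s1\nAAAAAAAAAA\n>s2\nCCCC\nCC\n>s3\nGG\n")
records = list(read_fasta(example))
lengths = [len(seq) for _, seq in records]
kept = filter_by_length(records, min_len=5)
print(n50(lengths), [name for name, _ in kept])
```

Comparing the N50 before and after filtering gives a rough sense of how much of the assembly sits in the short scaffolds you would be dropping.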

Also, if you can, try to avoid using the alt_est option. It works completely fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach it to your reply (to the maker-devel list), and I'll take a look. I don't know how to force maker to create fewer files. You definitely want to be able to make use of the results from prior runs to save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com  Thu Mar 20 12:22:47 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 20 Mar 2014 12:22:47 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4E0363.AE3D%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and transcripts that don't overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mrna) all say something like "*snap-gene", so does this mean these are the genes from the snap prediction tool?


Thanks
dhivya



From carsonhh at gmail.com  Thu Mar 20 12:24:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:24:41 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
	<48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
Message-ID: <CF508021.AF35%carsonhh@gmail.com>

maker transcripts will be the gene/mRNA/exon/CDS features

All other transcripts from SNAP etc. will be match/match_part features in
the GFF3.
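
One way to see this split in your own output is to tally the source and type columns (columns 2 and 3) of the GFF3. The sketch below is illustrative only: the function name is made up, and the example rows, including the "maker" and "snap_masked" source labels and the IDs, are assumptions written for the demonstration rather than copied from real MAKER output.

```python
import io
from collections import Counter


def tally_feature_types(handle):
    """Count GFF3 features by (source, type), i.e. columns 2 and 3."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop here
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        counts[(cols[1], cols[2])] += 1
    return counts


# Invented rows: final annotations as gene/mRNA/exon, a raw prediction as
# match/match_part.
rows = [
    ["ctg1", "maker", "gene", "1", "900", ".", "+", ".", "ID=g1"],
    ["ctg1", "maker", "mRNA", "1", "900", ".", "+", ".", "ID=g1-mRNA-1;Parent=g1"],
    ["ctg1", "maker", "exon", "1", "900", ".", "+", ".", "ID=g1-exon-1;Parent=g1-mRNA-1"],
    ["ctg1", "snap_masked", "match", "1", "950", ".", "+", ".", "ID=m1"],
    ["ctg1", "snap_masked", "match_part", "1", "950", ".", "+", ".", "ID=m1-p1;Parent=m1"],
]
text = "##gff-version 3\n" + "".join("\t".join(r) + "\n" for r in rows)
counts = tally_feature_types(io.StringIO(text))
for key in sorted(counts):
    print(key, counts[key])
```

Features typed gene, mRNA, exon or CDS are the final annotations; anything typed match or match_part is evidence or a raw prediction.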

When you look at these in something like Apollo, they will be placed in
different viewing panels based on their type.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 20, 2014 at 11:22 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  <maker-devel at yandell-lab.org>
Subject:  Re: maker snap output files

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and
transcripts that don?t overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mrna) all say something like
?*snap-gene?  so does this mean these are the genes from the snap prediction
tool?


Thanks
dhivya


On Mar 18, 2014, at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:

> There can also be hint based predictions.  They may be similar in size, but
> there is no rule.  Generally maker.snap_masked.proteins.fasta will be larger,
> as gene predictors tend to over predict (as much as 10 fold).  You should
> always review your annotations in something like Apollo, to see how the models
> compare to the evidence.  Just counts don?t really mean anything.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 18, 2014 at 2:05 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  <maker-devel at yandell-lab.org>
> Subject:  Re: maker snap output files
> 
> Thanks Carson.
> 
> Is it normal that in my maker results after running snap, the number of
> proteins (in *maker.proteins.fasta) Is actually less than the number of
> proteins in my pre-snap maker results?  I assumed that annotations through
> alignment+annotation through prediction would equal more annotations?
> 
> The unfiltered proteins file has more proteins though.
> 
> Thanks
> Dhivya
> 
> 
> 
> On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
>> maker.proteins.fasta - these are the final filtered and modified protein
>> models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab
>> initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
>> rejected models that do not overlap the maker.proteins.fasta entries. If you
>> think you are missing a gene, look for it here.  Sometimes people use
>> interproscan (very slow) to analyze this file for false negatives.
>> 
>> 
>> These files are also described in the README distributed with MAKER in the
>> "MAKER OUTPUT" section.
>> 
>> Thanks,
>> Carson
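
A quick way to compare the three files described above is to count the FASTA header lines in each one. The following is only a minimal shell sketch: the function name count_fasta is illustrative and not part of MAKER, and, as noted elsewhere in this thread, the counts alone say little about annotation quality.

```shell
# Print "<file><TAB><number of sequences>" for each FASTA file given.
# A sequence is counted once per header line, i.e. a line starting with '>'.
count_fasta() {
    for f in "$@"; do
        printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
    done
}
```

For example, `count_fasta mygenome.all.maker.proteins.fasta mygenome.all.maker.snap_masked.proteins.fasta` would print one line per file, where the `mygenome.all` prefix is a hypothetical name standing in for whatever prefix fasta_merge produced.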
>> 
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Tuesday, March 18, 2014 at 1:27 PM
>> To:  Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject:  maker snap output files
>> 
>> Hello,
>> 
>> I ran maker after running SNAP ab initio prediction (following instructions
>> from the maker tutorial).  It ran successfully and when I ran fasta_merge, I
>> got several output fasta files. I'm unable to find information in the
>> tutorial about interpreting these different files. I'm hoping one of you can
>> help.
>> 
>> *maker.proteins.fasta
>> *maker.snap_masked.proteins.fasta
>> *maker.non_overlapping_ab_initio.proteins.fasta
>> 
>> What is the difference among these? They all have different numbers of
>> sequences.
>> 
>> Similarly, with transcripts:
>> 
>> maker.non_overlapping_ab_initio.transcripts.fasta
>> maker.snap_masked.transcripts.fasta
>> maker.transcripts.fasta
>> 
>> Thanks
>> Dhivya
>> 
>> 
> 



From carsonhh at gmail.com  Thu Mar 20 12:53:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:53:24 -0600
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
Message-ID: <CF50861A.AF65%carsonhh@gmail.com>

You may also want to try the GFF3 pass-through options.  Basically, you give
the GFF3 file from your previous run to maker_gff and tell MAKER which kinds
of evidence to carry over by setting the corresponding 'pass' options to 1.
Then you can run without your FASTA file inputs for ESTs, proteins, and
repeats (blank out the RepeatMasker species as well).  The values will be
passed forward from the GFF3 file into the current run.

--Carson
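
As a rough illustration of the pass-through setup described above, the relevant part of maker_opts.ctl might look like the fragment below. This is a sketch, not a tested configuration: the option names are the ones that appear in the control file generated by MAKER 2.31 as far as I know, so check them against your own maker_opts.ctl, and the GFF3 path is a hypothetical placeholder.

```
#-----Re-annotation Using MAKER Derived GFF3
maker_gff=previous_run.all.gff #hypothetical path to the GFF3 from the earlier run
est_pass=1 #reuse EST alignments found in maker_gff
protein_pass=1 #reuse protein alignments found in maker_gff
rm_pass=1 #reuse repeat annotations found in maker_gff

#inputs left blank because their results are passed through from the GFF3
est=
protein=
model_org=
rmlib=
repeat_protein=
```

If the earlier run also used alternate-organism ESTs, altest_pass and altest would presumably be handled the same way.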


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Wednesday, March 19, 2014 at 11:43 PM
To:  Rebecca Harris <rbharris at uw.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] tradeoff between run time & file number

Hi Rebecca, as far as pruning down the dataset goes, I think that the
biggest gains will be made by trimming the number of scaffolds that you
annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds
shorter than 5 kbp or 10 kbp won't contribute much to the gene counts in the
end. 
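
One way to apply the scaffold-length cutoff suggested above is to filter the assembly FASTA before handing it to maker. The following is a minimal shell/awk sketch under my own assumptions: the function name filter_fasta_by_length is illustrative, and each kept sequence is written on a single unwrapped line.

```shell
# usage: filter_fasta_by_length MIN_LENGTH INPUT.fasta > OUTPUT.fasta
# Keeps only records whose total sequence length is at least MIN_LENGTH.
filter_fasta_by_length() {
    awk -v min="$1" '
        # Print the record collected so far if it is long enough.
        function emit() { if (hdr != "" && length(seq) >= min) print hdr "\n" seq }
        /^>/ { emit(); hdr = $0; seq = ""; next }   # new header: flush previous record
        { seq = seq $0 }                            # sequence line: append to current record
        END { emit() }                              # flush the last record
    ' "$2"
}
```

For example, `filter_fasta_by_length 10000 scaffolds.fasta > scaffolds.min10k.fasta` would keep scaffolds of 10 kbp or longer (both file names are placeholders). Whether 5 kbp or 10 kbp is the better cutoff depends on the assembly.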

Also, if you can, try to avoid using the alt_est option. It works completely
fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got
things set up. You can attach it to your reply (to the maker-devel list),
and I'll take a look. I don't know how to force maker to create fewer files.
You definitely want to be able to make use of the results from prior runs to
save time. 

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca
Harris [rbharris at uw.edu]
Sent: Wednesday, March 19, 2014 7:19 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] tradeoff between run time & file number

Hi - 

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once, using the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time, probably because it has to go through BLAST again.
Is there any way of getting around this? I expect I may have to train SNAP
and rerun maker multiple times, and it is taking about 3 weeks to get through
my dataset. Is there a way to prune down my original dataset based on
maker's output?

Thanks,
Rebecca
_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Fri Mar 21 09:23:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 08:23:18 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
	<CF4EF035.AE6F%carsonhh@gmail.com>
	<CAF+kvSasxjb7p_Wjtntmy2nht6kfL=JqaP5DfMGeC0GHkLy8Hw@mail.gmail.com>
	<CF5065FF.AEE7%carsonhh@gmail.com>
	<CAF+kvSbKs-sdFfvncqEgsAk4_XKbsB7KdB85fCdxpcWNe1rjWQ@mail.gmail.com>
	<CF506B1E.AEED%carsonhh@gmail.com>
	<CAF+kvSbmpRgneyfz6_tWsx_NS8ZWhuwnQAV0hA83qJrOVh-0hA@mail.gmail.com>
	<CAF+kvSZD7rpUeeoNGMKBGbH0zZN3bHksJFuUPP+hGoUKki34jw@mail.gmail.com>
	<CF506F04.AEF8%carsonhh@gmail.com>
	<CAF+kvSY_nWAFBH1YpKJqWV7qQ=XehHzhX9e+65miAG4f_+=ptA@mail.gmail.com>
	<CAF+kvSYYTA8pYFc0WY12+g6T_bk7P9MRUxNpzqtGkJARsA0wpg@mail.gmail.com>
	<CF50741C.AF02%carsonhh@gmail.com>
	<CAF+kvSZAhJnJdq+UcRfpWSya+6W26ecZHkRvHzGLsqk6K=fmQg@mail.gmail.com>
	<CF507AB2.AF1E%carsonhh@gmail.com>
	<CF507F90.AF30%carsonhh@gmail.com>
	<CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
Message-ID: <CF51A74A.AFA8%carsonhh@gmail.com>

Glad it's working.  Let us know if anything else comes up.

--Carson


From:  Chris Bioinfo <chrisbioinfo at gmail.com>
Date:  Friday, March 21, 2014 at 4:57 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: [maker-devel] Annotation with maker2

Dear Carson

It works, after many difficulties!

I installed sqlite 3.8.4.1 yesterday: it was "better" (no error
message when launching sqlite3). Yet my test.db was still not created.

Today I found the trick!
The problem was simply that the path to the db being created was too long.

Thanks for your time and your help, Carson!

All the best,

Christelle


2014-03-20 18:21 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
> Also you can use this command line to test both before and after installing
> 
> perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh =
> DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");'
> 
> Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in
> the error.
> 
> --Carson
> 
> 
> From:  Carson Holt <carsonhh at gmail.com>
> Date:  Thursday, March 20, 2014 at 11:03 AM
> To:  Chris Bioinfo <chrisbioinfo at gmail.com>
> 
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> The failure is in SQLite.  So you have to reinstall.  I.e. 'force install
> DBD::SQLite' in CPAN.  Otherwise you are just keeping whatever module is
> installed which may have broken C bindings.
> 
> You may also have to install SQLite 3.8.4.1, and then reinstall the perl
> modules using the force option to force recompile.
> 
> --Carson
> 
> 
> 
> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
> Date:  Thursday, March 20, 2014 at 10:57 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> cpan[2]> install DBI
> DBI is up to date (1.631).
> 
> cpan[3]> install DBD::SQLite
> DBD::SQLite is up to date (1.42).
> 
> my test.db is not actually created:
> 
> sqlite3 dpp_contig.maker.output/test.db
> SQLite version 3.8.3.1 2014-02-11 14:52:19
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> 
> 
> 
> 
> 
> 2014-03-20 17:36 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>> I'm actually checking the mount points for the disk.  SQLite won't work on
>> filesystems that don't implement locks, and 'df' is a good way to infer some
>> of that info.
>> 
>> Basically I still think this is SQLite failing on your system.  You might
>> need to reinstall SQLite and then reinstall the perl DBI and DBD::SQLite
>> modules.
>> 
>> You can also do a test command --> 'sqlite3 dpp_contig.maker.output/test.db'
>> 
>> This will work if you have sqlite3 installed.  And any error it gives may be
>> informative.
>> 
>> --Carson
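
Note that simply opening a database with the sqlite3 shell does not normally create the file until something is written to it, so an empty prompt is not proof that the directory is usable. The following is a minimal sketch of a check that actually writes; the function name check_sqlite_write and the test file name are my own, and it assumes the sqlite3 command-line tool is on the PATH. It also prints the path length, since an overly long path turned out to be the culprit in this thread.

```shell
# usage: check_sqlite_write DIRECTORY
# Returns 0 if sqlite3 can create and write a database in DIRECTORY,
# 1 if the write fails, 2 if the sqlite3 command is not available.
check_sqlite_write() {
    db="$1/sqlite_write_test.db"
    if ! command -v sqlite3 >/dev/null 2>&1; then
        echo "sqlite3 not found on PATH" >&2
        return 2
    fi
    echo "path length: ${#db} characters"
    # Creating a table and inserting a row forces the file to be written.
    if sqlite3 "$db" 'CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1);' \
        && [ -s "$db" ]; then
        echo "OK: SQLite created and wrote $db"
        rm -f "$db"
    else
        echo "FAILED: SQLite could not write $db" >&2
        return 1
    fi
}
```

For example, `check_sqlite_write dpp_contig.maker.output` tests the output directory from this thread.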
>> 
>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>> Date:  Thursday, March 20, 2014 at 10:29 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: [maker-devel] Annotation with maker2
>> 
>> oh sorry
>> 
>> my disks are quite full, but there is still enough space for maker, I guess
>> 
>>  /dev/sdc1           19T     18T  934G  95% /home
>> 
>> 
>> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>> this :
>>> 
>>>  du -h dpp_contig.maker.output/
>>> 0    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0
>>> 88K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore
>>> 4.0K    dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>>> 4.0K    dpp_contig.maker.output/maker_bopts.log
>>> 4.0K    dpp_contig.maker.output/maker_exe.log
>>> 8.0K    dpp_contig.maker.output/maker_opts.log
>>> 16K    dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1
>>> 44K    dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10
>>> 32K    dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb
>>> 0    dpp_contig.maker.output/seen.dbm
>>> 
>>> 
>>> 
>>> 2014-03-20 17:10 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>> 
>>>> What does 'df -h dpp_contig.maker.output' show?
>>>> 
>>>> --Carson
>>>> 
>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>> Date:  Thursday, March 20, 2014 at 10:00 AM
>>>> 
>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>> 
>>>> sorry, mistake on the dir!
>>>> 
>>>> I have these files:
>>>> dpp_contig_datastore  dpp_contig_master_datastore_index.log
>>>> maker_bopts.log  maker_exe.log  maker_opts.log  mpi_blastdb  seen.dbm
>>>> 
>>>> 
>>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>>>> no,
>>>>> 
>>>>> I have these files in the directory:
>>>>> dpp_contig.fasta         dpp_est.fasta      hsap_contig.fasta
>>>>> hsap_protein.fasta  maker_exe.ctl
>>>>> dpp_contig.maker.output  dpp_protein.fasta  hsap_est.fasta
>>>>> maker_bopts.ctl     maker_opts.ctl  te_proteins.fasta
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>>> Did
>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db
>>>>>> exist?
>>>>>> 
>>>>>> --Carson
>>>>>> 
>>>>>> 
>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>> Date:  Thursday, March 20, 2014 at 9:50 AM
>>>>>> 
>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>> 
>>>>>> cdantec at belem:~$ /usr/bin/perl -v
>>>>>> 
>>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for
>>>>>> x86_64-linux-gnu-thread-multi
>>>>>> (with 46 registered patches, see perl -V for more detail)
>>>>>> 
>>>>>> Copyright 1987-2013, Larry Wall
>>>>>> 
>>>>>> Perl may be copied only under the terms of either the Artistic License or
>>>>>> the
>>>>>> GNU General Public License, which may be found in the Perl 5 source kit.
>>>>>> 
>>>>>> Complete documentation for Perl, including FAQ lists, should be found on
>>>>>> this system using "man perl" or "perldoc perl".  If you have access to
>>>>>> the
>>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> What do you get when you type --> /usr/bin/perl -v
>>>>>>> 
>>>>>>> The key to the error is this line -->
>>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...)
>>>>>>> failed: unable to open database file
>>>>>>> 
>>>>>>> Either the database doesn't exist, or is corrupt.  Does it exist?
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Thursday, March 20, 2014 at 9:25 AM
>>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Dear Carson,
>>>>>>> 
>>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions on my
>>>>>>> directory, and configured the TMP value in maker_opts.ctl. perl is in
>>>>>>> /usr/bin/perl.
>>>>>>> I have deleted the output directory many times, but I get the same problem.
>>>>>>> 
>>>>>>> So here is the debug output:
>>>>>>> ****MODULE VERSION INFO
>>>>>>>     0.05    Acme::Damn    /usr/local/lib/perl/5.18.1/Acme/Damn.pm
>>>>>>>     1.01    AnyDBM_File    /usr/share/perl/5.18/AnyDBM_File.pm
>>>>>>>     5.73    AutoLoader    /usr/share/perl/5.18/AutoLoader.pm
>>>>>>>     UNKNOWN    Bio::AnalysisParserI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::AnnotatableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm
>>>>>>>     UNKNOWN    Bio::Annotation::Collection
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm
>>>>>>>     UNKNOWN    Bio::Annotation::SimpleValue
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm
>>>>>>>     UNKNOWN    Bio::Annotation::TypeManager
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm
>>>>>>>     UNKNOWN    Bio::AnnotationCollectionI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm
>>>>>>>     UNKNOWN    Bio::AnnotationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm
>>>>>>>     1.006923    Bio::DB::Fasta
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm
>>>>>>>     UNKNOWN    Bio::DB::InMemoryCache
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm
>>>>>>>     UNKNOWN    Bio::DB::IndexedBase
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm
>>>>>>>     UNKNOWN    Bio::DB::RandomAccessI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm
>>>>>>>     UNKNOWN    Bio::DB::SeqI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm
>>>>>>>     UNKNOWN    Bio::DescribableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DescribableI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventGeneratorI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventHandlerI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactoryI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm
>>>>>>>     UNKNOWN    Bio::Factory::SequenceFactoryI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm
>>>>>>>     UNKNOWN    Bio::FeatureHolderI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm
>>>>>>>     UNKNOWN    Bio::IdentifiableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm
>>>>>>>     UNKNOWN    Bio::LocatableSeq
>>>>>>> /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm
>>>>>>>     UNKNOWN    Bio::Location::Atomic
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm
>>>>>>>     UNKNOWN    Bio::Location::CoordinatePolicyI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm
>>>>>>>     UNKNOWN    Bio::Location::Fuzzy
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm
>>>>>>>     UNKNOWN    Bio::Location::FuzzyLocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::Simple
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm
>>>>>>>     UNKNOWN    Bio::Location::Split
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Split.pm
>>>>>>>     UNKNOWN    Bio::Location::SplitLocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::WidestCoordPolicy
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm
>>>>>>>     UNKNOWN    Bio::LocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/LocationI.pm
>>>>>>>     UNKNOWN    Bio::PrimarySeq
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm
>>>>>>>     1.006923    Bio::PrimarySeqI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm
>>>>>>>     UNKNOWN    Bio::Range    /usr/local/share/perl/5.18.1/Bio/Range.pm
>>>>>>>     UNKNOWN    Bio::RangeI    /usr/local/share/perl/5.18.1/Bio/RangeI.pm
>>>>>>>     1.006923    Bio::Root::Exception
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm
>>>>>>>     UNKNOWN    Bio::Root::HTTPget
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm
>>>>>>>     UNKNOWN    Bio::Root::IO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/IO.pm
>>>>>>>     1.006923    Bio::Root::Root
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Root.pm
>>>>>>>     1.006923    Bio::Root::RootI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm
>>>>>>>     1.006923    Bio::Root::Version
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Version.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::GenericHSP
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/augustus.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protein2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snoscan.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnascan.pm
>>>>>>>     1.006923    Bio::Search::Hit::GenericHit
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/augustus.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protein2genome.pm
>>>>>>>     1.006923    Bio::Search::Hit::PhatHit::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snoscan.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnascan.pm
>>>>>>>     1.006923    Bio::Search::SearchUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm
>>>>>>>     UNKNOWN    Bio::SearchIO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::EventHandlerI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::SearchResultEventBuilder
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm
>>>>>>>     UNKNOWN    Bio::Seq    /usr/local/share/perl/5.18.1/Bio/Seq.pm
>>>>>>>     UNKNOWN    Bio::Seq::SeqFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm
>>>>>>>     UNKNOWN    Bio::SeqAnalysisParserI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::FeaturePair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Generic
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Similarity
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::SimilarityPair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeatureI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm
>>>>>>>     UNKNOWN    Bio::SeqI    /usr/local/share/perl/5.18.1/Bio/SeqI.pm
>>>>>>>     UNKNOWN    Bio::SeqUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm
>>>>>>>     1.006923    Bio::Tools::CodonTable
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm
>>>>>>>     UNKNOWN    Bio::Tools::GFF
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm
>>>>>>>     1.006923    Bio::Tools::IUPAC
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm
>>>>>>>     7.3    Bit::Vector    /usr/local/lib/perl/5.18.1/Bit/Vector.pm
>>>>>>>     0.01    CGL::Annotation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm
>>>>>>>     0.01    CGL::Annotation::Feature
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Contig
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Exon
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Gene
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Intron
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protein.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Sequence_variant
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequence_variant.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Transcript
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transcript.pm
>>>>>>>     0.01    CGL::Annotation::FeatureLocation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocation.pm
>>>>>>>     0.01    CGL::Annotation::FeatureRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelationship.pm
>>>>>>>     0.01    CGL::Annotation::Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm
>>>>>>>     0.01    CGL::Annotation::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm
>>>>>>>     0.01    CGL::Clone
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm
>>>>>>>     0.01    CGL::Ontology::Node
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm
>>>>>>>     0.01    CGL::Ontology::NodeRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship.pm
>>>>>>>     0.01    CGL::Ontology::Ontology
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm
>>>>>>>     0.01    CGL::Ontology::Parser::OBO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm
>>>>>>>     0.01    CGL::Ontology::SO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm
>>>>>>>     0.01    CGL::Ontology::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm
>>>>>>>     0.01    CGL::Revcomp
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm
>>>>>>>     0.01    CGL::TranslationMachine
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm
>>>>>>>     1.32    Carp    /usr/local/share/perl/5.18.1/Carp.pm
>>>>>>>     1.32    Carp::Heavy    /usr/local/share/perl/5.18.1/Carp/Heavy.pm
>>>>>>>     0.64    Class::Struct    /usr/share/perl/5.18/Class/Struct.pm
>>>>>>>     0.36    Clone    /usr/local/lib/perl/5.18.1/Clone.pm
>>>>>>>     5.018001    Config    /usr/lib/perl/5.18/Config.pm
>>>>>>>     3.40    Cwd    /usr/lib/perl/5.18/Cwd.pm
>>>>>>>     1.42    DBD::SQLite    /usr/local/lib/perl/5.18.1/DBD/SQLite.pm
>>>>>>>     1.631    DBI    /usr/local/lib/perl/5.18.1/DBI.pm
>>>>>>>     1.827    DB_File    /usr/lib/perl/5.18/DB_File.pm
>>>>>>>     2.145    Data::Dumper    /usr/lib/perl/5.18/Data/Dumper.pm
>>>>>>>     0.11    Datastore::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm
>>>>>>>     0.01    Datastore::MD5
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm
>>>>>>>     2.53    Digest::MD5    /usr/local/lib/perl/5.18.1/Digest/MD5.pm
>>>>>>>     1.16    Digest::base    /usr/share/perl/5.18/Digest/base.pm
>>>>>>>     UNKNOWN    Dumper::GFF::GFFV3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game_Xml
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm
>>>>>>>     1.18    DynaLoader    /usr/lib/perl/5.18/DynaLoader.pm
>>>>>>>     1.18    Errno    /usr/lib/perl/5.18/Errno.pm
>>>>>>>     0.17015    Error
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm
>>>>>>>     UNKNOWN    Error::Simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm
>>>>>>>     5.68    Exporter    /usr/share/perl/5.18/Exporter.pm
>>>>>>>     5.68    Exporter::Heavy    /usr/share/perl/5.18/Exporter/Heavy.pm
>>>>>>>     UNKNOWN    Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Fasta.pm
>>>>>>>     UNKNOWN    FastaChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm
>>>>>>>     UNKNOWN    FastaChunker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm
>>>>>>>     UNKNOWN    FastaDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm
>>>>>>>     UNKNOWN    FastaFile
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm
>>>>>>>     UNKNOWN    FastaSeq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm
>>>>>>>     1.11    Fcntl    /usr/lib/perl/5.18/Fcntl.pm
>>>>>>>     2.84    File::Basename    /usr/share/perl/5.18/File/Basename.pm
>>>>>>>     2.26    File::Copy    /usr/share/perl/5.18/File/Copy.pm
>>>>>>>     1.20    File::Glob    /usr/lib/perl/5.18/File/Glob.pm
>>>>>>>     1.20    File::NFSLock
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm
>>>>>>>     2.09    File::Path    /usr/share/perl/5.18/File/Path.pm
>>>>>>>     3.40    File::Spec    /usr/lib/perl/5.18/File/Spec.pm
>>>>>>>     3.40    File::Spec::Unix    /usr/lib/perl/5.18/File/Spec/Unix.pm
>>>>>>>     0.2304    File::Temp    /usr/local/share/perl/5.18.1/File/Temp.pm
>>>>>>>     1.09    File::Which
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm
>>>>>>>     2.02    FileHandle    /usr/share/perl/5.18/FileHandle.pm
>>>>>>>     1.51    FindBin    /usr/share/perl/5.18/FindBin.pm
>>>>>>>     UNKNOWN    GFFDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>>     UNKNOWN    GI    /usr/local/annotation/maker2.31/bin/../lib/GI.pm
>>>>>>>     2.42    Getopt::Long    /usr/local/share/perl/5.18.1/Getopt/Long.pm
>>>>>>>     6.02    HTTP::Date    /usr/share/perl5/HTTP/Date.pm
>>>>>>>     6.05    HTTP::Headers    /usr/share/perl5/HTTP/Headers.pm
>>>>>>>     6.06    HTTP::Message    /usr/share/perl5/HTTP/Message.pm
>>>>>>>     6.00    HTTP::Request    /usr/share/perl5/HTTP/Request.pm
>>>>>>>     6.04    HTTP::Response    /usr/share/perl5/HTTP/Response.pm
>>>>>>>     6.03    HTTP::Status    /usr/share/perl5/HTTP/Status.pm
>>>>>>>     1.28    IO    /usr/lib/perl/5.18/IO.pm
>>>>>>>     1.16    IO::File    /usr/lib/perl/5.18/IO/File.pm
>>>>>>>     1.34    IO::Handle    /usr/lib/perl/5.18/IO/Handle.pm
>>>>>>>     1.1    IO::Seekable    /usr/lib/perl/5.18/IO/Seekable.pm
>>>>>>>     1.21    IO::Select    /usr/lib/perl/5.18/IO/Select.pm
>>>>>>>     1.36    IO::Socket    /usr/lib/perl/5.18/IO/Socket.pm
>>>>>>>     1.33    IO::Socket::INET    /usr/lib/perl/5.18/IO/Socket/INET.pm
>>>>>>>     1.24    IO::Socket::UNIX    /usr/lib/perl/5.18/IO/Socket/UNIX.pm
>>>>>>>     1.13    IPC::Open3    /usr/share/perl/5.18/IPC/Open3.pm
>>>>>>>     0.53    Inline    /usr/local/share/perl/5.18.1/Inline.pm
>>>>>>>     UNKNOWN    Inline::denter
>>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm
>>>>>>>     UNKNOWN    Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm
>>>>>>>     UNKNOWN    Iterator::Any
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm
>>>>>>>     UNKNOWN    Iterator::Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm
>>>>>>>     UNKNOWN    Iterator::GFF3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm
>>>>>>>     6.05    LWP    /usr/share/perl5/LWP.pm
>>>>>>>     UNKNOWN    LWP::MemberMixin    /usr/share/perl5/LWP/MemberMixin.pm
>>>>>>>     6.00    LWP::Protocol    /usr/share/perl5/LWP/Protocol.pm
>>>>>>>     6.05    LWP::UserAgent    /usr/share/perl5/LWP/UserAgent.pm
>>>>>>>     0.33    List::MoreUtils
>>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm
>>>>>>>     1.38    List::Util    /usr/local/lib/perl/5.18.1/List/Util.pm
>>>>>>>     UNKNOWN    MAKER::ConfigData
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm
>>>>>>>     1.32    POSIX    /usr/lib/perl/5.18/POSIX.pm
>>>>>>>     0.01    Parallel::Application::MPI
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI.pm
>>>>>>>     0.02    Perl::Unsafe::Signals
>>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm
>>>>>>>     UNKNOWN    PhatHit_utils
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm
>>>>>>>     UNKNOWN    PostData
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm
>>>>>>>     1.0    Proc::ProcessTable_simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm
>>>>>>>     1.0    Proc::Signal
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm
>>>>>>>     UNKNOWN    Process::MpiChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm
>>>>>>>     UNKNOWN    Process::MpiTiers
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm
>>>>>>>     1.38    Scalar::Util    /usr/local/lib/perl/5.18.1/Scalar/Util.pm
>>>>>>>     1.02    SelectSaver    /usr/share/perl/5.18/SelectSaver.pm
>>>>>>>     UNKNOWN    Shadower
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm
>>>>>>>     UNKNOWN    SimpleCluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm
>>>>>>>     2.009    Socket    /usr/lib/perl/5.18/Socket.pm
>>>>>>>     UNKNOWN    SpaceBase
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm
>>>>>>>     2.45    Storable    /usr/local/lib/perl/5.18.1/Storable.pm
>>>>>>>     1.07    Symbol    /usr/share/perl/5.18/Symbol.pm
>>>>>>>     1.17    Sys::Hostname    /usr/lib/perl/5.18/Sys/Hostname.pm
>>>>>>>     0.21    Sys::SigAction
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm
>>>>>>>     UNKNOWN    Sys::SigAction::Alarm
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm
>>>>>>>     4.02    Term::ANSIColor    /usr/share/perl/5.18/Term/ANSIColor.pm
>>>>>>>     4.2    Tie::Handle    /usr/share/perl/5.18/Tie/Handle.pm
>>>>>>>     1.04    Tie::Hash    /usr/share/perl/5.18/Tie/Hash.pm
>>>>>>>     4.3    Tie::StdHandle    /usr/share/perl/5.18/Tie/StdHandle.pm
>>>>>>>     1.9726    Time::HiRes    /usr/local/lib/perl/5.18.1/Time/HiRes.pm
>>>>>>>     1.2300    Time::Local    /usr/share/perl/5.18/Time/Local.pm
>>>>>>>     1.60    URI    /usr/share/perl5/URI.pm
>>>>>>>     3.31    URI::Escape    /usr/share/perl5/URI/Escape.pm
>>>>>>>     UNKNOWN    Widget
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm
>>>>>>>     UNKNOWN    Widget::RepeatMasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm
>>>>>>>     UNKNOWN    Widget::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm
>>>>>>>     UNKNOWN    Widget::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm
>>>>>>>     UNKNOWN    Widget::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm
>>>>>>>     UNKNOWN    Widget::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm
>>>>>>>     UNKNOWN    Widget::exonerate::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome.pm
>>>>>>>     UNKNOWN    Widget::exonerate::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.pm
>>>>>>>     UNKNOWN    Widget::exonerate::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2genome.pm
>>>>>>>     UNKNOWN    Widget::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm
>>>>>>>     UNKNOWN    Widget::formater
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm
>>>>>>>     UNKNOWN    Widget::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm
>>>>>>>     UNKNOWN    Widget::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm
>>>>>>>     UNKNOWN    Widget::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm
>>>>>>>     UNKNOWN    Widget::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm
>>>>>>>     UNKNOWN    Widget::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm
>>>>>>>     0.16    XSLoader    /usr/share/perl/5.18/XSLoader.pm
>>>>>>>     0.21    attributes    /usr/lib/perl/5.18/attributes.pm
>>>>>>>     2.18    base    /usr/share/perl/5.18/base.pm
>>>>>>>     1.04    bytes    /usr/share/perl/5.18/bytes.pm
>>>>>>>     UNKNOWN    clean
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm
>>>>>>>     UNKNOWN    cluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm
>>>>>>>     UNKNOWN    compare
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm
>>>>>>>     1.27    constant    /usr/share/perl/5.18/constant.pm
>>>>>>>     UNKNOWN    ds_utility
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm
>>>>>>>     UNKNOWN    exonerate::splice_info
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm
>>>>>>>     0.34    forks    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     2.08001    forks::Devel::Symdump
>>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm
>>>>>>>     0.34    forks::shared    /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     0.34    forks::signals
>>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm
>>>>>>>     1.00    integer    /usr/share/perl/5.18/integer.pm
>>>>>>>     0.63    lib    /usr/lib/perl/5.18/lib.pm
>>>>>>>     1.02    locale    /usr/share/perl/5.18/locale.pm
>>>>>>>     UNKNOWN    maker::auto_annotator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm
>>>>>>>     UNKNOWN    maker::join
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm
>>>>>>>     UNKNOWN    maker::quality_index
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm
>>>>>>>     UNKNOWN    maker::sens_spec
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm
>>>>>>>     1.22    overload    /usr/share/perl/5.18/overload.pm
>>>>>>>     0.02    overloading    /usr/share/perl/5.18/overloading.pm
>>>>>>>     0.225    parent    /usr/share/perl/5.18/parent.pm
>>>>>>>     UNKNOWN    polisher
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm
>>>>>>>     UNKNOWN    polisher::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm
>>>>>>>     UNKNOWN    polisher::exonerate::altest
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm
>>>>>>>     UNKNOWN    polisher::exonerate::est
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm
>>>>>>>     UNKNOWN    polisher::exonerate::protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm
>>>>>>>     UNKNOWN    repeat_mask_seq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm
>>>>>>>     0.1    runlog
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm
>>>>>>>     UNKNOWN    shadow_AED
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm
>>>>>>>     1.07    sigtrap    /usr/share/perl/5.18/sigtrap.pm
>>>>>>>     1.07    strict    /usr/share/perl/5.18/strict.pm
>>>>>>>     1.77    threads    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     1.33    threads::shared
>>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     1.03    vars    /usr/share/perl/5.18/vars.pm
>>>>>>>     1.18    warnings    /usr/share/perl/5.18/warnings.pm
>>>>>>>     1.02    warnings::register
>>>>>>> /usr/share/perl/5.18/warnings/register.pm
>>>>>>> STATUS: Parsing control files...
>>>>>>> Calling GI::load_control_files at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452.
>>>>>>> Calling GI::new_instance_temp at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463.
>>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 465.
>>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 483.
>>>>>>> STATUS: Processing and indexing input FASTA files...
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 529.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 574.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 623.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> STATUS: Setting up database for any GFF3 input...
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 629.
>>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 631.
>>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 635.
>>>>>>> A data structure will be created for you at:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig_datastore
>>>>>>> 
>>>>>>> To access files for individual sequences use the datastore index:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>>>>>>> 
>>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 636.
>>>>>>> Calling Iterator::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639.
>>>>>>> Calling Iterator::Fasta::skip_file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641.
>>>>>>> Calling Iterator::Fasta::step at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643.
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 107.
>>>>>>> Can't call method "do" on an undefined value at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now retrying the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> Tries: 2!!
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 107.
>>>>>>> Can't call method "do" on an undefined value at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> 
>>>>>>> 
>>>>>>> Maker is now finished!!!
>>>>>>> 
>>>>>>> Many thanks for your help
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> Your problem is one of the following:
>>>>>>> 
>>>>>>> - You need to reinstall the DBD::SQLite module.
>>>>>>> - You are running in a directory you don't have permissions for.
>>>>>>> - You set your TMPDIR environment variable, or the TMP value in
>>>>>>>   maker_opts.ctl, to an NFS-mounted or memory-mounted directory.
>>>>>>> - You are using a self-compiled version of Perl (i.e. not
>>>>>>>   /usr/bin/perl) that has issues (probably with the DB or SQLite
>>>>>>>   modules).
>>>>>>> 
>>>>>>> You can also completely delete the output directory and start again to
>>>>>>> see if it was just a random error. You should look at each of those
>>>>>>> first. If none of them seems to be the issue, run MAKER with the
>>>>>>> --debug command-line flag and send me the output.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Wednesday, March 19, 2014 at 5:09 AM
>>>>>>> To:  <maker-devel at yandell-lab.org>
>>>>>>> Subject:  [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I'm installing and using maker2 for the first time, and I get an error
>>>>>>> when running it.
>>>>>>> 
>>>>>>> I am certainly missing something, but I don't know what.
>>>>>>> 
>>>>>>> I compiled maker with no error message, and I have all these
>>>>>>> directories after compilation:
>>>>>>> bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src
>>>>>>> 
>>>>>>> Nevertheless, when I try maker2 on the test data (dpp_contig.fasta), I
>>>>>>> get this error:
>>>>>>> 
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) 
>>>>>>> failed: unable to open database file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>> 
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> ...
>>>>>>> 
>>>>>>> ideas?
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
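For anyone hitting the same "unable to open database file" error, here is a minimal sketch of the filesystem side of the checks Carson lists in the quoted message above. It is written in Python rather than MAKER's Perl, so it only tells you whether SQLite can create and write a database file in a given directory; it says nothing about the Perl DBD::SQLite module itself. The function name and the scratch file name are made up for illustration.

```python
import os
import sqlite3
import tempfile


def can_use_sqlite_in(directory):
    """Return True if SQLite can create, write, and close a database in `directory`."""
    # Hypothetical scratch file name; it is removed again before returning.
    db_path = os.path.join(directory, "sqlite_write_check_%d.db" % os.getpid())
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE check_table (x INTEGER)")
            conn.execute("INSERT INTO check_table VALUES (1)")
            conn.commit()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        # Covers "unable to open database file" and locking failures, which are
        # typical of missing, read-only, or some network-mounted directories.
        return False
    finally:
        if os.path.exists(db_path):
            os.remove(db_path)


# Check the current directory and whatever TMPDIR points to (if it is set).
for label, path in [("cwd", os.getcwd()), ("TMPDIR", os.environ.get("TMPDIR"))]:
    if path:
        print(label, path, "ok" if can_use_sqlite_in(path) else "FAILED")
```

Run it from the directory where MAKER writes its output. A FAILED result points at permissions or the mount type rather than at the Perl modules.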



From jfierst at uoregon.edu  Fri Mar 21 10:43:59 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 21 Mar 2014 08:43:59 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CF489F0B.AC19%carsonhh@gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
Message-ID: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>

Hi,

I just wanted to say thanks for all your help- I did the reciprocal best
blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to
associate names between strain assemblies/annotations. Worked perfectly!
-Janna
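
For other readers, the reciprocal best BLAST hit step described in the quoted message below can be sketched roughly as follows. This is only an illustration, not one of the MAKER scripts. It assumes both BLASTP runs were saved in BLAST+ tabular format (-outfmt 6) with the default columns, so that column 1 is the query, column 2 the subject, and column 12 the bit score. The function names and the toy hits are made up, and ties are not handled (the first hit seen with the top score wins).

```python
def best_hits(blast_tabular_lines):
    """Map each query ID to the subject ID of its highest-bitscore hit."""
    best = {}
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref_lines, ref_vs_maker_lines):
    """Return sorted (maker_id, reference_id) pairs that are each other's best hit."""
    forward = best_hits(maker_vs_ref_lines)
    reverse = best_hits(ref_vs_maker_lines)
    return sorted(
        (maker_id, ref_id)
        for maker_id, ref_id in forward.items()
        if reverse.get(ref_id) == maker_id
    )


# Toy hits (hypothetical values): MAKER proteins vs reference, and the reverse.
maker_vs_ref = [
    "maker-gene-1\tsmug1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "maker-gene-1\tother9\t40.0\t120\t72\t0\t1\t120\t1\t120\t1e-10\t60",
    "maker-gene-2\tsmug1\t35.0\t100\t65\t0\t1\t100\t1\t100\t1e-5\t45",
]
ref_vs_maker = [
    "smug1\tmaker-gene-1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "smug1\tmaker-gene-2\t35.0\t100\t65\t0\t1\t100\t1\t100\t1e-5\t45",
]

# Two-column output: original ID, then new ID.
for maker_id, ref_id in reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    print(maker_id + "\t" + ref_id)
```

The printed two-column lines have the shape Carson describes for the ID map (column 1 original ID, column 2 new ID).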


On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt <carsonhh at gmail.com> wrote:

> maker_map_ids does a translation (i.e. change gene-A to smug1), so you
> need to know which genes you want to translate names to (two column input
> file, column 1 -> original ID, column 2 -> new ID).  I'm not sure EST
> forward is the best way to do this, although I do think maker_map_ids is
> the tool to use in the end.  The question is how to make a list of IDs to
> translate as the input to maker_map_ids?
>
> I would actually just use BLASTP against the reference strain, and then
> do reciprocal best BLAST hits.  To do this you BLAST your reference
> proteins against your maker proteins.  Then do the opposite: BLAST your
> maker proteins against your reference proteins.  If two proteins are each
> other's best hit, then they are orthologous, and you can safely make a
> two-column entry for the maker_map_ids input (i.e. maker-gene-1 translates
> into smug1).
>
> --Carson
>
>
> From: Daniel Ence <dence at genetics.utah.edu>
> Date: Friday, March 14, 2014 at 11:32 AM
> To: Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org" <
> maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] associating gene names between related strains
>
> Hi Janna, So do you have one strain that you want to use as the reference
> for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
> Let me know if that's not what you're looking for,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
> Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
>

From carsonhh at gmail.com  Fri Mar 21 10:54:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 09:54:15 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
	<CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
Message-ID: <CF51BCA1.AFB9%carsonhh@gmail.com>

I'm glad we could help.

--Carson


From Hossein.Borhan at AGR.GC.CA  Fri Mar 21 11:41:38 2014
From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein)
Date: Fri, 21 Mar 2014 16:41:38 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
	<CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
Message-ID: <E8EDFB90D92694478065C37017B3A3A6A890F2F6@SKREGIXES2.AGR.GC.CA>

Dear Carson

I ran maker with the modified .pm files and it resolved the problem with
the fasta output. Thanks a lot for your help.


HB


On 14-03-17 1:45 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>I have attached 4 files for you to place in the .../maker/Widgets/
>directory.
>
>The *blast.pm files will suppress the BLAST+ failures you are getting
>(alternatively you can just downgrade to BLAST 2.2.27 to get the same
>effect).  BLAST 2.2.29 gives a lot of warnings etc., which you can ignore.
>In the latest release NCBI redid all their warnings and error codes, so it
>spits out a lot of garbage and fails with different messages than it did
>before.  For example, BLAST now warns you every time it encounters a fasta
>header with a comment (virtually every fasta entry in existence falls in
>this category), so your screen will be awash with meaningless warning
>messages.
>
>The fgenesh.pm file will fix the other failure, which only occurs if you
>use fgenesh simultaneously with the correct_est_fusion=1 option.  No other
>predictors are affected.
>
>Thanks,
>Carson
>
>
>On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear Carson
>>
>>Sorry for the late reply. I was away for a couple of days. I have
>>uploaded the output files plus the control files and error output to the
>>FTP site that you provided.
>>The user ID is borhanh
>>
>>I used blast+ for this run.
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>wrote:
>>
>>>Just resending this to the correct maker-devel address.  Please when
>>>replying, do not CC the incorrect maker-devel-bounce address.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>wrote:
>>>
>>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>>(either too old or too new), the output might be slightly different,
>>>>which could cause incorrect parsing. Could you tar up your maker.output
>>>>folder and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>(then send me your user or guest ID after you upload)?
>>>>
>>>>For the BLAST error, use BLAST+ instead.  You are using blastall, which
>>>>is the old legacy version of NCBI BLAST.  You can switch by setting the
>>>>blast type in maker_bopts.ctl and the location of the executables in
>>>>maker_exe.ctl.
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>
>>>>
>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Dear Maker users
>>>>>
>>>>>
>>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>>word SCALAR followed by a hexadecimal value in parentheses, like this:
>>>>>(0x22de7020), into the nucleotide sequence of some of the genes. This
>>>>>seems to be related to transcripts predicted by fgenesh_masked.
>>>>>
>>>>>
>>>>>Here is an example for one of the genes
>>>>>
>>>>>
>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>>TTTCGACAAGC
>>>>>
>>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>>without this problem. I checked the scaffold sequence for one of the
>>>>>affected transcripts and there was no error in it. I am not sure what
>>>>>is causing this. The only error that I could spot in the error output
>>>>>file is the following:
>>>>>
>>>>>
>>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>>contexts/frames of query sequences.
>>>>>
>>>>>
>>>>>
>>>>>Your help is appreciated
>>>>>
>>>>>
>>>>>
>>>>>HB
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
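
A note for readers who want to check their own output: the SCALAR(0x...) text in the quoted report above is how Perl prints a scalar reference when it is used as a string, which is why it appears where sequence should be. The sketch below (Python, not part of MAKER) lists the FASTA records whose sequence lines contain anything other than IUPAC nucleotide codes. The function name and the toy records are made up for illustration.

```python
import re

# Anything that is not an IUPAC nucleotide code (upper or lower case).
NON_NUCLEOTIDE = re.compile(r"[^ACGTUNRYKMSWBDHV]", re.IGNORECASE)


def find_bad_records(fasta_lines):
    """Return IDs of records whose sequence lines contain non-nucleotide characters."""
    bad = []
    record_id = None
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            record_id = (line[1:].split() or [""])[0]
        elif line and NON_NUCLEOTIDE.search(line):
            if record_id not in bad:
                bad.append(record_id)
    return bad


# Toy input: the second record mimics the wrapped SCALAR(0x...) residue.
example = [
    ">clean-mRNA-1 transcript offset:0",
    "ATGCGTTACTCCCAGATC",
    ">broken-mRNA-1 transcript offset:0",
    "GTTGGCTCTGSCALAR(0x23",
    "418b90)GCTTTGGGG",
]
print(find_bad_records(example))
```

Because only sequence lines are tested, dashes and other punctuation in the header lines do not trigger a report.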


From carsonhh at gmail.com  Fri Mar 21 11:43:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 10:43:10 -0600
Subject: [maker-devel] non-nucleotide characters in the maker generated
 transcripts
Message-ID: <CF51C832.AFC0%carsonhh@gmail.com>

Thanks for letting me know.

--Carson




From marc.hoeppner at imbim.uu.se  Mon Mar 24 05:08:25 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Mon, 24 Mar 2014 10:08:25 +0000
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se>

Hi,

I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (mostly protein and EST alignments). I then separate these out into GFF files that I can store away and use in parallel with various combinations of settings and data.

However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional and is there a work-around? Or is the only option to run this with fasta files?

Cheers,

Marc


Marc P. Hoeppner, PhD

Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From sujaikumar at gmail.com  Mon Mar 24 09:15:16 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 14:15:16 +0000
Subject: [maker-devel] Dashes in transcript predictions
Message-ID: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

P.S. I pulled out just this one contig and ran maker on it. All the
.maker.output files are attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nGt.0.3.035610.maker.output.tgz
Type: application/x-gzip
Size: 45641 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140324/c626ff64/attachment.tgz>

From carsonhh at gmail.com  Mon Mar 24 11:49:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 10:49:46 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
Message-ID: <CF55BD0D.B01C%carsonhh@gmail.com>

I've actually never seen that before, but looking through your output it
appears to be specifically caused by setting correct_est_fusion=1, and how
it interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace
.../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
download.

Thanks,
Carson


-------------- next part --------------
A non-text attachment was scrubbed...
Name: join.pm
Type: text/x-perl-script
Size: 18644 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140324/ebc5d81c/attachment.bin>

From carsonhh at gmail.com  Mon Mar 24 12:05:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 11:05:15 -0600
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <CF55BE79.B028%carsonhh@gmail.com>

It's not so much intentional as it is a limitation of the information in
GFF3 format alignments. Right now protein2genome for eukaryotes will only
try to make exonerate-derived alignments work, because they have been
polished around splice sites and MAKER still has access to the original
protein sequence and alignment cigar string for additional filtering, etc.
With GFF3 pass-through the algorithm doesn't know nearly as much about
what is passed in. For example, the protein sequence is gone, cigar
alignment strings are rarely included (Gap= attribute in GFF3), and it's
not always clear if the alignment was polished for splice sites.  Also,
since protein2genome=1 is expected to be used only to generate an initial
training set, and not for final annotations, this is considered a
reasonable restriction.

If you still really want to force protein alignments from a GFF3 to be
considered as potential models, you could put them in as pred_gff, in
which case they will always be considered as potential models.  Of course
it will be relatively ugly because you lack the things I mentioned before,
such as the alignment cigar string and original protein sequence, that are
normally used to filter protein2genome results for inclusion as models.

--Carson
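
As a rough illustration of the point above about missing cigar strings, here is a small Python sketch (not part of MAKER) that counts how many alignment features in a GFF3 file carry a Gap attribute. It treats any feature with a Target attribute as an alignment, which follows the GFF3 convention but is an assumption about the file at hand. The function names and the example lines are made up.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value pairs joined by ';') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def count_gap_attributes(gff_lines):
    """Return (alignment_features, alignment_features_with_Gap)."""
    total = with_gap = 0
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "Target" not in attrs:
            continue  # not an alignment feature
        total += 1
        if "Gap" in attrs:
            with_gap += 1
    return total, with_gap


# Hypothetical example lines, for illustration only.
example = [
    "##gff-version 3",
    "ctg1\tprotein2genome\tmatch_part\t100\t198\t.\t+\t.\tID=m1;Target=P1 1 33;Gap=M33",
    "ctg1\tblastx\tmatch_part\t300\t398\t.\t+\t.\tID=m2;Target=P2 1 33",
    "ctg1\tmaker\tgene\t100\t398\t.\t+\t.\tID=g1",
]
print(count_gap_attributes(example))
```

A low second number relative to the first means most alignments in the file lack the gap information discussed above.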




From carsonhh at gmail.com  Mon Mar 24 13:15:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 12:15:39 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55BD0D.B01C%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com>
Message-ID: <CF55C7D4.B05A%carsonhh@gmail.com>

One more note on this.  The sequence is actually fully correct if you just
remove the '-' characters.  So if you don't want to rerun MAKER with the
patch, then you can use the attached script to just repair the transcript
file by removing the '-' characters.  Your GFF3 files and proteins files
should already be correct as is.

Usage --> perl fix_dash transcript_file.fasta > new_file.fasta

You may need to place the script in the .../maker/bin/ directory so it can
detect BioPerl if you don't have BioPerl installed system-wide.

Thanks,
Carson
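
For readers without the attachment, the repair described above can be sketched in a few lines of Python. This is not the attached fix_dash script (which is not reproduced in the archive), only an illustration of the same idea: remove '-' from sequence lines while leaving header lines alone, since MAKER IDs themselves contain dashes. The function name and toy record are made up.

```python
def strip_dashes(fasta_lines):
    """Yield FASTA lines with '-' removed from sequence lines only."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield line  # headers such as '>snap_masked-...-mRNA-1' keep their dashes
        else:
            yield line.replace("-", "")


# Toy record with a gap run in the sequence and dashes in the ID.
example = [
    ">snap_masked-contig1-processed-gene-0.2-mRNA-1 transcript offset:261\n",
    "TTTATTAA-------AAAATAATT\n",
    "AATAT-TAATTAAAG\n",
]
for line in strip_dashes(example):
    print(line, end="")
```

To apply it to a file, iterate over an open file handle instead of the list and write the yielded lines to a new file.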

From:  Carson Holt <carsonhh at gmail.com>
Date:  Monday, March 24, 2014 at 10:49 AM
To:  Sujai <sujaikumar at gmail.com>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Dashes in transcript predictions

I've actually never seen that before, but looking through your output it
appears to be specifically caused by setting correct_est_fusion=1, and how
it interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace
.../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
download.

Thanks,
Carson


From:  Sujai <sujaikumar at gmail.com>
Date:  Monday, March 24, 2014 at 8:15 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Dashes in transcript predictions

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA
TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT
CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC
CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT
ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC
TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA
ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA
CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25
QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV
MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN
KKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.


_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From sujaikumar at gmail.com  Mon Mar 24 13:17:02 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 18:17:02 +0000
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55C7D4.B05A%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com> <CF55C7D4.B05A%carsonhh@gmail.com>
Message-ID: <CAFADFFs6KYiZ8rmfEwYVCYbGymJOUXHVcKVShscBBjjCR3q2fA@mail.gmail.com>

Wow. That was a super quick response. Thanks very much for confirming the
problem and the fixes!



From diana.garnica at anu.edu.au  Mon Mar 24 18:11:01 2014
From: diana.garnica at anu.edu.au (Diana Garnica Moreno)
Date: Mon, 24 Mar 2014 23:11:01 +0000
Subject: [maker-devel] Problem extracting fasta from a GFF file generated
	with MAKER
Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi there,


We recently annotated a fungal genome using MAKER and we got the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why this happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?


Thanks!


Diana

From carsonhh at gmail.com  Mon Mar 24 18:25:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 17:25:09 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
 generated with MAKER
Message-ID: <CF56185B.B0E1%carsonhh@gmail.com>

You are probably getting the wrong proteins from your scripts because you
are not taking into account the 5' and 3' UTR in the transcript.

For example
>snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240

The 5' UTR is 261bp and the 3' UTR is 22bp long.  Both would have to be
trimmed before translating the transcript into a protein. Once they are
trimmed you can use frame 0 for the translation.

The fasta_tool that comes with MAKER can be used to quickly trim the UTR.

Example:
fasta_tool maker_transcripts.fasta --trim_maker_utr

Then you can try your other scripts again.
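
If you would rather do the trimming in your own script, the logic described
above can be sketched roughly as follows. This is not how fasta_tool is
implemented; the function names are invented for the example. It assumes
the header is on a single line, that the QI tag has nine '|'-separated
fields with the 5' UTR length first and the 3' UTR length eighth (as in the
example header), and that the stop codon is the last codon of the trimmed
sequence.

```python
import re


def utr_lengths(header):
    """Return (five_prime, three_prime) UTR lengths from a MAKER header."""
    match = re.search(r"QI:(\S+)", header)
    if match is None:
        raise ValueError("no QI tag in header")
    fields = match.group(1).split("|")
    # Assumption: field 1 is the 5' UTR length, field 8 the 3' UTR length.
    return int(fields[0]), int(fields[7])


def trim_utr(header, transcript):
    """Cut both UTRs off so that translation can start in frame 0."""
    five, three = utr_lengths(header)
    return transcript[five:len(transcript) - three]


def ends_with_stop(cds):
    """True if the CDS is a whole number of codons ending in TAA/TAG/TGA."""
    return len(cds) % 3 == 0 and cds[-3:].upper() in {"TAA", "TAG", "TGA"}


header = (">snap_masked-contig-processed-gene-0.2-mRNA-1 transcript "
          "offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240")
print(utr_lengths(header))
```

A trimmed sequence for which ends_with_stop() is false would then be a
candidate for a model that is incomplete at the 3' end.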

Thanks,
Carson



From daniel.standage at gmail.com  Tue Mar 25 08:24:14 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 25 Mar 2014 09:24:14 -0400
Subject: [maker-devel] Maker iPlant image
Message-ID: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>

Greetings,

I launched an instance from the Maker-P 2.28 image
(c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location
of the installed software. All I could find was an example data set on the
Desktop, but the "maker" program was not in the path and the contents of
"/usr/local/src" are empty. Could you please advise on how to run Maker in
iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From ernesto at ebi.ac.uk  Tue Mar 25 05:10:59 2014
From: ernesto at ebi.ac.uk (ernesto lowy gallego)
Date: Tue, 25 Mar 2014 10:10:59 +0000
Subject: [maker-devel] Incorrect translation start codon
Message-ID: <53315633.2070702@ebi.ac.uk>

Hi,

I have been inspecting the MAKER predictions and I detected a situation 
which appears with a certain frequency.
(See attached Apollo screenshot illustrating the situation I am going to 
describe):

Let's say that there is est2genome evidence supporting the prediction of 
the 5' UTR region. I have noticed that in some of these transcripts 
with a 5' UTR, MAKER does not identify the correct downstream ATG 
start codon and instead uses a TTG codon (coding for L) as the 
protein start. The proper ATG start codon is further 
downstream, as the ab initio predictors (SNAP+AUGUSTUS) correctly 
predict in this case (see the attached screenshot).

Any comments on this?

Thanks!

ernesto

-- 
Developer

VectorBase | Ensembl Genomes

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2014-03-25 at 09.34.16.png
Type: image/png
Size: 32220 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140325/f9ae69ec/attachment.png>

From carsonhh at gmail.com  Tue Mar 25 09:19:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:19:22 -0600
Subject: [maker-devel] Incorrect translation start codon
In-Reply-To: <53315633.2070702@ebi.ac.uk>
References: <53315633.2070702@ebi.ac.uk>
Message-ID: <CF56EBF0.B109%carsonhh@gmail.com>

This is caused by BioPerl's is_start_codon method and default codon table
returning true for non-canonical start codons.  It was resolved some time
ago (See previous discussion -->
https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY ).  Make
sure you are using the most recent version of MAKER (currently 2.31).
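
To illustrate the effect (this is neither MAKER nor BioPerl code, just a
small Python sketch with made-up names): if the set of accepted start codons
is the permissive one from the standard genetic code, which I am assuming
here to be ATG, TTG and CTG, an upstream TTG wins over the real ATG.
Restricting the set to ATG gives the expected start.

```python
# Assumption: permissive initiation codons of the standard genetic code.
PERMISSIVE_STARTS = {"ATG", "TTG", "CTG"}
CANONICAL_STARTS = {"ATG"}


def first_start(seq, starts, frame=0):
    """Return the 0-based position of the first in-frame start codon, or -1."""
    seq = seq.upper()
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in starts:
            return pos
    return -1


seq = "CCCTTGAAAATGGCC"  # codons: CCC TTG AAA ATG GCC
print(first_start(seq, PERMISSIVE_STARTS))  # picks the upstream TTG
print(first_start(seq, CANONICAL_STARTS))   # picks the ATG
```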

Thanks,
Carson




From carsonhh at gmail.com  Tue Mar 25 09:24:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:24:36 -0600
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
Message-ID: <CF56ED91.B119%carsonhh@gmail.com>

--> /opt/maker/bin/maker

It looks like most preinstalled software is under /opt on the image.

Thanks,
Carson



From darasappan at gmail.com  Tue Mar 25 11:33:59 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 25 Mar 2014 11:33:59 -0500
Subject: [maker-devel] maker to EvidenceModeler
Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>

Hi Carson and others,

Is there an easy tool/pipeline available as part of maker utilities to convert maker and SNAP output to files acceptable by EvidenceModeler?

It looks like it also needs just gff files, but with a few tweaks. EvidenceModeler seems better equipped to handle PASA annotation results than maker results.

Thanks
Dhivya


From barry.utah at gmail.com  Tue Mar 25 12:51:38 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 25 Mar 2014 11:51:38 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
	generated	with MAKER
In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
Message-ID: <B283D045-3B8D-4A0C-82F8-7C2DB291B065@genetics.utah.edu>

Hi Diana,

There is a Perl library - The Genome Annotation Library - that is designed to make writing code like this easy.  I just added a script to this library called gal_CDS_sequence which you would run like this:

gal_CDS_sequence --translate genes.gff3 genome.fasta

The focus of GAL is to try to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files.

You can access the library here:

http://www.sequenceontology.org/software/GAL.html

Support for GAL is available via the SO mailing list:

https://lists.sourceforge.net/lists/listinfo/song-devel

Hope that helps,

Barry


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From kchilds at plantbiology.msu.edu  Wed Mar 26 09:21:36 2014
From: kchilds at plantbiology.msu.edu (Childs, Kevin)
Date: Wed, 26 Mar 2014 14:21:36 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CF56ED91.B119%carsonhh@gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
Message-ID: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>

Daniel,

There are a few small issues with the MAKER-P_2.28 image at iPlant, but I have been using the image successfully for more than a month.  I typically set several environment variables immediately after starting an ssh session.

export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp

The image will allow you to train SNAP, but training Augustus is not possible with the current image.  Augustus training requires BLAT, which was not installed in this image.  There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory, which requires some inconvenient directory hacking.  I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.

Augustus should work with a stock hmm on this image.

I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.

Kevin Childs


---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Plant Biology Department
Michigan State University

kchilds at plantbiology.msu.edu
517-775-2844 (m)
517-353-5969 (l)



From steinj at cshl.edu  Wed Mar 26 13:41:37 2014
From: steinj at cshl.edu (Stein, Joshua)
Date: Wed, 26 Mar 2014 18:41:37 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
	<BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
Message-ID: <A6505FF9-06C4-4EB2-949B-EDA9113F64E3@cshl.edu>

Also please note that there is a tutorial available here, which is particularly important if you want to run MAKER in MPI mode.
https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial

Josh

Joshua Stein, PhD
Manager, Sci. Informatics III
Cold Spring Harbor Laboratory
steinj at cshl.edu
http://ware.cshl.org/




From brubin at fieldmuseum.org  Sat Mar 29 11:24:05 2014
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sat, 29 Mar 2014 11:24:05 -0500
Subject: [maker-devel] Missing UTRs in GFF
Message-ID: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
A non-text attachment was scrubbed...
Name: missing_utr.gff
Type: application/octet-stream
Size: 2933 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140329/0f93b3b2/attachment.obj>

From mhinsley at ebi.ac.uk  Mon Mar 31 05:20:10 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Mon, 31 Mar 2014 11:20:10 +0100
Subject: [maker-devel] putative preponderance of short exons??
Message-ID: <5339415A.1020509@ebi.ac.uk>

Hi

I've run Maker on a de novo assembly of a species of fly and then ran 
some simple statistics (intron/ exon/ CDS length, exons per gene)  over 
the GFF output and compared with a couple of other species.
It all looks good except that there is a surprising number of very short 
exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total; see the attached 
PDF, where black is Drosophila, red is A. gambiae, and green is with 5' and 3' 
exons removed).

I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.  
I'm using maker 2.31 (unpatched).

Anecdotally, these short exons appear without EST or protein evidence 
and they all line up with canonical splice sequences (GT----AG), 
but I've only looked at a few using Apollo.

While there's no requirement that exons should be longer I'm suspicious 
of this as there must be some evolutionary relationship between these 
species.
I've compared with another species annotated with Maker (using SNAP 
and Augustus), which is more distant (not yet publicly available), and 
the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop 
codons, but this does not appear to be the case (mostly they are mid-gene).


Is there some way to adjust the predictors eg to require external 
evidence? or anything else you could suggest? ... I can see the 
following in the tutorial but I'm not sure how they could help:

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no


thanks

-- 
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: exon_53.pdf
Type: application/pdf
Size: 10618 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140331/edd22fe9/attachment.pdf>

From carsonhh at gmail.com  Mon Mar 31 08:52:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 07:52:15 -0600
Subject: [maker-devel] putative preponderance of short exons??
In-Reply-To: <5339415A.1020509@ebi.ac.uk>
References: <5339415A.1020509@ebi.ac.uk>
Message-ID: <CF5ECE08.B30C%carsonhh@gmail.com>

The intron/exon structure is determined by SNAP, Augustus, etc.  It is not
affected by any of the maker parameters.  Only evidence alignments are
affected by the maker settings.  You can try retraining or manually
editing the HMMs, but they might also be regions where your assembly is
incorrect and those algorithms make short exons in order to make a
structure work without getting stop codons mid gene.
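
To see whether retraining changes anything, it helps to redo the
short-exon counts after each round. A minimal Python sketch of such a count
is below; the function names are made up, and it assumes a standard
nine-column, tab-separated GFF3 file in which the gene models' exons have
the type 'exon'. Pass source="maker" if you only want exons from the final
models (assuming that is the source label in your merged file).

```python
def exon_lengths(gff_lines, source=None):
    """Yield the length of every 'exon' feature in GFF3 text lines."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        if source is not None and cols[1] != source:
            continue
        # GFF3 coordinates are 1-based and inclusive.
        yield int(cols[4]) - int(cols[3]) + 1


def count_short(lengths, thresholds=(10, 30, 50)):
    """Return {threshold: number of exons shorter than threshold}."""
    lengths = list(lengths)
    return {t: sum(1 for n in lengths if n < t) for t in thresholds}


demo = [
    "##gff-version 3\n",
    "c1\tmaker\texon\t1\t9\t.\t+\t.\tID=e1\n",
    "c1\tmaker\texon\t100\t140\t.\t+\t.\tID=e2\n",
    "c1\tmaker\tCDS\t1\t9\t.\t+\t0\tID=c1\n",
]
print(count_short(exon_lengths(demo)))
```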

Thanks,
Carson


On 3/31/14, 4:20 AM, "Malcolm Hinsley" <mhinsley at ebi.ac.uk> wrote:

>Hi
>
>I've run Maker on a de novo assembly of a species of fly and then ran
>some simple statistics (intron/ exon/ CDS length, exons per gene)  over
>the GFF output and compared with a couple of other species.
>It all looks good except that there is a surprising number of very short
>exons (6000 < 50 bp, 3500 < 30 bp, 878< 10 bp, 87k total - see attached
>pdf), black is drosophilia, red is A.gambiae, green is with 5' and 3'
>exons removed).
>
>I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.
>I'm using maker 2.31 (unpatched).
>
>Anecdotally, these short exons appear without EST or protein evidence
>and they all line up with canonical splice sequences (GT----AG).
>(but i've only looked at a few using Apollo).
>
>While there's no requirement that exons should be longer I'm suspicious
>of this as there must be some evolutionary relationship between these
>species.
>I've compared with another species annotated with Maker (using SNAP
>and Augustus)  which is more distant (not yet publicly available), and
>the same pattern of short exons is present.
>I wondered if they were created to fulfil the need for start/stop
>codons, but this does not appear to be the case (mostly they are
>mid-gene).
>
>
>Is there some way to adjust the predictors eg to require external
>evidence? or anything else you could suggest? ... I can see the
>following in the tutorial but I'm not sure how they could help:
>
>pred_flank=200 #flank for extending evidence clusters sent to gene predictors
>pred_stats=0 #report AED and QI statistics for all predictions as well as models
>AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
>min_protein=0 #require at least this many amino acids in predicted proteins
>alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
>always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
>
>
>thanks
>
>-- 
>malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
>European Bioinformatics Institute (EMBL-EBI)
>European Molecular Biology Laboratory
>Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
>United Kingdom
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Mon Mar 31 09:37:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 08:37:15 -0600
Subject: [maker-devel] Missing UTRs in GFF
In-Reply-To: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
References: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
Message-ID: <CF5ED8D3.B31A%carsonhh@gmail.com>

Not something I've seen before, but there was a patch for another issue that
was caused by the use of avoid_est_fusion=1, which may be related.  Try the
current stable release 2.31, and let me know if it still happens.

You can also upload the contig folder from one of the regions in question
here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Then I could verify the bug, and see if it is something that happens in the
current release.
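
To narrow down which regions to upload, it may help to list the transcripts
that show the reported symptom (exon and CDS lines present, but no UTR lines).
The following is a heuristic sketch only, not part of MAKER: it assumes
standard GFF3 `Parent` attributes and the feature types `five_prime_UTR` and
`three_prime_UTR`, and the function name is hypothetical. Transcripts that
genuinely have no UTR will also be listed, so the result still needs to be
compared against the UTR information in the gene header.

```python
from collections import defaultdict

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}

def mrnas_without_utr_lines(gff_lines):
    """Return sorted parent IDs that have exon and CDS children but no UTR children."""
    child_types = defaultdict(set)  # parent ID -> set of child feature types
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        # Column 9 holds semicolon-separated key=value attributes
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # A feature may list several parents separated by commas
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                child_types[parent].add(cols[2])
    return sorted(
        parent
        for parent, types in child_types.items()
        if {"exon", "CDS"} <= types and not (types & UTR_TYPES)
    )
```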

--Carson


From:  Benjamin Rubin <brubin at fieldmuseum.org>
Date:  Saturday, March 29, 2014 at 10:24 AM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Missing UTRs in GFF

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org <http://benrubin.org>

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From pushplata.singh at teri.res.in  Sun Mar  2 22:29:37 2014
From: pushplata.singh at teri.res.in (Pushplata Singh)
Date: Mon, 3 Mar 2014 10:59:37 +0530
Subject: [maker-devel] Query on Hardware requirement
Message-ID: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>


Hi,

I am trying to assemble and analyse (bioinformatics) the genome sequence of a
35 GB fungal genome. The raw data generated from Illumina sequencing is
~15 GB. Could you please suggest the system (hardware) requirements for
installing and running Maker and ALLPATHS-LG software for the job?

Thank you
Pushplata Singh, PhD
Nanobiotechnology Centre
Biotechnology and Management of Bioresources Division
The Energy and Resources Institute
Darbari Seth Block , India Habitat Centre,Lodhi Road
New Delhi 110003 India
Phone +91 11 24682100 ext 2611
Fax +91 11 24682145



From dence at genetics.utah.edu  Mon Mar  3 07:11:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 3 Mar 2014 14:11:34 +0000
Subject: [maker-devel] Query on Hardware requirement
In-Reply-To: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
References: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D68BF9@mxb2.hg.genetics.utah.edu>

Hi Pradeep, 

I think Allpaths is developed by the Broad Institute, so you'd have to check their documentation for their system requirements. MAKER is installable on Linux and Mac OS X computers. The throughput you'll be able to achieve with MAKER depends on how many processors and how much RAM the machine has. To take advantage of MAKER's ability to parallelize the annotation process, you need some version of MPI installed on your machine. MAKER can try to install MPI for you, but a manual installation is usually required.

I hope that helps. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carson.holt at genetics.utah.edu  Mon Mar  3 12:08:49 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 3 Mar 2014 19:08:49 +0000
Subject: [maker-devel] FW: error runinig agustus
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3A2120.A782%carson.holt@genetics.utah.edu>

Forwarding this to the maker-devel list.


On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>I encountered the following error while running maker (2nd annotation
>using gff file of the first maker run and trinity assembled RNA seq as
>EST)
>
>ERROR: Augustus failed
>--> rank=NA, hostname=rapa.agr.gc.ca
>
>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>am using 2.31
>
>Your help is appreciated
>
>
>HB
>
>
>
>
>


From carsonhh at gmail.com  Mon Mar  3 12:11:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Mar 2014 12:11:08 -0700
Subject: [maker-devel] FW: error runinig agustus
Message-ID: <CF3A21A5.A788%carsonhh@gmail.com>

You will need to provide more detail.  Probably the entire error log and
the maker control files.

Thanks,
Carson


On 3/3/14, 12:08 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>Forwarding this to the maker-devel list.
>
>
>On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>I encountered the following error while running maker (2nd annotation
>>using gff file of the first maker run and trinity assembled RNA seq as
>>EST)
>>
>>ERROR: Augustus failed
>>--> rank=NA, hostname=rapa.agr.gc.ca
>>
>>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>>am using 2.31
>>
>>Your help is appreciated
>>
>>
>>HB
>>
>>
>>
>>
>>


From sjackman at gmail.com  Tue Mar  4 19:10:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 4 Mar 2014 18:10:42 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
Message-ID: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
the tip.

The rRNA genes that are found with est2genome have the feature type set to
*mRNA* and have corresponding *five_prime_UTR*, *CDS* and
*three_prime_UTR* features. Ideally the feature type would be set to
*rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features.
Is that a feature that you would be interested in adding to MAKER? The rRNA
gene names all start with "rrn" and the tRNA gene names with "trn", as is
standard, so determining the appropriate type should be straightforward.
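
As a rough illustration of the naming rule described above, the type could be
chosen from the gene name prefix in a post-processing step. This is a sketch
under the stated assumption that names follow the "rrn"/"trn" convention; the
function name is hypothetical and nothing like it is part of MAKER. The match
is case-sensitive on purpose, so that unrelated upper-case names are not
caught.

```python
def feature_type_for(gene_name):
    """Pick a GFF3 feature type from a gene name using the rrn/trn prefixes."""
    if gene_name.startswith("rrn"):
        return "rRNA"
    if gene_name.startswith("trn"):
        return "tRNA"
    return "mRNA"  # everything else keeps the type MAKER already assigns
```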

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:

> Set single_exon=1, and the minimum size to a smaller value.  I think it's
> set to 250 right now.  Also est2genome is looking for ORF, so if there is
> none (as with tRNAs) they probably won't get picked up.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>
> Sorry, ignore my previous question. est_forward also carries forward the
> names of protein evidence and works like a charm. Thank you!
>
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
> these hits?
>
> organism_type=prokaryotic
> est2genome=1
> protein2genome=1
> est_forward=1
>
> Cheers,
> Shaun
>
>
> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>
>> Is there a corresponding protein_forward=1 option to map forward protein
>> names from protein2genome?
>>
>> Cheers,
>> Shaun
>>
>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
>>
>> Sorry I meant to say prefilter on the score in the mRNA column before
>> passing the gff3 to model_gff.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>  What you can do is run it once with just est_forward=1 and
>> est2genome/protein2genome set to 1.  Then take those results, pass them in
>> as model_gff and use the map_forward option to then filter the results
>> based on mRNA score and that would copy names onto new gene under the
>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>> separate tool that will map genes onto new assemblies (but under the hood
>> the tool will just be calling MAKER with certain parameters restricted).  I
>> do this because if people commonly use it mixed with things like SNAP I can
>> start to get some very weird behaviors.
>>
>> Thanks,
>> Carson
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 3:04 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  It seems that this could be a very useful option in those cases where
>> you have firm a priori knowledge of the placement of ESTs. However, while
>> trying it I note that est_forward implies that the est2genome predictor is
>> turned on, implicitly. Is this necessary for this to work? I'm after the
>> behavior you describe below where exonerate is made to try really hard
>> within a limited region to align an est, but I would not like maker to
>> produce est2genome predictions.
>>
>> In general, I think this maker_coor and est_forward is a feature set that
>> is worthy to be promoted into a documented feature.
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  It will still work without est_forward.  It just works a little
>> differently.  Keep in mind this was a hidden feature I used to find
>> stubborn or hard to find missing genes after reassembly of a genome.
>>
>> If est_forward is provided, MAKER will parse the database to look for the
>> maker_coor tags early in the pipeline.  Then it will create a list of
>> locations to search, and it will search them even if there are no BLAST
>> results to seed the search (normally MAKER gets a BLAST result first and
>> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
>> look for a match using all of chr1 as the input to exonerate even when
>> BLAST finds nothing (this is a very very slow search, but can help pick up
>> one or two stubborn genes that don't remap well).  To allow this, MAKER
>> gives exonerate looser matching parameters (i.e. allows for single base
>> pair introns perhaps caused by assembly errors).  The logic here is that
>> given the fact that I already told MAKER that with some degree of
>> confidence I expect sequence A to map to location X, it will try its
>> hardest to make it match.
>>
>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>> line 1563, but only after a BLAST alignment has already seeded it to the
>> region (that BLAST result has the information in its description
>> parameter).  MAKER will then ignore seeds completely outside of maker_coor.
>> In addition any BLAST seeds that overlap maker_coor will get the search
>> space for alignment polishing adjusted to match maker_coor exactly.  Also
>> match parameters for exonerate will not be relaxed as they were with
>> est_forward.
>>
>> As you can see, the behavior is slightly different (because it's an
>> accidental feature).
>>
>> Thanks,
>> Carson
>>
>>
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 6:37 AM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  That might be a useful and time saving accidental feature. But, reading
>> the code, it seems that I need to supply maker_coor but not gene_id, as
>> well as the configuration option est_forward for this to work. Any
>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>> right?
>>
>> Mikael
>>
>>  26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  Yes.  That should work as well as an accidental feature.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>> mikael.durling at slu.se> wrote:
>>
>> Can this use of maker_coor be used only to hint about the placement of
>> the ests, without affecting the naming of the final genes? Ie if I have a
>> database of EST where I have a priori knowledge of their rough placement,
>> can this placement be given to maker without providing est_forward=1?
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  There is a way.  It's not a standard option and it's undocumented, but
>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>> that.  The option won't already be there so you'll have to type it in.
>>
>> There is also a feature designed to work with this option.  If you add
>> tags to your fasta headers, those can be used to guide the mapping and
>> naming.  For example, gene_id=<some_gene>  will ensure different isoforms
>> that share a common gene_id get clustered into the same gene,
>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>
>> This is an undocumented way to remap genes onto new assemblies using
>> blast alignments of earlier transcript or protein annotations as a guide.
>>
>> --Carson
>>
>>
>>
>>
>>  From: Shaun Jackman <sjackman at gmail.com>
>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>> Date: Tuesday, February 25, 2014 at 5:06 PM
>> To: <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Mapping gene names
>>
>>  Hi,
>>
>> I'm annotating a genome using a closely related genome from Genbank,
>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>> annotate my genome. I've run Maker, and the annotation seems to have worked
>> well. Is it possible to map the names of the genes from the related species
>> to my annotation? I see the *map_forward* option, which applies to the
>> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>
>> *maker_opts.ctl*
>>
>> est=NC_123456.frn
>> protein=NC_123456.faa
>> est2genome=1
>> protein2genome=1
>>
>> Thanks,
>> Shaun
>
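
The FASTA header tags described in the quoted thread above (`gene_id=` and
`maker_coor=`) can be checked before a run with a small parser. This is a
minimal sketch, not MAKER's own parsing code: it assumes the tags appear as
whitespace-separated `key=value` tokens in the header, and the function name
`parse_header_tags` is hypothetical.

```python
import re

# key=value tokens for the two tags mentioned in the thread
TAG_RE = re.compile(r"\b(gene_id|maker_coor)=(\S+)")
# "chr1" or "chr1:1-10000"
COOR_RE = re.compile(r"^(?P<seqid>.+?)(?::(?P<start>\d+)-(?P<end>\d+))?$")

def parse_header_tags(header):
    """Extract gene_id and maker_coor (seqid, optional start/end) from a FASTA header."""
    tags = dict(TAG_RE.findall(header))
    result = {"gene_id": tags.get("gene_id"), "seqid": None, "start": None, "end": None}
    coor = tags.get("maker_coor")
    if coor:
        match = COOR_RE.match(coor)
        result["seqid"] = match.group("seqid")
        if match.group("start") is not None:
            result["start"] = int(match.group("start"))
            result["end"] = int(match.group("end"))
    return result
```

A header carrying only `maker_coor=chr1` yields a sequence ID with no start
or end, which corresponds to the "map against all of chr1" case in the thread.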

From carsonhh at gmail.com  Tue Mar  4 19:33:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Mar 2014 19:33:12 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
Message-ID: <CF3BD88C.A7D5%carsonhh@gmail.com>

Trying to call non-coding RNA from ESTs or even sequence homology is
extremely messy (non-trivial problem in most organisms with high false
positive rate), so MAKER for the most part doesn't even try to do that.  It
focuses only on the coding genes.  You can now use tRNAscan and snoscan in
the newest version for some non-coding RNA support (those features were only
added a couple of months ago).  So just like other prediction tools (snap,
augustus etc.), the primary focus has always been the coding genes.  We've
only started adding non-coding RNA support recently for iPlant, so it's
still relatively immature.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Reply-To:  Shaun Jackman <sjackman at gmail.com>
Date:  Tuesday, March 4, 2014 at 7:10 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
the tip.

The rRNA genes that are found with est2genome have the feature type set to
mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR
features. Ideally the feature type would be set to rRNA or tRNA as
appropriate, and would omit the UTR and CDS features. Is that a feature that
you would be interested in adding to MAKER? The rRNA gene names all start
with ?rrn? and the tRNA gene names with ?trn?, as is standard, so
determining the appropriate type should be straight forward.

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
> Set single_exon=1, and the minimum size to a smaller value.  I think it's set
> to 250 right now.  Also est2genome is looking for ORF, so if there is none (as
> with tRNAs) they probably won't get picked up.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
> 
>> Sorry, ignore my previous question. est_forward also carries forward the
>> names of protein evidence and works like a charm. Thank you!
>> 
>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5
>> and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the
>> blastn output, and in the evidence_0.gff. rrn5 has perfect identity,
>> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 <
>> eval_blastn=1e-10). How should I debug which filter is removing these hits?
>> organism_type=prokaryotic
>> est2genome=1
>> protein2genome=1
>> est_forward=1
>> Cheers,
>> Shaun
>> 
>> 
>> 
>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>> Is there a corresponding protein_forward=1 option to map forward protein
>>> names from protein2genome?
>>>  
>>> 
>>> Cheers,
>>> Shaun
>>> 
>>> 
>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com
>>> <mailto://carsonhh at gmail.com> ) wrote:
>>>  
>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>> passing the gff3 to model_gff.
>>>> 
>>>> --Carson 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>> 
>>>>> What you can do is run it once with just est_forward=1 and
>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them in
>>>>> as model_gff and use the map_forward option to then filter the results
>>>>> based on mRNA score and that would copy names onto new gene under the
>>>>> standard MAKER pipeline.  Eventually it?s really supposed to go into a
>>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>>> the tool will just be calling MAKER with certain parameters restricted).
>>>>> I do this because if people commonly use it mixed with things like SNAP I
>>>>> can start to get some very weird behaviors.
>>>>> 
>>>>> Thanks,
>>>>> Carson
>>>>> 
>>>>> From: Mikael Brandstr?m Durling <mikael.durling at slu.se>
>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>> 
>>>>> It seems that this could be a very useful option in those cases where you
>>>>> have firm a priori knowledge of the placement of ESTs. However, while
>>>>> trying it I note that est_forward implies that the est2genome predictor is
>>>>> turned on, implicitly. Is this necessary for this to work? I?m after the
>>>>> behavior you describe below where exonerate is made to try really hard
>>>>> within a limited region to align an est, but I would not like maker to
>>>>> produce est2genome predictions.
>>>>> 
>>>>> In general, I think this maker_coor and est_forward is a feature set that
>>>>> is worthy to be promoted into a documented feature.
>>>>> 
>>>>> THanks,
>>>>> Mikael
>>>>> 
>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>>> It will still work without est_forward.  It just works a little
>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>> 
>>>>>> If est_forward is provided, MAKER will parse the database to look for the
>>>>>> maker_coor tags early in the pipeline.  Then it will create a list of
>>>>>> locations to search, and it will search them even if there are no BLAST
>>>>>> results to seed the search (normally MAKER gets a BLAST result first and
>>>>>> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
>>>>>> look for a match using all of chr1 as the input to exonerate even when
>>>>>> BLAST finds nothing (this is a very very slow search, but can help pick
>>>>>> up one or two stubborn genes that don?t remap well).  To allow this,
>>>>>> MAKER gives exonerate looser matching parameters (i.e. allows for single
>>>>>> base pair introns perhaps caused by assembly errors).  The logic here is
>>>>>> that given the fact that I already told MAKER that with some degree of
>>>>>> confidence I expect sequence A to map to to location X, it will try its
>>>>>> hardest to make it match.
>>>>>> 
>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>>>>>> line 1563, but only after a BLAST alignment has already seeded it to the
>>>>>> region (that BLAST result has the information in its description
>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as they
>>>>>> were with est_forward.
>>>>>> 
>>>>>> As you can see the behavior, is slightly different (because it?s an
>>>>>> accidental feature).
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From: Mikael Brandstr?m Durling <mikael.durling at slu.se>
>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>> 
>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>> occurrences of maker_coor in GI.pm seems to be conditioned on
>>>>>> set_forward=1 right?
>>>>>> 
>>>>>> Mikael
>>>>>> 
>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ests, without affecting the naming of the final genes? Ie if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> There is a way.  It?s not a standard option and it?s undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won?t already be there so you?ll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> ?Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I?m annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I?ve run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> _______________________________________________ maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> <http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> > 
>>>>>>> 
>>>>>> 
>>>>> 
>> 



From felix.bemm at uni-wuerzburg.de  Wed Mar  5 09:35:33 2014
From: felix.bemm at uni-wuerzburg.de (Felix Bemm)
Date: Wed, 05 Mar 2014 17:35:33 +0100
Subject: [maker-devel] Build Issues - v2.31
Message-ID: <53175255.4050102@uni-wuerzburg.de>

Hi,

I am trying to build maker version 2.31. Got the following error:

Configuring MAKER with MPI support
'CCFLAGSEX' is not a valid config option for Inline::C
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 236
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 256
	Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', 
'/software/mpich2-1.5rc3/include', 'blib', '') called at 
/storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277
	MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') 
called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469
	MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at 
./Build line 70

Same procedure worked with 2.29-beta!

Any ideas?

Felix

-- 
Felix Bemm
Department of Bioinformatics
University of Würzburg, Germany
Tel: +49 931 - 31 83696
Fax: +49 931 - 31 84552
felix.bemm at uni-wuerzburg.de


From carsonhh at gmail.com  Wed Mar  5 09:40:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 09:40:05 -0700
Subject: [maker-devel] Build Issues - v2.31
In-Reply-To: <53175255.4050102@uni-wuerzburg.de>
References: <53175255.4050102@uni-wuerzburg.de>
Message-ID: <CF3CA125.A7FA%carsonhh@gmail.com>

You need to update your Inline::C module.  The CCFLAGSEX option was added
to Inline::C a couple of years ago to allow users to pass in flags to the
compiler.
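
Before updating, it can help to see which Inline::C version the build is picking up. The following is a minimal sketch in Python, assuming the `perl` on your PATH is the same interpreter MAKER's Build script uses; the function name is hypothetical and not part of MAKER. It only reports the installed version and does not judge whether that version is new enough.

```python
import shutil
import subprocess


def installed_inline_c_version(perl="perl"):
    """Return the installed Inline::C version string, or None if unavailable.

    Hypothetical helper for illustration; `perl` is the interpreter to query.
    """
    perl_path = shutil.which(perl)
    if perl_path is None:
        return None  # no such interpreter on PATH
    # `require` loads the module without calling its import method.
    result = subprocess.run(
        [perl_path, "-e", "require Inline::C; print Inline::C->VERSION"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # module missing or failed to load
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = installed_inline_c_version()
    print("Inline::C version:", version if version else "not found")
```

If the reported version is old, updating through your usual CPAN client and rerunning the MAKER build should pick up the newer module.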

Thanks,
Carson




From carson.holt at genetics.utah.edu  Wed Mar  5 12:02:26 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 5 Mar 2014 19:02:26 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3CC2C6.A802%carson.holt@genetics.utah.edu>


On 3/5/14, 11:59 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear Maker users
>
>I want to run maker on a fungal genome of about 45 Mb, with about 1/3 of
>the genome being repeat rich. But most of the virulence genes are located
>within the repeat regions, flanked by stretches of repeats. I am not sure
>whether, if I use the repeat masker option, I am going to miss out on the
>prediction of these virulence genes located within the repeats.
>
>Other concerns with the settings in the maker_opts file for fungal genomes are:
>
>single_exon=0     Should this be changed to 1, since single-exon genes
>are quite common in fungi, and what is the consequence of this for using EST
>and assembled RNA as evidence for gene prediction?
>
>correct_est_fusion=0   #limits use of ESTs in annotation to avoid fusion genes
>As I understand it, this option will remove the overlapping UTRs, but what
>is the consequence of setting this option for the use of ESTs in
>predicting ORFs?
>
>
>Thanks
>
>
>
>HB
>
>
>
>


From carsonhh at gmail.com  Wed Mar  5 12:17:57 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 12:17:57 -0700
Subject: [maker-devel] FW: maker-control file
Message-ID: <CF3CC300.A805%carsonhh@gmail.com>

Not using repeat masking will cause many problems.  Besides, a gene being
flanked by repeats does not mean it will be lost: any evidence/alignments
that can seed in non-repetitive regions (gene/exon) are still allowed to
extend into repetitive regions during the polishing stage (aligners have
two stages - seed and extend).  So transposons should never seed, but
genes will because their sequence will contain non-repetitive regions
(even if they are near repeats).

single_exon should be set to 1 for fungi, just make sure to set the
minimum length of single exon evidence to something reasonable like 250bp.

correct_est_fusion should not be used together with est2genome.  It won't
fail, you just get odd results.  Actually est2genome should not ever be
used to generate the final annotation set.  It is a convenience method
that allows you to generate rough models for training gene predictors like
SNAP and Augustus.  But once they are trained it should be turned off,
because the models it produces will be partial (ESTs rarely cover the
whole transcript) and the results will have many false positives from
background transcription events in your EST data.  These models are good
enough to train with, but make very poor final annotations. So in the end
you should be using correct_est_fusion=1 with the SNAP or Augustus set and
not est2genome (which should already have been turned off by then).
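
The two-stage setup described above can be sketched as two sets of maker_opts.ctl values. This is only an illustration in Python, not part of MAKER: the file names (ests.fasta, my_species.hmm) are hypothetical placeholders, and single_length is assumed to be the control-file option meant by "minimum length of single exon evidence". The check_pass helper simply encodes the advice not to combine correct_est_fusion with est2genome.

```python
# Sketch only: option names come from this thread and maker_opts.ctl;
# file names below are hypothetical placeholders.

TRAINING_PASS = {
    "est": "ests.fasta",          # hypothetical EST/transcript evidence file
    "est2genome": 1,              # rough EST-based models, for training only
    "single_exon": 1,             # allow single-exon evidence (fungi)
    "single_length": 250,         # assumed option for min single-exon length (bp)
    "correct_est_fusion": 0,      # not combined with est2genome
}

FINAL_PASS = {
    "est": "ests.fasta",
    "snaphmm": "my_species.hmm",  # hypothetical trained SNAP parameter file
    "est2genome": 0,              # turned off once the predictor is trained
    "single_exon": 1,
    "single_length": 250,
    "correct_est_fusion": 1,
}


def render_ctl(options):
    """Return control-file text with one key=value line per option."""
    return "\n".join(f"{key}={value}" for key, value in options.items()) + "\n"


def check_pass(options):
    """Reject the combination the thread advises against."""
    if options.get("est2genome") and options.get("correct_est_fusion"):
        raise ValueError(
            "correct_est_fusion=1 should not be combined with est2genome=1"
        )
    return options


if __name__ == "__main__":
    for name, opts in (("training", TRAINING_PASS), ("final", FINAL_PASS)):
        print(f"# {name} pass")
        print(render_ctl(check_pass(opts)))
```

The printed lines are meant to be merged by hand into an existing maker_opts.ctl; the sketch does not generate a complete control file.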


Thanks,
Carson




From marc.hoeppner at imbim.uu.se  Thu Mar  6 00:26:29 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Thu, 6 Mar 2014 07:26:29 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3CC300.A805%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>

Hi,

I think this is an interesting comment that I would like some more information on:

> correct_est_fusion should not be used together with est2genome.  It won't
> fail, you just get odd results.  Actually est2genome should not ever be
> used to generate the final annotation set.  It is a convenience method
> that allows you to generate rough models for training gene predictors like
> SNAP and Augustus.  But once they are trained it should be turned off,
> because the models it produces will be partial (ESTs rarely cover the
> whole transcript) and the results will have many false positives from
> background transcription events in your EST data.  These models are good
> enough to train with, but make very poor final annotations. So in the end
> you should be using correct_est_fusion=1 with the SNAP or Augustus set and
> not est2genome (which should already have been turned off by then).

My experience has been that training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data). And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls.

As an example, using SNAP and Augustus on a bird genome (with Augustus achieving nucleotide and exon sensitivities in the 70-90% range) gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab-initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab-initio predictions, which to me seems problematic.


/Marc

From carsonhh at gmail.com  Thu Mar  6 07:29:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 07:29:35 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
Message-ID: <CF3DCCB0.A85C%carsonhh@gmail.com>

> Wouldn't it be more sensible to rely on the evidence over probabilistic
> models?

Yes.  In fact that is the backbone of MAKER.  The evidence is used to derive
hints that are passed back into the predictors, and the resulting models are
reviewed in light of the evidence to decide on final models (no longer
strictly probabilistic).  Take a look at the MAKER2 paper (Table 2 and
Figure 1) and you will see that even when you use the wrong species
parameters in the predictor (e.g. A. thaliana to annotate C. elegans) you
get as much as a 3-fold increase in exon-level accuracy by using the hint
feedback from MAKER.  With the est2genome option you don't get that hint
feedback (normally probabilistic models, EST evidence, and protein evidence
would all work together), and the models are overall poorer and contain
more false positives (we have looked at this a lot).


> The annotation would be partial, but on the other hand the chance of
> incorporating false signals is smaller (assuming I can generate a clean set
> of transcripts from RNA-seq data)?

False signals are abundant.  It's just the nature of how ESTs and especially
mRNA-seq reads are generated and anchored back to the assembly.  By letting
there be feedback between the probabilistic model and the evidence (both
protein and EST/mRNA-seq) a lot of this is eliminated.


> As an example, using SNAP and Augustus on a bird genome - with Augustus
> achieving nucleotide and exon sensitivities in the 70-90% range gave a host of
> false exons that were simply not supported by the RNA-seq data, yet made it
> into the final gene build.

You will get false positives from an est2genome-only approach as well.
Models will be more partial, and the false negative rate will be very high
(often 30-70%).  Also look at the MAKER2 paper, Figure 1.  The false
positive rate from ab initio alone can be quite high, but with the
evidence feedback it is substantially reduced (especially for poorly trained
predictors).


> Is it possible to get some more details on how Maker uses ab-initio predictions
> and reconciles them with evidence alignments? At the moment it seems to me
> that Maker gives higher weight to the ab-initio predictions, which to me seems
> problematic.

Take a look at the MAKER, MAKER2, and MAKER-P papers.  Final genes are
chosen based on evidence overlap using AED (Annotation Edit Distance), so
the choice is completely evidence based.  It is the model generation that
leverages the hint-based feedback.  The names of MAKER genes can let you
know what the source of the model is.  Any time hint-based models match the
evidence better, the name will look like this ->
maker-<contig>-<predictor>-gene-<ID> (e.g. maker-chr1-snap-gene-0.4)

When the ab initio model matches better than the hint-based model the name
is like this ->
<predictor>-<contig>-abinit-gene-<ID> (e.g. snap-chr1-abinit-gene-0.2)
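
The naming convention above can be used to tally where the final models came from. Below is a minimal sketch in Python, assuming names follow exactly the two shapes quoted above and that predictor labels contain no hyphens (contig names may); the function name and regular expressions are illustrative and not part of MAKER.

```python
import re

# Hint-based model: maker-<contig>-<predictor>-gene-<ID>
HINT_BASED = re.compile(
    r"^maker-(?P<contig>.+)-(?P<predictor>[^-]+)-gene-(?P<id>\d+(?:\.\d+)*)$"
)
# Unmodified ab initio model: <predictor>-<contig>-abinit-gene-<ID>
AB_INITIO = re.compile(
    r"^(?P<predictor>[^-]+)-(?P<contig>.+)-abinit-gene-(?P<id>\d+(?:\.\d+)*)$"
)


def classify_gene_name(name):
    """Return a dict describing the model source, or None if unrecognized."""
    match = HINT_BASED.match(name)
    if match:
        return {"source": "hint-based", **match.groupdict()}
    match = AB_INITIO.match(name)
    if match:
        return {"source": "ab initio", **match.groupdict()}
    return None


if __name__ == "__main__":
    for gene in ("maker-chr1-snap-gene-0.4", "snap-chr1-abinit-gene-0.2"):
        print(gene, "->", classify_gene_name(gene))
```

Names that fit neither pattern return None, so they can be counted separately rather than silently assigned to one class.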


In summary, using est2genome alone (while good for generating training sets)
undercuts the power of the evidence feedback together with the probabilistic
models.


Thanks,
Carson


From marc.hoeppner at imbim.uu.se  Thu Mar  6 07:40:48 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Thu, 6 Mar 2014 14:40:48 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3DCCB0.A85C%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>

Hi Carson,

Thanks for the detailed feedback, this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data, especially with newer protocols that give near-perfect strandedness. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by trying to let Maker work with a semi-refined ab-initio model. But it can be a bit tricky to hit that sweet spot (we did validate > 4000 models manually in order to make that sort of assessment, though).

But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me, I think it would be fantastic if you guys could put together a Wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that people should get a proper explanation of and that have only ever been discussed on this list: best practices, experimental features etc.

Regards,

Marc



From carsonhh at gmail.com  Thu Mar  6 08:03:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 08:03:10 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
	<1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
Message-ID: <CF3DDC22.A8AF%carsonhh@gmail.com>

MAKER wiki ->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page

Thanks,
Carson


From:  Marc H?ppner <marc.hoeppner at imbim.uu.se>
Date:  Thursday, March 6, 2014 at 7:40 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] FW: maker-control file

Hi Carson, 

Thanks for the detailed feedback, this has cleared up a few things. I don?t
necessarily share your view on the problematic nature of RNA-seq data -
especially with newer protocols near-perfect strandedness. We work a lot on
transcriptome assembly and with a stringent approach to transcript assembly
I think I got better results with est2genome than trying to let Maker work
with a semi-refined ab-initio model. But it can be a bit tricky to hit that
sweet spot (we did validate > 4000 models manually in order to make that
sort of assessment tho).

But I will have another look at this and see if I can get Maker to do what I
need with the approach you describe. That reminds me, I think it would be
fantastic if you guys could put together a Wiki for Maker. This is such a
useful and powerful tool, but clearly there are many things that people
should get a proper explanation on that has only ever been discussed on this
list here - best practices, experimental features etc.

Regards,

Marc


On 06 Mar 2014, at 15:29, Carson Holt <carsonhh at gmail.com> wrote:

>> Wouldn?t it be more sensible to rely on the evidence over probabilistic
>> models?
> 
> Yes.  Infact that is the backbone of MAKER.  The evidence is used to derive
> hints that are passed back into the predictors and reviewed in light of the
> evidence to decide on final models (no longer strictly probabalistic).  Take a
> look at the MAKER2 paper (Table 2 and Figure 1) and you will see that eve when
> you use the wrong species parameters in the predictor (I.e. A. thaliana to
> annotate C. elegant) you get as much as a 3 fold increase in exon level
> accuracy by using the hint feedback from MAKER.  With est2genome option you
> don?t get that hint feedback (normally probabilistic models, EST evidence, and
> protein evidence would all work together), and the models are overall poorer
> and contain more false positives (we have looked at this a lot).
> 
> 
>> The annotation would be partial, but on the other hand the chance of
>> incorporating false signals are smaller (assuming I can generate a clean set
>> of transcripts from RNA-seq data)?
> 
> False signals are abundant.  It?s just the nature of how ESTs and especially
> mRNAseq reads are generated and anchored back to the assembly.  By letting
> there be feedback between the probabilistic model and the evidence (both
> protein and EST/mRNAseq) a lot of this is eliminated.
> 
> 
>> As an example, using SNAP and Augustus on a bird genome - with augustus
>> achieving nucleotide and exon sensitivities in the 70-90% range gave a host
>> if false exons that were simply not supported by the RNAseq data, yet made it
>> into the final gene build.
> 
> You will get false positives from est2genome alone approach as well.  Models
> will be more partial, and false negative rate will be very high (often 30-70%
> false negative rate).  Also look at the MAKER2 paper Figure 1.  The false
> positive rate from ab initio alone can be quite high, but with the evidence
> feedback it is substantially reduced (especially for poorly trained
> predictors).
> 
> 
>> Is it possible to get some more details on how Maker uses ab-inito
>> predictions and reconciles them with evidence alignments? At the moment it
>> seems to me that maker gives higher weight to the ab-initio predictions,
>> which to me seems problematic.
> 
> Take a look at the MAKER, MAKER2, and MAKER-P papers.  Final genes are chosen
> based off of evidence overlap using AED (completely evidence based).  It is
> the model generation that leverages the hint based feedback.  The names of
> MAKER genes can let you know what the source of the model is.  Any time hint
> based models match the evidence better the name will have hame like this ?>
> maker-<contig>-<predictor>-gene-<ID> (I.e. maker-chr1-snap-gene-0.4)
> 
> When the ab initio model matches better than the hint based model the name is
> like this ?>
> <predictor>-<contig>-abinit-gene-<ID> (I.e. snap-chr1-abinit-gene-0.2)
> 
> 
> In summary, using est2genome alone (while good for generating training sets)
> undercuts the power of the evidence feedback together with the probabilistic
> models.
> 
> 
> Thanks,
> Carson
> 
> From: Marc H?ppner <marc.hoeppner at imbim.uu.se>
> Date: Thursday, March 6, 2014 at 12:26 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] FW: maker-control file
> 
> Hi,
> 
> I think this is an interesting comment that I would like a few more
> information on:
> 
>> 
>> correct_est_fusion should not be used together with est2genome.  It won?t
>> fail, you just get odd results.  Actually est2genome should not ever be
>> used to generate the final annotation set.  It is a convenience method
>> that allows you to generate rough models for training gene predictors like
>> SNAP and Augustus.  But once they are trained it should be turned off,
>> because the models it produces will be partial (Ests rarely cover the
>> whole transcript) and the results will have many false potties from
>> background transcription events from your EST data.  These models are good
>> enough to train with, but make very poor final annotations. So in the end
>> you should be using correct_est_fusion=1 with the SNAP pr Augustus set and
>> not est2genome (which should already have been turned off by then).
>> 
> 
> My experience has been that the process of training gene finders, especially
> for complex genomes like vertebrates, is a very slow and painful process. And
> ultimately, the results are far from accurate, even with a sizeable, manually
> curated training set. Wouldn?t it be more sensible to rely on the evidence
> over probabilistic models? The annotation would be partial, but on the other
> hand the chance of incorporating false signals are smaller (assuming I can
> generate a clean set of transcripts from RNA-seq data)? And I?d rather
> underestimate the exon inventory slightly than putting out an annotation with
> ~ 10% false exon calls.
> 
> As an example, using SNAP and Augustus on a bird genome (with Augustus
> achieving nucleotide and exon sensitivities in the 70-90% range) gave a host of
> false exons that were simply not supported by the RNA-seq data, yet made it
> into the final gene build. Not sure what to think about that, to be honest. Is
> it possible to get some more details on how MAKER uses ab-initio predictions
> and reconciles them with evidence alignments? At the moment it seems to me
> that MAKER gives higher weight to the ab-initio predictions, which to me seems
> problematic.
> 
> 
> /Marc



From sjackman at gmail.com  Thu Mar  6 13:56:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 12:56:34 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3BD88C.A7D5%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
Message-ID: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?
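
The naming convention described above can be sketched in a few lines of Python. This is only an illustration, not MAKER or tRNAscan code: the function name `trna_gene_name` and the assumption that the amino acid arrives as a three-letter code are hypothetical.

```python
# Three-letter to one-letter amino acid codes.
# Assumption: the tRNA prediction reports the amino acid as a three-letter code.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def trna_gene_name(amino_acid: str, anticodon: str) -> str:
    """Build a conventional name such as trnW-CCA (hypothetical helper)."""
    one_letter = THREE_TO_ONE[amino_acid.capitalize()]
    return f"trn{one_letter}-{anticodon.upper()}"


print(trna_gene_name("Trp", "CCA"))  # trnW-CCA
```

Such a name could equally be used as a prefix in front of the existing MAKER identifier rather than as a replacement for it.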

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false-positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.
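
As a rough illustration of the prefix rule suggested above, the following Python sketch picks a feature type from the gene name. The function name and the fallback to mRNA are assumptions made for the example; this is not how MAKER assigns types.

```python
def feature_type_from_name(gene_name: str) -> str:
    """Guess a GFF3 feature type from a gene name prefix (hypothetical helper)."""
    if gene_name.startswith("rrn"):
        return "rRNA"
    if gene_name.startswith("trn"):
        return "tRNA"
    # Assumption: anything else is treated as protein coding.
    return "mRNA"


for name in ("rrn16", "rrn4.5", "trnW-CCA", "rpoB"):
    print(name, feature_type_from_name(name))
```

For names typed as rRNA or tRNA, the UTR and CDS child features would then be omitted.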

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and set the minimum size (single_length) to a smaller value. I think it's set to 250 right now. Also est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.
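
To reason about which threshold might be rejecting a hit, a simplified model of the filters named in this thread can help. The sketch below is not MAKER's filtering code: the function `diagnose_hit` is hypothetical, the default of 250 for single_length is only what is recalled above, and the 120 bp hit length in the usage lines is an invented value for illustration.

```python
def diagnose_hit(bits, evalue, length, *, single_exon=0, single_length=250,
                 bit_blastn=40.0, eval_blastn=1e-10, is_single_exon=True):
    """Return the reasons a hit would fail this simplified filter model."""
    reasons = []
    if bits < bit_blastn:
        reasons.append(f"bits {bits} < bit_blastn={bit_blastn}")
    if evalue > eval_blastn:
        reasons.append(f"evalue {evalue} > eval_blastn={eval_blastn}")
    if is_single_exon:
        if not single_exon:
            reasons.append("single-exon evidence ignored while single_exon=0")
        elif length < single_length:
            reasons.append(f"length {length} < single_length={single_length}")
    return reasons


# Bit score and E-value as reported for rrn5; the length is hypothetical.
print(diagnose_hit(242, 2e-66, 120, single_exon=1))
print(diagnose_hit(242, 2e-66, 120, single_exon=1, single_length=50))
```

With the default minimum size the first call reports the length check as the only failure, and lowering single_length to 50 clears it, which matches the outcome reported later in the thread.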

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
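
The prefiltering step mentioned here could look roughly like the Python sketch below. It assumes that "the score in the mRNA column" means GFF3 column 6 on mRNA rows, that a missing score (".") counts as failing, and that child features are linked through the Parent attribute; the function names and the threshold are hypothetical.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_by_mrna_score(gff3_lines, min_score):
    """Drop mRNA rows scoring below min_score, plus their child features."""
    rows = [line.rstrip("\n") for line in gff3_lines]
    dropped = set()
    # First pass: collect the IDs of mRNAs that fail the score threshold.
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            if cols[5] == "." or float(cols[5]) < min_score:
                mrna_id = parse_attributes(cols[8]).get("ID")
                if mrna_id:
                    dropped.add(mrna_id)
    # Second pass: keep everything except dropped mRNAs and their children.
    kept = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9:
            attrs = parse_attributes(cols[8])
            parents = [p for p in attrs.get("Parent", "").split(",") if p]
            if attrs.get("ID") in dropped:
                continue
            if parents and all(p in dropped for p in parents):
                continue
        kept.append(row)
    return kept
```

Note that this sketch leaves gene rows in place even when all of their mRNAs were removed; a real prefilter would probably want to drop those as well before passing the file to model_gff.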

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think maker_coor together with est_forward is a feature set that deserves to be promoted to a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also match parameters for exonerate will not be relaxed as they were with est_forward.
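
The seed handling described for the case without est_forward can be summarised as a small interval rule. The Python sketch below only restates that description; it is not taken from GI.pm, the function name `polish_region` is hypothetical, and it assumes 1-based inclusive coordinates with an explicit range in maker_coor.

```python
def polish_region(seed, maker_coor):
    """Decide the exonerate search region for a BLAST seed.

    Both arguments are (contig, start, end) tuples. Returns None when the
    seed lies completely outside maker_coor (the seed is ignored), otherwise
    returns maker_coor itself, because an overlapping seed has its search
    space adjusted to match maker_coor exactly.
    """
    seed_contig, seed_start, seed_end = seed
    coor_contig, coor_start, coor_end = maker_coor
    if seed_contig != coor_contig:
        return None
    if seed_end < coor_start or seed_start > coor_end:
        return None
    return maker_coor


coor = ("chr1", 1, 10000)
print(polish_region(("chr1", 9500, 10500), coor))   # overlaps: region becomes maker_coor
print(polish_region(("chr1", 20000, 21000), coor))  # outside: seed ignored
```

With est_forward=1 the difference is that the region is searched even when no seed exists at all, and with relaxed exonerate parameters.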

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.
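
For anyone scripting these header tags, a minimal Python sketch follows. The helper names are hypothetical, and the choice to append the tags to the header separated by spaces is an assumption; the thread does not spell out where in the header MAKER expects them.

```python
import re


def tag_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and maker_coor= tags to a FASTA header line."""
    parts = [header.rstrip()]
    if gene_id:
        parts.append(f"gene_id={gene_id}")
    if maker_coor:
        parts.append(f"maker_coor={maker_coor}")
    return " ".join(parts)


def parse_maker_coor(header):
    """Return (contig, start, end); start and end are None for a bare contig."""
    match = re.search(r"maker_coor=(\S+)", header)
    if not match:
        return None
    value = match.group(1)
    ranged = re.fullmatch(r"(.+):(\d+)-(\d+)", value)
    if ranged:
        return ranged.group(1), int(ranged.group(2)), int(ranged.group(3))
    return value, None, None


header = tag_header(">rrn16", gene_id="rrn16", maker_coor="chr1:1-10000")
print(header)
print(parse_maker_coor(header))
```

Isoforms of one gene would all receive the same gene_id value so that they cluster into a single gene.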

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From carsonhh at gmail.com  Thu Mar  6 13:58:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 13:58:41 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E2F7A.A911%carsonhh@gmail.com>

Yes.  I'll fix the naming.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 1:56 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general
is a non-trivial problem. In my particular case, I have a well-annotated
reference species that is very closely related (99.2% sequence identity), so
lifting over the annotations from that reference species to my species
should be pretty straightforward. It would be great if MAKER had an option
for RNA sequence homology similar to est2genome that does not imply the
sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified
genes are named e.g. `trnascan-205522-processed-gene-0.38`.  tRNA genes are
conventionally named according to the amino acid and anticodon, such as
`trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the
names with that convention?

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:
 
> Trying to call non-coding RNA from ESTs or even sequence homology is extremely
> messy (non-trivial problem in most organisms with high false positive rate),
> so MAKER for the most part doesn't even try to do that.  It focuses only on
> the coding genes.  You can now use tRNAscan and snoscan in the newest version
> for some non-coding RNA support (those features were only added a couple of
> months ago).  So just like other prediction tools (snap, augustus etc.), the
> primary focus has always been the coding genes.  We've only started adding
> non-coding RNA support recently for iPlant, so it's still relatively immature.
> 
> Thanks,
> Carson
> 
> 
> From: Shaun Jackman <sjackman at gmail.com>
> Reply-To: Shaun Jackman <sjackman at gmail.com>
> Date: Tuesday, March 4, 2014 at 7:10 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the
> tip.
> 
> The rRNA genes that are found with est2genome have the feature type set to
> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features.
> Ideally the feature type would be set to rRNA or tRNA as appropriate, and
> would omit the UTR and CDS features. Is that a feature that you would be
> interested in adding to MAKER? The rRNA gene names all start with "rrn" and
> the tRNA gene names with "trn", as is standard, so determining the appropriate
> type should be straightforward.
> 
> Thanks again for your help with this. Cheers,
> Shaun
> 
> 
> 
> On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
>> Set single_exon=1, and the minimum size to a smaller value.  I think it's set
>> to 250 right now.  Also est2genome is looking for ORF, so if there is none
>> (as with tRNAs) they probably won't get picked up.
>> 
>> --Carson 
>> 
>> Sent from my iPhone
>> 
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>> 
>>> Sorry, ignore my previous question. est_forward also carries forward the
>>> names of protein evidence and works like a charm. Thank you!
>>> 
>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5
>>> and rrn5 and tRNA genes didn't make it into the all.gff file. They are in
>>> the blastn output, and in the evidence_0.gff. rrn5 has perfect identity,
>>> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 <
>>> eval_blastn=1e-10). How should I debug which filter is removing these hits?
>>> organism_type=prokaryotic
>>> est2genome=1
>>> protein2genome=1
>>> est_forward=1
>>> Cheers,
>>> Shaun
>>> 
>>> 
>>> 
>>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>>> Is there a corresponding protein_forward=1 option to map forward protein
>>>> names from protein2genome?
>>>> 
>>>> Cheers, 
>>>> Shaun
>>>> 
>>>> 
>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com
>>>> <mailto://carsonhh at gmail.com> ) wrote:
>>>>> 
>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>> passing the gff3 to model_gff.
>>>>> 
>>>>> --Carson 
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>> 
>>>>>> What you can do is run it once with just est_forward=1 and
>>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them
>>>>>> in as model_gff and use the map_forward option to then filter the results
>>>>>> based on mRNA score and that would copy names onto new gene under the
>>>>>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>>>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>>>> the tool will just be calling MAKER with certain parameters restricted).
>>>>>> I do this because if people commonly use it mixed with things like SNAP I
>>>>>> can start to get some very weird behaviors.
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>> 
>>>>>> It seems that this could be a very useful option in those cases where you
>>>>>> have firm a priori knowledge of the placement of ESTs. However, while
>>>>>> trying it I note that est_forward implies that the est2genome predictor
>>>>>> is turned on, implicitly. Is this necessary for this to work? I'm after
>>>>>> the behavior you describe below where exonerate is made to try really
>>>>>> hard within a limited region to align an est, but I would not like maker
>>>>>> to produce est2genome predictions.
>>>>>> 
>>>>>> In general, I think this maker_coor and est_forward is a feature set that
>>>>>> is worthy to be promoted into a documented feature.
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikael
>>>>>> 
>>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>>> It will still work without est_forward.  It just works a little
>>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>>> 
>>>>>>> If est_forward is provided, MAKER will parse the database to look for
>>>>>>> the maker_coor tags early in the pipeline.  Then it will create a list
>>>>>>> of locations to search, and it will search them even if there are no
>>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result
>>>>>>> first and then polishes it with exonerate).  So maker_coor=chr1 will
>>>>>>> cause MAKER to look for a match using all of chr1 as the input to
>>>>>>> exonerate even when BLAST finds nothing (this is a very very slow
>>>>>>> search, but can help pick up one or two stubborn genes that don't remap
>>>>>>> well).  To allow this, MAKER gives exonerate looser matching parameters
>>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly
>>>>>>> errors).  The logic here is that given the fact that I already told
>>>>>>> MAKER that with some degree of confidence I expect sequence A to map to
>>>>>>> location X, it will try its hardest to make it match.
>>>>>>> 
>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to
>>>>>>> the region (that BLAST result has the information in its description
>>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as
>>>>>>> they were with est_forward.
>>>>>>> 
>>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>>> accidental feature).
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>>> est_forward=1, right?
>>>>>>> 
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> There is a way.  It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> Thanks,
>>>>>>> Shaun
>>> 
> 



From carson.holt at genetics.utah.edu  Thu Mar  6 16:00:40 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 6 Mar 2014 23:00:40 +0000
Subject: [maker-devel] maker problem with running blast
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3E4A6E.A92B%carson.holt@genetics.utah.edu>

Your blast_type parameter in maker_bopts.ctl is set to 'wublast' but the
executables for wublast are blank in maker_exe.ctl.

See, they're blank -->
xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable


You either need to provide executables or set your blast_type parameter to
something else. For example, you could set it to 'NCBI+', but you will need
to fix the location of makeblastdb.

makeblastdb is set incorrectly here -->
makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
NCBI+ makeblastdb executable


Alternatively you can set blast_type to 'NCBI', but you will need to
uncomment the executables.

Here -->
formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
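
A quick way to catch this kind of mismatch before launching MAKER is to check the control file entries yourself. The Python sketch below is not part of MAKER: `parse_ctl` and `check_blast` are hypothetical helpers, and the grouping of executables per blast_type is inferred from the comments in the maker_exe.ctl shown in this thread.

```python
import os

# Executables each blast_type appears to need, per the comments in maker_exe.ctl.
REQUIRED = {
    "ncbi+": ["makeblastdb", "blastn", "blastx", "tblastx"],
    "ncbi": ["formatdb", "blastall"],
    "wublast": ["xdformat", "blasta"],
}


def parse_ctl(text):
    """Parse key=value lines, discarding everything after the first '#'."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def check_blast(exe_options, blast_type):
    """List problems with the executables needed for the given blast_type."""
    problems = []
    for key in REQUIRED[blast_type.lower()]:
        path = exe_options.get(key, "")
        if not path:
            problems.append(f"{key} is blank")
        elif os.path.isdir(path):
            problems.append(f"{key} points to a directory, not an executable: {path}")
        elif not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append(f"{key} is not an executable file: {path}")
    return problems


exe = parse_ctl("xdformat=#location of WUBLAST xdformat executable\n"
                "blasta=#location of WUBLAST blasta executable\n")
print(check_blast(exe, "wublast"))
```

A value that is commented out, as with formatdb and blastall above, parses as blank, and a makeblastdb entry that names the install directory instead of the binary inside it would be reported as pointing to a directory.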


--Carson


On 3/6/14, 3:51 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Hi
>
>I have installed the latest version of blast+ and provided the executable path
>to the maker_exe.ctl  as follows
>
>#-----Location of Executables Used by MAKER/EVALUATOR
>makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
>NCBI+ makeblastdb executable
>blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location
>of NCBI+ blastn executable
>blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location
>of NCBI+ blastx executable
>tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx
>#location of NCBI+ tblastx executable
>formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
>blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
>xdformat=#location of WUBLAST xdformat executable
>blasta=#location of WUBLAST blasta executable
>RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of
>RepeatMasker executable
>exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate
>#location of exonerate executable
>
>#-----Ab-initio Gene Prediction Algorithms
>snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable
>gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3
>#location of eukaryotic genemark executable
>gmhmmp= #location of prokaryotic genemark executable
>augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus
>executable
>fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable
>
>#-----Other Algorithms
>fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom
>executable (experimental)
>probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild
>#location of probuild executable (required for genemark)
>
>
>
>
>
>But when running maker I get this error
>
>
>STATUS: Parsing control files...
>WARNING: blast_type is set to 'wublast' but executables cannot be located
>ERROR: Please provide a valid locaction for a BLAST algorithm in the
>control files.
>
>
>
>
>
>
>


From sjackman at gmail.com  Thu Mar  6 16:33:04 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 15:33:04 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3E2F7A.A911%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
Message-ID: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models?

Cheers,
Shaun
On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote:

Yes. I'll fix the naming.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Date: Thursday, March 6, 2014 at 1:56 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (a non-trivial problem in most organisms, with a high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and set the minimum size (single_length) to a smaller value. I think it's set to 250 right now. Also est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.
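
To make the debugging question concrete, here is a minimal sketch of how the thresholds named in this thread could interact. It is a simplified reading of the discussion, not MAKER's actual filtering code: the function `check_hit` and the dictionary keys are hypothetical, and the 120 bp length for rrn5 is a made-up placeholder (only the bit score and E-value come from the thread).

```python
def check_hit(hit, opts):
    """Return (kept, reason) for one evidence alignment.

    Simplified model only, not MAKER's code. `hit` needs the keys
    bits, evalue, length and n_exons; `opts` holds control-file values.
    """
    if hit["bits"] < opts["bit_blastn"]:
        return False, "bits < bit_blastn"
    if hit["evalue"] > opts["eval_blastn"]:
        return False, "evalue > eval_blastn"
    if hit["n_exons"] == 1:
        # Single-exon alignments face two extra checks in this model.
        if not opts["single_exon"]:
            return False, "single_exon=0"
        if hit["length"] < opts["single_length"]:
            return False, "length < single_length"
    return True, "kept"


# Bits and E-value are the rrn5 figures quoted in this thread;
# the 120 bp length is a hypothetical placeholder.
rrn5 = {"bits": 242, "evalue": 2e-66, "length": 120, "n_exons": 1}
opts = {"bit_blastn": 40, "eval_blastn": 1e-10,
        "single_exon": 1, "single_length": 250}

print(check_hit(rrn5, opts))                           # default-like length cutoff
print(check_hit(rrn5, dict(opts, single_length=50)))   # lowered cutoff
```

Under these assumptions a short hit clears the BLAST thresholds and is still dropped by the length cutoff, which matches the symptom described: present in the blastn output, absent from the final GFF.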

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy the names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
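
The prefiltering step mentioned in this exchange (dropping low-scoring mRNAs from the first-pass GFF3 before handing it to model_gff) could be sketched roughly as below. This is an illustration under assumptions: the function names are hypothetical, the score is read from GFF3 column 6 of mRNA rows, a higher score is assumed to be better, and mRNAs with no score (".") are dropped.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def prefilter_gff3(lines, min_score):
    """Keep genes/mRNAs/children whose mRNA score is >= min_score."""
    rows = [ln.rstrip("\n") for ln in lines]
    dropped = set()      # IDs of mRNAs that fail the cutoff
    live_genes = set()   # genes with at least one surviving mRNA
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attrs(cols[8])
        if cols[5] == "." or float(cols[5]) < min_score:
            dropped.add(attrs.get("ID"))
        else:
            live_genes.add(attrs.get("Parent"))
    kept = []
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            kept.append(row)  # comments, directives, blank lines
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene":
            if attrs.get("ID") in live_genes:
                kept.append(row)
        elif cols[2] == "mRNA":
            if attrs.get("ID") not in dropped:
                kept.append(row)
        else:
            # Child features survive if any parent mRNA survives.
            parents = attrs.get("Parent", "").split(",")
            if any(p not in dropped for p in parents):
                kept.append(row)
    return kept
```

The filtered lines would then be written to a file and supplied as model_gff with map_forward=1 for the second run.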

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (e.g. it allows for single base pair introns perhaps caused by assembly errors). The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.
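
The seed handling described for the case without est_forward can be pictured with a small sketch. It only mirrors the prose above and is not taken from GI.pm; `parse_maker_coor` and `adjust_seed` are hypothetical names, coordinates are treated as 1-based inclusive, and sequence names are assumed not to contain a colon.

```python
import re

_COOR_RE = re.compile(r"^(?P<seq>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_maker_coor(value):
    """Parse 'chr1' or 'chr1:1-10000' into (seq, start, end).

    start and end are None when only a sequence name is given.
    """
    match = _COOR_RE.match(value)
    if match is None:
        raise ValueError("unrecognised maker_coor value: %r" % value)
    if match.group("start") is None:
        return match.group("seq"), None, None
    start, end = int(match.group("start")), int(match.group("end"))
    if start > end:
        start, end = end, start
    return match.group("seq"), start, end


def adjust_seed(seed, coor):
    """Return the region to polish for a BLAST seed, or None to ignore it.

    `seed` is (seq, start, end) for the BLAST hit; `coor` is the parsed tag.
    Seeds completely outside the tag are ignored; overlapping seeds get
    their search space replaced by the tagged region itself.
    """
    seed_seq, seed_start, seed_end = seed
    coor_seq, coor_start, coor_end = coor
    if seed_seq != coor_seq:
        return None
    if coor_start is None:
        return coor  # whole-sequence tag: any seed on it overlaps
    if seed_end < coor_start or seed_start > coor_end:
        return None
    return coor
```

For example, a seed at chr1:9500-10500 against maker_coor=chr1:1-10000 overlaps the tag, so the region handed on for polishing would be chr1:1-10000, while a seed at chr1:20000-20500 would be ignored.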

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
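
As an illustration of the header tags described above, the following sketch appends gene_id= and maker_coor= to selected FASTA headers. The function names, the example sequence IDs and the choice to append the tags after the existing description, separated by spaces, are assumptions made for the example; the thread only says the tags go in the fasta header.

```python
def tag_fasta_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and/or maker_coor= tags to one FASTA header line."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    parts = [header.rstrip()]
    if gene_id:
        parts.append("gene_id=%s" % gene_id)
    if maker_coor:
        parts.append("maker_coor=%s" % maker_coor)
    return " ".join(parts)


def tag_fasta(lines, placements):
    """Yield FASTA lines, tagging headers whose sequence ID is known.

    `placements` maps a sequence ID (first word of the header) to a
    (gene_id, maker_coor) pair; either element may be None.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            gene_id, coor = placements.get(seq_id, (None, None))
            yield tag_fasta_header(line, gene_id, coor)
        else:
            yield line.rstrip("\n")


# Hypothetical records: two isoforms of one gene and one untagged sequence.
records = [">geneA_t1 isoform 1", "ACGT", ">geneA_t2 isoform 2", "ACGG",
           ">orphan", "GGCC"]
placements = {
    "geneA_t1": ("geneA", "chr1:1-10000"),
    "geneA_t2": ("geneA", "chr1:1-10000"),
}
for out in tag_fasta(records, placements):
    print(out)
```

Sharing one gene_id across both isoforms is what would let them be clustered into the same gene, and the untagged record passes through unchanged.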

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





From carsonhh at gmail.com  Thu Mar  6 16:38:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:38:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
	<etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E5408.A93F%carsonhh@gmail.com>

Well... not really.  I have no plans to add est2genome support for noncoding
genes (non-trivial), so you would either have to remove the ncRNA from your
input, or filter it out downstream.
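
One way to remove the ncRNA from the input is to drop those records from the transcript FASTA before it is given to MAKER. The sketch below assumes that the first word of each header is the gene name and that rRNA and tRNA names start with "rrn" and "trn" as described earlier in this thread; `drop_ncrna` is a hypothetical helper, and any coding gene whose name happens to share those prefixes would be dropped as well.

```python
def drop_ncrna(lines, prefixes=("rrn", "trn")):
    """Yield FASTA lines, skipping records whose ID starts with a prefix."""
    keep = True
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            # Decide once per record; sequence lines follow the header.
            keep = not name.lower().startswith(tuple(prefixes))
        if keep:
            yield line


# Hypothetical input with two ncRNA records and one coding record.
fasta = [">rrn5", "ACGT", ">atpA", "GGCC", "TTAA", ">trnW-CCA", "AAAA"]
for out in drop_ncrna(fasta):
    print(out)
```

The tRNA calls would then come from tRNAscan alone, which avoids the duplicate overlapping mRNA models from est2genome.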

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 4:33 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to
identify tRNA, I was hoping that both forms of evidence would be used to
create a single gene model, which doesn't seem to be the case. I get
duplicate overlapping gene models (one mRNA from est and one tRNA from
tRNAscan). Could MAKER merge these models?

Cheers,
Shaun



From sbrubaker at solazyme.com  Thu Mar  6 16:41:55 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Thu, 6 Mar 2014 23:41:55 +0000
Subject: [maker-devel] Long introns from Augustus
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com>

Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus.  I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created?

I tried a couple of things, some settings in a probabilities file and also changing a long list of probabilities to another file that someone had suggested on a forum.  So far I don't really see any changes though.

Any advice would be greatly appreciated.  

Thanks,
Shane


From carsonhh at gmail.com  Thu Mar  6 16:46:53 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:46:53 -0700
Subject: [maker-devel] Long introns from Augustus
Message-ID: <CF3E5643.A94C%carsonhh@gmail.com>

Are these the ab initio calls that are merged, or the final MAKER models?

--Carson


On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:

>Hi, we have a very compact genome and we are getting a lot of fused gene
>models from running Augustus.  I am wondering if anyone has any advice
>about how to prevent introns above a certain cutoff from being created?
>
>I tried a couple of things, some settings in a probabilities file and
>also changing a long list of probabilities to another file that someone
>had suggested on a forum.  So far I don't really see any changes though.
>
>Any advice would be greatly appreciated.
>
>Thanks,
>Shane
>


From sbrubaker at solazyme.com  Thu Mar  6 17:48:15 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 7 Mar 2014 00:48:15 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <CF3E5643.A94C%carsonhh@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>

Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.

I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
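
Independently of which tool produces the models, suspiciously long introns can at least be listed after the fact. The sketch below does not stop Augustus from creating them; it only flags them for review. It assumes GFF3 input in which each exon row carries a Parent attribute naming its transcript, and `long_introns` and the `feature` parameter are names chosen for this example.

```python
from collections import defaultdict


def long_introns(gff_lines, max_intron, feature="exon"):
    """Return (transcript, intron_start, intron_end, length) for long introns."""
    spans_by_transcript = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != feature:
            continue
        parent = None
        for item in cols[8].split(";"):
            if item.startswith("Parent="):
                parent = item[len("Parent="):]
        if parent is None:
            continue
        for transcript in parent.split(","):
            spans_by_transcript[transcript].append((int(cols[3]), int(cols[4])))
    flagged = []
    for transcript, spans in spans_by_transcript.items():
        spans.sort()
        # An intron is the gap between consecutive exons of one transcript.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            length = next_start - prev_end - 1
            if length > max_intron:
                flagged.append((transcript, prev_end + 1, next_start - 1, length))
    return sorted(flagged)
```

Transcripts that appear in the result are candidates for fused models in a compact genome and could be inspected or split before further use.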


-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com] 
Sent: Thursday, March 06, 2014 3:47 PM
To: Shane Brubaker; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Long introns from Augustus

Are these the ab initio calls that are merged, or the final MAKER models?

--Carson


On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:

>Hi, we have a very compact genome and we are getting a lot of fused 
>gene models from running Augustus.  I am wondering if anyone has any 
>advice about how to prevent introns above a certain cutoff from being created?
>
>I tried a couple of things, some settings in a probabilities file and 
>also changing a long list of probabilities to another file that someone 
>had suggested on a forum.  So far I don't really see any changes though.
>
>Any advice would be greatly appreciated.
>
>Thanks,
>Shane
>


From mikael.durling at slu.se  Mon Mar 10 04:27:25 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 10:27:25 +0000
Subject: [maker-devel] keep_preds values
Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>

Hi,

Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either.

Thanks,
Mikael


From robert.king at rothamsted.ac.uk  Mon Mar 10 06:17:07 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Mon, 10 Mar 2014 12:17:07 +0000
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>

Dear Maker Developers,

I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using the following evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference transcripts?

I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned in a post last year that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to GTF format using the provided script, and am just waiting on some Perl modules to be installed by our administrator to test it. The same modules are needed to try out the "Evaluator" and "compare" programs; what do they do?
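
Until a dedicated script is available, a cumulative AED distribution can be computed directly from each annotation's GFF3. The sketch below assumes the AED is stored as an `_AED=` attribute on mRNA rows of the MAKER GFF3; the function names and the default step are choices made for this example, and this is not the script referred to above.

```python
def read_aed(gff_lines):
    """Collect the _AED value of every mRNA row in a GFF3 file."""
    scores = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        for item in cols[8].split(";"):
            if item.startswith("_AED="):
                scores.append(float(item[len("_AED="):]))
    return scores


def cumulative_fraction(scores, step=0.25):
    """Return [(cutoff, fraction of scores <= cutoff)] for cutoffs 0..1."""
    n_bins = int(round(1 / step))
    cutoffs = [round(i * step, 10) for i in range(n_bins + 1)]
    total = len(scores)
    return [
        (cutoff, sum(s <= cutoff for s in scores) / total if total else 0.0)
        for cutoff in cutoffs
    ]
```

Running both functions on each annotation and plotting the two curves together gives a quick comparison: the annotation whose curve rises faster has a larger share of models that agree closely with the evidence.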

Best Wishes
Rob


From dence at genetics.utah.edu  Mon Mar 10 08:47:42 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 10 Mar 2014 14:47:42 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6BA90@mxb2.hg.genetics.utah.edu>

Hi Mikael, 

The keep_preds parameter is often used the same as a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 - 1. A prediction with an AED of 0 matches the evidence exactly while a prediction with an AED of 1 isn't overlapped by any evidence. 
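
For readers unfamiliar with AED, a minimal base-level sketch follows. It assumes the usual definition AED = 1 - (SN + SP)/2, where SN is the fraction of evidence bases covered by the prediction and SP is the fraction of prediction bases covered by the evidence; the function names and the interval representation (1-based inclusive start/end pairs) are choices made for this example, not MAKER's implementation.

```python
def covered(intervals):
    """Return the set of base positions covered by (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(prediction, evidence):
    """Annotation edit distance between a prediction and its evidence."""
    pred_bases = covered(prediction)
    evid_bases = covered(evidence)
    if not pred_bases or not evid_bases:
        return 1.0  # nothing to compare: treat as unsupported
    shared = len(pred_bases & evid_bases)
    sensitivity = shared / len(evid_bases)
    specificity = shared / len(pred_bases)
    return 1.0 - (sensitivity + specificity) / 2.0


print(aed([(1, 100)], [(1, 100)]))     # identical coordinates
print(aed([(1, 100)], [(51, 150)]))    # half of each overlaps the other
print(aed([(1, 100)], [(201, 300)]))   # no overlap at all
```

Using sets of positions keeps the sketch short; for whole-genome work an interval-based overlap calculation would be far more memory efficient.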

The default behavior for MAKER is to make a gene model out of a prediction with any AED <1. When you change the keep_preds option from 0 to 1, then MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc). Setting the keep_preds option to somewhere in between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model. 

From a user standpoint, you will almost certainly lose gene models when you set keep_preds at an intermediate value, but you might benefit by knowing that all your models will now have an AED of at most a certain value. 

I hope that helps; let me know if it didn't. 

~Daniel

PS The original paper that described the AED is Eilbeck et al in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012. 

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Mikael Brandström Durling [mikael.durling at slu.se]
Sent: Monday, March 10, 2014 4:27 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] keep_preds values

Hi,

Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either.

Thanks,
Mikael


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Mon Mar 10 09:51:21 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:51:21 -0700
Subject: [maker-devel] keep_preds values
Message-ID: <CF432CF3.A9C7%carsonhh@gmail.com>

Actually that is false. The keep_preds option is still binary.  Any value
other than 0 sets it to true.  There was discussion about making it a
non-binary value, but that has not been implemented.

--Carson




From mikael.durling at slu.se  Mon Mar 10 08:57:23 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Mon, 10 Mar 2014 14:57:23 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432CF3.A9C7%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
Message-ID: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>

Hi Carson and Daniel,

That sounds more logical to me.  Then it would be appropriate to change the comment of keep_preds in the generated config files.

Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that it is less likely to be a false positive when two or more predictors suggest the same unsupported model.

Mikael




From carsonhh at gmail.com  Mon Mar 10 09:59:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:59:43 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
Message-ID: <CF432F23.A9D4%carsonhh@gmail.com>

Yes.  It will eventually perform an AED-like calculation between multiple
predictors (e.g. if you use 3 predictors, then you require support by
at least 2 predictors across all exons to get a value of 0.33).  A value
of 0 would be perfect concordance across all 3 predictors.

--Carson




From mikael.durling at slu.se  Mon Mar 10 09:08:16 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Mon, 10 Mar 2014 15:08:16 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432F23.A9D4%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>

Ok. But that is not implemented now, as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models?

Mikael



From carsonhh at gmail.com  Mon Mar 10 10:16:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:16:59 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
Message-ID: <CF4331A9.A9E0%carsonhh@gmail.com>

There is a value called abAED being calculated, which somewhat captures
the concordance among the predictors.  It is not currently printed in the
GFF3, but it is used to identify the best non-overlapping ab initio
prediction to put in the non-overlapping fasta file.  There are a couple of
things I still need to do with it, though.  It's not yet normalized to
take into account the absence of a predictor in the cluster of overlapping
predictions. For example, if I have 2 predictors and 2 make perfectly
matching calls and 1 makes no call, they get a score of 0 because I have
perfect concordance between what's there, but I really should make it 0.33
because the absence of the third predictor is meaningful.  The
unnormalized concordance value is fine for deciding which overlapping
model to keep in the file, but not for global comparison.
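
To make the normalization point concrete, here is a toy Python sketch, reading the example as 3 predictors of which 2 make perfectly matching calls and 1 makes no call. It is only an assumed simplification (one minus the fraction of agreeing predictors), not the actual abAED calculation in MAKER, and all names are hypothetical.

```python
def concordance_distance(matching_calls, denominator):
    """Toy concordance score: 0 means perfect agreement, 1 means none."""
    return 1.0 - matching_calls / denominator


total_predictors = 3   # predictors that were run
calls_in_cluster = 2   # predictors that made a call at this locus
matching_calls = 2     # calls that agree perfectly with each other

# Unnormalized: only the calls that are present are compared.
unnormalized = concordance_distance(matching_calls, calls_in_cluster)

# Normalized: the silent third predictor counts against the cluster.
normalized = concordance_distance(matching_calls, total_predictors)

print(unnormalized)          # 0.0
print(round(normalized, 2))  # 0.33
```

The unnormalized value only ranks models within one cluster, whereas the normalized value is comparable across loci, which is the distinction drawn above.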

--Carson




From carsonhh at gmail.com  Mon Mar 10 10:18:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:18:14 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF4331A9.A9E0%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
	<CF4331A9.A9E0%carsonhh@gmail.com>
Message-ID: <CF4333C1.AA06%carsonhh@gmail.com>

Sorry meant to say "3 predictors and 2 make perfectly
matching calls and 1 makes no call."




From carsonhh at gmail.com  Mon Mar 10 10:25:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:25:50 -0700
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <CF4330EC.A9DA%carsonhh@gmail.com>

I don't know about Michael's script, but I've always used eval.  It
produces sensitivity/specificity metrics.  It assumes the first models are
100% correct, and then tells you the sensitivity/specificity value for the
second models.

It is therefore not a quality metric.  Instead you should view it as a change
metric. Lower sensitivity tells you that models/exons have been lost between
versions, and lower specificity tells you models/exons have been gained.
There will also be a lot of generic statistics on exon/intron distribution
and UTR length.  Then the AED values from the MAKER run can be used
independently to evaluate how well models match the evidence.
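
A small Python sketch of the "change metric" reading described above, assuming exact exon matches and the gene-prediction sense of specificity (shared exons divided by exons in the second set). This is a hypothetical simplification for illustration, not how eval itself is implemented, and the exon coordinates are made up.

```python
def change_metrics(first, second):
    """Treat `first` as 100% correct and score `second` against it."""
    shared = len(first & second)
    sensitivity = shared / len(first) if first else 0.0
    specificity = shared / len(second) if second else 0.0
    return sensitivity, specificity


old = {("chr1", 100, 200), ("chr1", 300, 400),
       ("chr1", 500, 600), ("chr1", 700, 800)}

# One exon lost between versions: sensitivity drops, specificity does not.
lost_one = old - {("chr1", 700, 800)}
print(change_metrics(old, lost_one))    # (0.75, 1.0)

# One exon gained between versions: specificity drops, sensitivity does not.
gained_one = old | {("chr1", 900, 1000)}
print(change_metrics(old, gained_one))  # (1.0, 0.8)
```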

--Carson


From:  "Robert King (RRes-Roth)" <robert.king at rothamsted.ac.uk>
Date:  Monday, March 10, 2014 at 5:17 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] annotation comparison aed plots

Dear Maker Developers,
 
I've updated a reference that had errors and was a little incomplete, and am
now trying to produce an annotation for it. Please note the reference has not
changed dramatically. I've produced two annotations using as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference
transcripts?

I want to compare these two annotations and have produced AED scores. How do
I generate summary stats/figures to compare annotations? You mentioned in a
post last year that Mike Campbell has a script to produce these; do you know
if he will post it? I've got the Eval program and converted my annotations to
GTF format using the provided script, and I'm just waiting on some Perl
modules to be installed by our administrator before I can test it. I'm also
waiting on Perl modules to test out the "Evaluator" and "compare" programs;
what do they do?
 
Best Wishes
Rob

From michael.s.campbell1 at gmail.com  Mon Mar 10 09:50:53 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:50:53 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
	<CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
Message-ID: <CAAi6vWUSY6UgyyXAJ5=-aUA_39FBwFREVX3xmeHSZaE264AKGw@mail.gmail.com>

One more point. The sensitivity, specificity, and accuracy produced by the
compare_annotations_3.2.pl script are gene level, and overlap between
annotation sets is defined very liberally: at least one nucleotide of exon
overlap.
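
To illustrate what a gene-level comparison with such a liberal overlap rule can look like, here is a minimal Python sketch. It is not taken from compare_annotations_3.2.pl; the data layout, the function names, the choice to ignore strand, and the convention that specificity means the fraction of predicted genes that overlap a reference gene are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Exon = Tuple[str, int, int]        # (sequence id, start, end), 1-based inclusive
GeneSet = Dict[str, List[Exon]]    # gene id -> list of its exons


def exons_overlap(a: Exon, b: Exon) -> bool:
    """True if two exons on the same sequence share at least one nucleotide."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def genes_overlap(g1: List[Exon], g2: List[Exon]) -> bool:
    """True if any exon of g1 overlaps any exon of g2 (strand is ignored here)."""
    return any(exons_overlap(a, b) for a in g1 for b in g2)


def gene_level_stats(reference: GeneSet, prediction: GeneSet) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the gene level.

    sensitivity = reference genes hit by a prediction / all reference genes
    specificity = predicted genes hitting a reference / all predicted genes
    (this use of "specificity" is an assumed convention for the sketch)
    """
    ref_hit = sum(
        1 for r in reference.values()
        if any(genes_overlap(r, p) for p in prediction.values())
    )
    pred_hit = sum(
        1 for p in prediction.values()
        if any(genes_overlap(p, r) for r in reference.values())
    )
    sensitivity = ref_hit / len(reference) if reference else 0.0
    specificity = pred_hit / len(prediction) if prediction else 0.0
    return sensitivity, specificity
```

The all-against-all comparison is quadratic in the number of genes, which is fine for a quick check but would need an interval index for a whole genome.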
Mike


On Mon, Mar 10, 2014 at 9:47 AM, Michael Campbell <
michael.s.campbell1 at gmail.com> wrote:

> Hi Robert,
>
> Here are the scripts that were mentioned before.
>
> The AED_cdf_generator.pl script is for making cumulative distribution
> function plots based on annotation edit distance. This script is quite
> simple and straightforward in its internals.
>
> The compare_annotations_3.2.pl script is for generating summary stats for
> annotations and will compare two annotations of the same assembly.
>
> You can run either script without arguments to get a usage statement.
>
> Thanks,
> Mike
>
>
> On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) <
> robert.king at rothamsted.ac.uk> wrote:
>
>>  Dear Maker Developers,
>>
>>
>>
>> I've updated a reference that was had errors and was a little incomplete
>> and now trying to produce a annotation for it. Please note the reference
>> has not changed dramatically. I've produced two annotations using as
>> evidence:
>>
>>
>>
>> Annotation 1:
>>
>> Uniprot proteins search using species keyword "fusarium"
>>
>> Pubmed mRNA for the name of the organism
>>
>> Prior annotation reference transcripts
>>
>>
>>
>> Annotation 2:
>>
>> Uniprot proteins search using species keyword "fusarium"
>>
>> Pubmed mRNA for the name of the organism
>>
>> Prior annotation reference transcripts
>>
>> mRNA trinity assembly pasafly of different strain (only RNA-seq available)
>>
>>
>>
>> I'm not sure if it was a smart move to use the prior annotation reference
>> transcripts?
>>
>>
>>
>> I want to compare these two annotations and have produced AED scores. How
>> do I generate summary stats/figures to compare annotations. You mentioned
>> last year in a post Mike Campbell has a script to produce these, do you
>> know if he will post it? I've got the Eval program and converted to gtf
>> format using the provided script, just waiting on some perl modules to be
>> installed by admin to test it. I'm waiting on some perl modules to be
>> installed by our administrator to test out the "Evaluator" and "compare"
>> programs too, what do they do?
>>
>>
>>
>> Best Wishes
>>
>> Rob
>>
>>
>>
>
>
> --
> Michael Campbell MS, RD.
> Doctoral Candidate
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:585-3543
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From cjfields at illinois.edu  Mon Mar 10 09:52:50 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 10 Mar 2014 15:52:50 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CEB024AC-5E08-4827-9EC4-17D09F06E1FA@illinois.edu>

I have been running MAKER 2.31 using Augustus and SNAP on an avian genome.  Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides.  However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions.  Leaving out SNAP altogether however leads to a marked decrease in # models overall, which is worse.  GeneMark had a very similar problem (high # false positives) and thus no marked improvement, either when using with both Augustus and SNAP or with Augustus alone.

I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked on in the past.  This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first.

Finally, how hard would it be to incorporate alternative callers into MAKER?  For instance, would it be possible to add these like a "plugin"?

chris


From carsonhh at gmail.com  Mon Mar 10 11:05:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 10:05:24 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CF433C40.AA26%carsonhh@gmail.com>

Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).
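
As a rough illustration of the "convert results to GFF3 and then pass it in" route, the Python sketch below writes one predicted gene as gene/mRNA/exon/CDS rows. Everything here is an assumption made for the example: the function names, the ID scheme, and the simplification that every exon is fully coding (no UTRs). It does not parse geneid output. The resulting file would be the kind of thing given to the pred_gff option in maker_opts.ctl, but check the MAKER documentation for the exact feature types it expects.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end), 1-based inclusive


def cds_phases(exons: List[Segment], strand: str) -> List[int]:
    """GFF3 phase for each coding segment, returned in the input order.

    Phase is the number of bases to skip at the 5' end of the segment
    to reach the first base of the next complete codon.
    """
    order = sorted(range(len(exons)), key=lambda i: exons[i][0],
                   reverse=(strand == "-"))
    phases = [0] * len(exons)
    phase = 0
    for i in order:
        start, end = exons[i]
        phases[i] = phase
        length = end - start + 1
        phase = (3 - (length - phase) % 3) % 3
    return phases


def gene_to_gff3(seqid: str, source: str, gene_id: str, strand: str,
                 exons: List[Segment]) -> List[str]:
    """Return tab-separated GFF3 rows for one single-transcript gene.

    Assumes every exon is entirely coding, as is typical for raw
    ab initio predictions; the ID naming scheme is hypothetical.
    """
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]
    mrna_id = f"{gene_id}-mRNA-1"
    rows = [
        (seqid, source, "gene", start, end, ".", strand, ".", f"ID={gene_id}"),
        (seqid, source, "mRNA", start, end, ".", strand, ".",
         f"ID={mrna_id};Parent={gene_id}"),
    ]
    for n, (s, e) in enumerate(exons, 1):
        rows.append((seqid, source, "exon", s, e, ".", strand, ".",
                     f"ID={mrna_id}:exon:{n};Parent={mrna_id}"))
    for (s, e), ph in zip(exons, cds_phases(exons, strand)):
        rows.append((seqid, source, "CDS", s, e, ".", strand, ph,
                     f"ID={mrna_id}:cds;Parent={mrna_id}"))
    return ["\t".join(str(field) for field in row) for row in rows]
```

A complete file would start with a "##gff-version 3" line followed by the rows for each gene.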

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked on in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From michael.s.campbell1 at gmail.com  Mon Mar 10 09:47:50 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:47:50 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
Message-ID: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>

Hi Robert,

Here are the scripts that were mentioned before.

The AED_cdf_generator.pl script is for making cumulative distribution
function plots based on annotation edit distance. This script is quite
simple and straightforward in its internals.
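
For readers without the attachment, the idea of an AED cumulative distribution can be sketched in a few lines of Python. This is not the AED_cdf_generator.pl code. It assumes mRNA rows in the MAKER GFF3 carry an `_AED=` attribute in column 9, and the function names and default bin width are choices made for the example.

```python
import re
from typing import Iterable, List, Tuple

# Matches "_AED=0.12" but not "_eAED=0.12".
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff3_lines: Iterable[str]) -> List[float]:
    """Collect the AED of every mRNA feature from GFF3 lines."""
    values = []
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # also skips any embedded FASTA section
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def aed_cdf(values: List[float], step: float = 0.025) -> List[Tuple[float, float]]:
    """Return (threshold, fraction of transcripts with AED <= threshold)."""
    n_bins = int(round(1.0 / step))
    points = []
    for i in range(n_bins + 1):
        threshold = round(i * step, 6)
        if values:
            fraction = sum(v <= threshold for v in values) / len(values)
        else:
            fraction = 0.0
        points.append((threshold, fraction))
    return points
```

Running `aed_cdf(aed_values(open("annotation1.gff")))` (file name hypothetical) for each annotation and plotting the two curves together gives a direct comparison: the curve that rises faster has more transcripts with low AED, that is, better agreement with the evidence.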

The compare_annotations_3.2.pl script is for generating summary stats for
annotations and will compare two annotations of the same assembly.

You can run either script without arguments to get a usage statement.

Thanks,
Mike


On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) <
robert.king at rothamsted.ac.uk> wrote:

>  Dear Maker Developers,
>
>
>
> I've updated a reference that was had errors and was a little incomplete
> and now trying to produce a annotation for it. Please note the reference
> has not changed dramatically. I've produced two annotations using as
> evidence:
>
>
>
> Annotation 1:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
>
>
> Annotation 2:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
> mRNA trinity assembly pasafly of different strain (only RNA-seq available)
>
>
>
> I'm not sure if it was a smart move to use the prior annotation reference
> transcripts?
>
>
>
> I want to compare these two annotations and have produced AED scores. How
> do I generate summary stats/figures to compare annotations. You mentioned
> last year in a post Mike Campbell has a script to produce these, do you
> know if he will post it? I've got the Eval program and converted to gtf
> format using the provided script, just waiting on some perl modules to be
> installed by admin to test it. I'm waiting on some perl modules to be
> installed by our administrator to test out the "Evaluator" and "compare"
> programs too, what do they do?
>
>
>
> Best Wishes
>
> Rob
>
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl
Type: text/x-perl-script
Size: 2579 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl
Type: text/x-perl-script
Size: 29154 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment-0003.bin>

From sajeet at gmail.com  Mon Mar 10 12:31:40 2014
From: sajeet at gmail.com (Sajeet Haridas)
Date: Mon, 10 Mar 2014 11:31:40 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF433C40.AA26%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
Message-ID: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>

One of the problems I have found with genemark is that it does not
understand a soft-masked genome. Hence, the self training is incorrect. I
have found marked improvement to genemark's prediction by running the
training on a hard masked genome.
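
A minimal Python sketch of the soft-to-hard masking step, assuming the usual convention that soft-masked bases are lowercase and hard-masked bases are N. The function name is made up for the example, and this is not part of GeneMark or MAKER.

```python
from typing import Iterable, Iterator


def hard_mask(fasta_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by N.

    Header lines (starting with '>') are passed through unchanged.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            yield line
        else:
            yield "".join("N" if base.islower() else base for base in line)
```

Typical use would be to stream one file into another, for example `out.writelines(hard_mask(open("genome.softmasked.fa")))` with hypothetical file names, and then train on the hard-masked copy.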


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:

> Adding a new predictor can take some time.  It obviously requires some
> coding.  It's usually not too hard just to convert results to GFF3 and
> then pass it in.  Integrated support is really only beneficial for
> predictors that can take "hints" from evidence alignments (for example we
> are working on EVM integration right now -
> http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
> problems just drop them.  GeneMark really doesn't work very good on
> genomes with complex intron/exon structure (and I really wouldn't use it
> for anything but fungi).
>
> Make sure you are also giving sufficient protein evidence.  Perhaps all
> proteins from chicken and pigeon for example.  Then you shouldn't find
> loss of any true genes if just using Augustus.  Also try not to use gene
> count as an indicator of performance.  The value is very deceptive,
> especially if the genome assembly is fragmented.
>
> Thanks,
> Carson
>
>
>
> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu>
> wrote:
>
> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
> >genome.  Augustus gives pretty decent gene model predictions based on a
> >custom model we have and the hints MAKER provides.  However, SNAP seems
> >to throw out a ton of false positives; in many cases this appears to
> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
> >to a marked decrease in # models overall, which is worse.  GeneMark had a
> >very similar problem (high # false positives) and thus no marked
> >improvement, either when using with both Augustus and SNAP or with
> >Augustus alone.
> >
> >I have been exploring using geneid
> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
> >feedback on another project I worked on in the past.  This would be
> >fed into MAKER using external GFF, but I wanted to see if anyone has
> >tried geneid with MAKER first.
> >
> >Finally, how hard would it be to incorporate alternative callers into
> >MAKER?  For instance, would it be possible to add these like a 'plugin'?
> >
> >chris
> >_______________________________________________
> >maker-devel mailing list
> >maker-devel at box290.bluehost.com
> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com  Mon Mar 10 22:13:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 22:13:43 -0600
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.
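
Before choosing a cutoff, it can help to see which existing models contain suspiciously long introns. The Python sketch below does that from exon coordinates; the input layout (model name mapped to a list of exon start/end pairs) and the function names are assumptions for the example, and it is independent of both Augustus and MAKER.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start, end), 1-based inclusive


def intron_lengths(exons: List[Segment]) -> List[int]:
    """Lengths of the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]


def models_with_long_introns(models: Dict[str, List[Segment]],
                             max_intron: int) -> Dict[str, int]:
    """Map each model with an intron longer than max_intron to its longest intron."""
    flagged = {}
    for name, exons in models.items():
        lengths = intron_lengths(exons)
        if lengths and max(lengths) > max_intron:
            flagged[name] = max(lengths)
    return flagged
```

In a compact genome, models flagged this way are candidates for the fused genes described in the quoted message.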

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com] 
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged, or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that someone 
>> had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 


From darasappan at gmail.com  Mon Mar 10 14:14:03 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 10 Mar 2014 15:14:03 -0500
Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta
	files missing
Message-ID: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>

Hello,

I've been running maker with different assembly files, reference files  
etc  and I check the output by:

1. concatenating the gff files
2. concatenating the *transcripts.fasta files
3. concatenating the *proteins.fasta files

I'm noticing that when I ran maker twice with same parameters, the  
second time around, many of the output subdirectories  do not have a  
*transcripts.fasta or *proteins.fasta file in it.
There are 251 subdirectories and only 97 of them have all 3 output  
files.  Maker log looks ok to me, but I've attached it here as well.
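
A small Python sketch of the check described above: walk the output tree and report which subdirectories have all three kinds of output file. The glob patterns follow the three items listed above; treating any directory that contains a .gff file as a per-contig output directory is an assumption, as is the function name.

```python
import fnmatch
import os
from typing import List, Tuple

# Patterns taken from the three checks listed in the message.
PATTERNS = ("*.gff", "*transcripts.fasta", "*proteins.fasta")


def classify_output_dirs(root: str) -> Tuple[List[str], List[str]]:
    """Return (complete, incomplete) lists of directories under root.

    A directory is considered only if it holds a .gff file; it is
    complete when it also holds transcript and protein FASTA files.
    """
    complete, incomplete = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        found = [any(fnmatch.fnmatch(name, pattern) for name in filenames)
                 for pattern in PATTERNS]
        if not found[0]:
            continue  # not a per-contig output directory
        (complete if all(found) else incomplete).append(dirpath)
    return sorted(complete), sorted(incomplete)
```

Comparing the incomplete list against contig lengths shows whether the directories lacking FASTA files are mostly very short sequences.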

What could be the reason for this?

Thanks
dhivya

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.o1813247.gz
Type: application/x-gzip
Size: 13857217 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/34f3a118/attachment-0001.gz>


From sbrubaker at solazyme.com  Tue Mar 11 11:06:57 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Tue, 11 Mar 2014 17:06:57 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
	<99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com>

Ok thank you.

-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com] 
Sent: Monday, March 10, 2014 9:14 PM
To: Shane Brubaker
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Long introns from Augustus

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged, or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that 
>> someone had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 

From carson.holt at genetics.utah.edu  Thu Mar 13 10:00:06 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 13 Mar 2014 16:00:06 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF47300B.AB4F%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
Message-ID: <CF4731CC.AB5E%carson.holt@genetics.utah.edu>

Just resending this to the correct maker-devel address.  Please when
replying, do not CC the incorrect maker-devel-bounce address.

Thanks,
Carson


On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>FGENESH is not a heavily used tool, so depending on which version it is
>(either too old or too new), output might be slightly different which
>could cause incorrect parsing. Could you tar up your maker.output folder,
>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>(then send me your user or guest ID after you upload).
>
>For the BLAST error, use BLAST+ instead.  You are using blastall which is
>the old legacy version of NCBI BLAST.  You can do this by setting the
>blast type in maker_bopts.ctl and the location of executables in
>maker_exe.ctl.
>
>Thanks,
>Carson
>
>
>
>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear Maker users
>>
>>
>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>word SCALAR followed by a hexadecimal value in parentheses, like this:
>>(0x22de7020), into the nucleotide sequence of some of the genes. This
>>seems to be related to transcripts predicted by fgenesh_masked.
>>
>>
>>Here is an example for one of the genes
>>
>>
>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>TTTCGACAAGC
>>
>>The same genome sequence was used for the first round of maker (2.10)
>>without such problem. I checked the sequence for the scaffold related to
>>one of the affected transcripts and there was no error in the sequence.
>>I am not sure what is causing this. The only error that I could spot in
>>the output error file is the following
>>
>>
>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>contexts/frames of query sequences.
>>
>>
>>
>>Your help is appreciated
>>
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>
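
The SCALAR(0x...) strings in the quoted transcript look like stringified Perl references that ended up in the sequence. A short Python sketch for finding every affected record in a transcripts FASTA file follows; the function names are made up for the example, and lines are joined per record first because the stray text can be wrapped across two lines, as in the example above.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Text Perl produces when a reference is printed as a string.
PERL_REF_RE = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE)\(0x[0-9a-fA-F]+\)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record id, unwrapped sequence) for each FASTA record."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_perl_refs(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each affected record id to the stray reference strings it contains."""
    hits = {}
    for name, seq in read_fasta(lines):
        found = PERL_REF_RE.findall(seq)
        if found:
            hits[name] = found
    return hits
```

Listing the affected IDs makes it easy to confirm whether they all come from one predictor, as the quoted message suspects.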


From carsonhh at gmail.com  Thu Mar 13 10:14:54 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:14:54 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
Message-ID: <CF4733ED.AB63%carsonhh@gmail.com>

Note that protein/transcript FASTA files are only created when there are gene
models to output to those files (so their absence means there were no gene
models for that contig). Most sequences without protein/transcript FASTA files
in your sample are very short and thus don't contain anything.  What is left
either has no est2genome results or the est2genome alignments do not have
sufficient open reading frame to be turned into a gene model (false merging of
regions by Trinity can cause this, so make sure you use the jaccard index
option when assembling reads with Trinity to avoid this).

You are using only the est2genome=1 option.  This will result in a limited
set of genes that can be used for training SNAP/Augustus (so not getting
results on all contigs is expected).  You really won't get much as far as
results until you have one of the ab initio predictors turned on.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, March 11, 2014 at 8:52 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Alright done. My username is daras

Thanks
Dhivya

On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:

> Input and compressed file of output.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Monday, March 10, 2014 at 2:09 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Do you mean the whole maker output?
> 
> Thanks
> dhivya
> 
> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
> 
>> Could you upload everything here:
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>> 
>> Then send us the link generated or your user ID.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, March 10, 2014 at 1:50 PM
>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson and Daniel,
>> 
>> I'm sending this across to you separately since maker list is blocking my
>> email due to attachment size.
>> 
>> As always, thanks for any guidance you can provide.
>> Dhivya
>> 
>> 
>> Begin forwarded message:
>> 
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: March 10, 2014 3:14:03 PM CDT
>>> To: maker-devel at yandell-lab.org
>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing
>>> 
>>>  
>>> Hello,
>>> 
>>> I've been running maker with different assembly files, reference files etc
>>> and I check the output by:
>>> 
>>> 1. concatenating the gff files
>>> 2. concatenating the *transcripts.fasta files
>>> 3. concatenating the *proteins.fasta files
>>> 
>>> I'm noticing that when I ran maker twice with same parameters, the second
>>> time around, many of the output subdirectories  do not have a
>>> *transcripts.fasta or *proteins.fasta file in it.
>>> There are 251 subdirectories and only 97 of them have all 3 output files.
>>> Maker log looks ok to me, but I've attached it here as well.
>>> 
>>> What could be the reason for this?
>>> 
>>> Thanks
>>> dhivya
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 



From carsonhh at gmail.com  Thu Mar 13 10:55:40 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:55:40 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
Message-ID: <CF473DBA.AB9F%carsonhh@gmail.com>

The second time, it should have just started where it left off, so it would
run faster (because the processing from the previous job counted towards the
second one).  The archived output you sent me had 21,183 proteins and
transcripts.  If you are using fasta_merge to collect them, just make
sure the datastore.index file is not truncated or corrupt; otherwise it won't
collect all the FASTA files from every contig.  You can rebuild the
datastore.index using the -dsindex flag with MAKER, if you want to check
that.  Also, you can have MAKER just regenerate results without rerunning
BLAST etc. by using the -a flag if you want to just recalculate all results
quickly (rebuilds all FASTA and GFF3 without redoing most analysis).
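
A quick way to look for a truncated or incomplete datastore index is to parse it and list anything that is malformed or never reached a finished state. The Python sketch below assumes the index is a tab-separated file with three columns (sequence name, output directory, status) and that a completed contig's last status is FINISHED. Both points are assumptions about the file layout, so adjust them to what your index actually contains.

```python
from typing import Dict, Iterable, List, Tuple


def summarize_index(lines: Iterable[str]) -> Tuple[Dict[str, str], List[str], List[int]]:
    """Return (last status per sequence, unfinished sequences, malformed line numbers).

    Assumed layout: name <TAB> directory <TAB> status, one event per line.
    """
    last_status = {}
    malformed = []
    for number, line in enumerate(lines, 1):
        stripped = line.rstrip("\n")
        if not stripped:
            continue
        fields = stripped.split("\t")
        if len(fields) != 3 or not all(fields):
            malformed.append(number)  # e.g. a line cut off mid-write
            continue
        name, _directory, status = fields
        last_status[name] = status
    unfinished = sorted(name for name, status in last_status.items()
                        if status != "FINISHED")
    return last_status, unfinished, malformed
```

If the number of sequences in the index is far below the number of contigs in the assembly, or malformed lines are reported, rebuilding the index as described above is the natural next step.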

--Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 10:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Thanks Carson for the response.  I understand that est2genome=1 does not use
any ab initio gene predictions, but simply identifies ests based on
alignment.  I'm a little confused because I ran maker on my assembly before,
using the same parameters ( including est2genome=1).  I got a very good
result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds were
combined into superscaffolds. So I reran maker on this assembly.   Same
parameters, same transcriptome and proteins files.  Now, I see such
drastically different results:  Only 500+ genes and transcripts.  My
scaffolds are now bigger than before, so I'm not sure how this is happening.
These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because
perhaps it is due to some sort of error on my part):  I ran maker on the
improved assembly the first time and maker did not complete in the 48 hours
I allocated.  But I had  19,000+ transcripts in the unfinished output.  When
I reran maker, just changing the time allocated, it completed much faster,
but is giving much fewer transcripts and proteins as output.  Could
something like this happen? If not, then I'm guessing I must have changed
something although I'm pretty sure that I did not change anything other than
the time allocated. I've attached the transcripts and proteins files from the
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya


On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:

> Note that protein/transcript FASTA files are only created when there are gene
> models to output to those files (so their absence means there were no gene
> models for that contig). Most sequences without protein/transcript FASTA files
> in your sample are very short and thus don't contain anything.  What is left
> either has no est2genome results or the est2genome alignments do not have
> sufficient open reading frame to be turned into a gene model (false merging of
> regions by Trinity can cause this, so make sure you use the jaccard index
> option when assembling reads with Trinity to avoid this).
> 
> You are using only the est2genome=1 option.  This will result in a limited set
> of genes that can be used for training SNAP/Augustus (so not getting results
> on all contigs is expected).  You really won't get much as far as results
> until you have one of the ab initio predictors turned on.
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 11, 2014 at 8:52 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Alright done. My username is daras
> 
> Thanks
> Dhivya
> 
> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
> 
>> Input and compressed file of output.
>> 
>> Thanks,
>> Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, March 10, 2014 at 2:09 PM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Daniel Ence <dence at genetics.utah.edu>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson,
>> 
>> Do you mean the whole maker output?
>> 
>> Thanks
>> dhivya
>> 
>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>> 
>>> Could you upload everything here -->
>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>> 
>>> Then send us the link generated or your user ID.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> 
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Monday, March 10, 2014 at 1:50 PM
>>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>>> missing
>>> 
>>> Hi Carson and Daniel,
>>> 
>>> I'm sending this across to you separately since maker list is blocking my
>>> email due to attachment size.
>>> 
>>> As always, thanks for any guidance you can provide.
>>> Dhivya
>>> 
>>> 
>>> Begin forwarded message:
>>> 
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>> To: maker-devel at yandell-lab.org
>>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing
>>>> 
>>>>  
>>>> Hello,
>>>> 
>>>> I've been running maker with different assembly files, reference files etc
>>>> and I check the output by:
>>>> 
>>>> 1. concatenating the gff files
>>>> 2. concatenating the *transcripts.fasta files
>>>> 3. concatenating the *proteins.fasta files
>>>> 
>>>> I'm noticing that when I ran maker twice with same parameters, the second
>>>> time around, many of the output subdirectories  do not have a
>>>> *transcripts.fasta or *proteins.fasta file in it.
>>>> There are 251 subdirectories and only 97 of them have all 3 output files.
>>>> Maker log looks ok to me, but I've attached it here as well.
>>>> 
>>>> What could be the reason for this?
>>>> 
>>>> Thanks
>>>> dhivya
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 



From darasappan at gmail.com  Thu Mar 13 10:47:25 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 13 Mar 2014 11:47:25 -0500
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <CF4733ED.AB63%carsonhh@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>

Thanks Carson for the response.  I understand that est2genome=1 does  
not use any ab initio gene predictions, but simply identifies ests  
based on alignment.  I'm a little confused because I ran maker on my  
assembly before, using the same parameters ( including est2genome=1).   
I got a very good result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds  
were combined into superscaffolds. So I reran maker on this  
assembly.   Same parameters, same transcriptome and proteins files.   
Now, I see such drastically different results:  Only 500+ genes and  
transcripts.  My scaffolds are now bigger than before, so I'm not sure  
how this is happening.   These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because  
perhaps it is due to some sort of error on my part):  I ran maker on  
the improved assembly the first time and maker did not complete in the  
48 hours I allocated.  But I had  19,000+ transcripts in the  
unfinished output.  When I reran maker, just changing the time  
allocated, it completed much faster, but is giving much fewer  
transcripts and proteins as output.  Could something like this happen?  
If not, then I'm guessing I must have changed something although I'm  
pretty sure that I did not change anything other than the time  
allocated. I've attached the transcripts and proteins files from the  
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya


On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:

> Note protein/transcript FASTA files are only created when there are gene
> models to output to those files (so their absence means there were
> no gene models for that contig). Most sequences without
> protein/transcript FASTA files in your sample are very short and thus
> don't contain anything.  What is left either has no est2genome results or
> the est2genome alignments do not have sufficient open reading frame
> to be turned into a gene model (false merging of regions by Trinity
> can cause this, so make sure you use the jaccard index option when
> assembling reads with Trinity to avoid this).
>
> You are using only the est2genome=1 option.  This will result in a
> limited set of genes that can be used for training SNAP/Augustus (so
> not getting results on all contigs is expected).  You really won't
> get much as far as results until you have one of the ab initio
> predictors turned on.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Tuesday, March 11, 2014 at 8:52 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>
> Subject: Re: maker output- transcripts.fasta and proteins.fasta  
> files missing
>
> Alright done. My username is daras
>
> Thanks
> Dhivya
>
> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
>
>> Input and compressed file of output.
>>
>> Thanks,
>> Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, March 10, 2014 at 2:09 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Daniel Ence <dence at genetics.utah.edu>
>> Subject: Re: maker output- transcripts.fasta and proteins.fasta  
>> files missing
>>
>> Hi Carson,
>>
>> Do you mean the whole maker output?
>>
>> Thanks
>> dhivya
>>
>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>>
>>> Could you upload everything here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> Then send us the link generated or your user ID.
>>>
>>> Thanks,
>>> Carson
>>>
>>>
>>>
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: Monday, March 10, 2014 at 1:50 PM
>>> To: Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta  
>>> files missing
>>>
>>> Hi Carson and Daniel,
>>>
>>> I'm sending this across to you separately since maker list is  
>>> blocking my email due to attachment size.
>>>
>>> As always, thanks for any guidance you can provide.
>>> Dhivya
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>> To: maker-devel at yandell-lab.org
>>>> Subject: maker output- transcripts.fasta and proteins.fasta files  
>>>> missing
>>>>
>>>> Hello,
>>>>
>>>> I've been running maker with different assembly files, reference  
>>>> files etc  and I check the output by:
>>>>
>>>> 1. concatenating the gff files
>>>> 2. concatenating the *transcripts.fasta files
>>>> 3. concatenating the *proteins.fasta files
>>>>
>>>> I'm noticing that when I ran maker twice with same parameters,  
>>>> the second time around, many of the output subdirectories  do not  
>>>> have a *transcripts.fasta or *proteins.fasta file in it.
>>>> There are 251 subdirectories and only 97 of them have all 3  
>>>> output files.  Maker log looks ok to me, but I've attached it  
>>>> here as well.
>>>>
>>>> What could be the reason for this?
>>>>
>>>> Thanks
>>>> dhivya
>>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: transcripts.cat.fasta.old.gz
Type: application/x-gzip
Size: 7927581 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment-0002.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: proteins.cat.fasta.old.gz
Type: application/x-gzip
Size: 3668381 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment-0003.gz>

From carsonhh at gmail.com  Thu Mar 13 12:53:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 12:53:05 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
	<CF473DBA.AB9F%carsonhh@gmail.com>
	<672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com>
	<CF474291.ABC0%carsonhh@gmail.com>
	<CF4744C6.ABC9%carsonhh@gmail.com>
	<5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com>
	<CF474BE5.ABDA%carsonhh@gmail.com>
	<C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
Message-ID: <CF4759BA.ABE2%carsonhh@gmail.com>

For future reference, I suggest using the .../maker/bin/fasta_merge tool to
merge based on the datastore.index rather than other command-line-based
methods.  It will handle the multiple FASTA types that are produced in the
results, and will validate against the datastore.index file.

Example:
fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log

The same is also true when merging GFF3 files:
gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log
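For anyone who only wants a quick sanity check of how many contigs produced transcripts, the idea of driving the count from the index rather than from a directory-name pattern can be sketched in Python. This is an illustration only and not part of MAKER. It assumes the master datastore index log is tab-separated with three columns (sequence name, output directory relative to the log, status) and that per-contig transcript files match `*transcripts.fasta`. The function names are hypothetical.

```python
import glob
import os


def finished_dirs(index_path):
    """Return {sequence name: output directory} for entries marked FINISHED.

    Assumption: each line is tab-separated as name, directory, status.
    """
    base = os.path.dirname(os.path.abspath(index_path))
    dirs = {}
    with open(index_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            name, rel_dir, status = fields[0], fields[1], fields[2]
            if status == "FINISHED":
                dirs[name] = os.path.join(base, rel_dir)
    return dirs


def count_fasta_records(path):
    """Count header lines (those starting with '>') in one FASTA file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_transcripts(index_path, pattern="*transcripts.fasta"):
    """Return (contigs with at least one matching file, total records)."""
    contigs_with_models = 0
    total = 0
    for name, directory in sorted(finished_dirs(index_path).items()):
        # Escape the directory so only `pattern` is treated as a glob.
        files = glob.glob(os.path.join(glob.escape(directory), pattern))
        if files:
            contigs_with_models += 1
            total += sum(count_fasta_records(f) for f in files)
    return contigs_with_models, total
```

Because the directories come from the index, contigs are counted whatever their names are, so a "superscaffold" entry is not missed by a `sca*`-style pattern.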

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 12:48 PM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

ah  I forgot that some were called superscaffolds.  That is a difference
between the old and new assembly. This was definitely the issue. Thanks and
sorry for the mix up.

Dhivya

On Mar 13, 2014, at 12:51 PM, Carson Holt wrote:

> Note that your command does not capture everything because not all scaffolds
> start with the name "scaffold".
> 
> This works though -->
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, March 13, 2014 at 11:34 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Am I looking in the wrong place for my fasta files?  I looked here:
> 
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc -l
> 
> I see only 97 such files- so 97 contigs with transcripts.fasta files?
> 
> When I count the number of sequences in all these files, I get 514 sequences.
> 
> grep -c '^>' opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':' -f 2|awk '{total+=$0}END{print total}'
> 
> Could you tell how and where you are getting the 21,183 transcripts?
> 
> thanks
> dhivya
> 
> On Mar 13, 2014, at 12:21 PM, Carson Holt wrote:
> 
>> This is what I see in your uploaded data.  There are 21,183 transcripts from
>> 201 contigs.  Then there are 707 contigs with no gene models.
>> 
>> --Carson
>> 
>> 
>> From:  Carson Holt <carsonhh at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:11 AM
>> To:  dhivya arasappan <darasappan at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> "as you saw from the output I uploaded before, the output certainly was much
>> less than 20,000 transcripts"
>> 
>> Actually there were 21,183 in the output you uploaded.  I saw no loss of
>> entries.
>> 
>> --Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:09 AM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson,
>> 
>> The datastore.index file looks fine- it has a started and finished status for
>> my 980 scaffolds.  I reran with increased time twice. Second time around, I
>> actually deleted the entire output directory to make sure it runs all over
>> again.  It still seemed to complete within a day. As you saw from the output
>> I uploaded before, the output certainly was much less than 20,000
>> transcripts. Given that I was seeing great results for an older version of my
>> assembly, I'm puzzled as to why my results are worse this time around. Any
>> suggestions of what to check or what I can do to see improved results would
>> be really helpful.
>> 
>> I do know that I went from ~4% gaps to ~6% gaps in my new assembly; other
>> than that, it's better in every way. Could this cause such a dramatic
>> difference in results?
>> 
>> Thanks
>> dhivya
>> 
>> On Mar 13, 2014, at 11:55 AM, Carson Holt wrote:
>> 
>>> The second time, it should have just started where it left off, so it would
>>> run faster (because the processing from the previous job counted towards the
>>> second one).  The archived output you sent me had 21,183 proteins and
>>> transcripts.  If you are using the fasta_merge to collect them, just make
>>> sure the datastore.index file is not truncated or corrupt, otherwise it won't
>>> collect all the fastas from every contig.  You can rebuild the
>>> datastore.index using the -dsindex flag with MAKER, if you want to check
>>> that.  Also you can have maker just regenerate results without rerunning
>>> BLAST etc., by using the -a flag if you want to just recalculate all results
>>> quickly (rebuilds all FASTA and GFF3 without redoing most analysis).
>>> 
>>> --Carson



From cjfields at illinois.edu  Thu Mar 13 15:04:23 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 13 Mar 2014 21:04:23 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
Message-ID: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with GeneMark is that it does not understand a soft-masked genome. Hence, the self-training is incorrect. I have found marked improvement in GeneMark's predictions by running the training on a hard-masked genome.
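
For readers unfamiliar with the distinction: soft masking marks repeats as lowercase bases, while hard masking replaces them with N. A minimal Python sketch of the conversion is below. It is an illustration only, assuming lowercase letters are the only soft-mask signal in the FASTA file. The function names are hypothetical and not part of GeneMark or MAKER.

```python
def hard_mask(sequence, mask_char="N"):
    """Replace soft-masked (lowercase) bases with mask_char."""
    return "".join(mask_char if base.islower() else base for base in sequence)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked; headers untouched."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # never alter sequence names
        else:
            yield hard_mask(line)


def hard_mask_file(src_path, dst_path):
    """Write a hard-masked copy of the FASTA file at src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in hard_mask_fasta(src):
            dst.write(line + "\n")
```

The masked copy would be used only for GeneMark's self-training. The soft-masked assembly can still be given to tools that understand lowercase masking.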


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





From jfierst at uoregon.edu  Fri Mar 14 10:06:26 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 09:06:26 -0700
Subject: [maker-devel] associating gene names between related strains
Message-ID: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate
the gene naming so that orthologs between species can be associated by
name. I have been playing around a little with the est_forward option but
can't figure out a good system/workflow that preserves names but still uses
the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
-Janna

From dence at genetics.utah.edu  Fri Mar 14 11:32:02 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Mar 2014 17:32:02 +0000
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>

Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna

From jfierst at uoregon.edu  Fri Mar 14 12:01:16 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 11:01:16 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CAGoyuracbDO5pcWU7wThnnnGbfoKo2xEn+trPPUaUJx9t+8_Lg@mail.gmail.com>

I will try it today. Thanks for the quick reply!


On Fri, Mar 14, 2014 at 10:32 AM, Daniel Ence <dence at genetics.utah.edu> wrote:

>  Hi Janna, So do you have one strain that you want to use as the
> reference for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
>  Let me know if that's not what you're looking for,
> Daniel
>
>  Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>   ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
>   Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
>

From carsonhh at gmail.com  Fri Mar 14 12:02:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:02:48 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CF489F0B.AC19%carsonhh@gmail.com>

maker_map_ids does a translation (i.e. change gene-A to smug1), so you need
to know which genes you want to translate names to (two column input file,
column 1 -> original ID, column 2 -> new ID).  I?m not sure EST forward is
the best way to do this, although I do think maker_map_ids is the tool to
use in the end.  The question is how to make a list of IDs to translate as
the input to maker_map_ids?

I would actually just use BLASTP against the reference strain, and then do
reciprocal best BLAST hits.  To do this you BLAST your reference proteins
against your maker proteins.  Then do the opposite, BLAST your  maker
proteins against your reference proteins.  If they are both each other's best
hit, then they are orthologous, and you can safely make a two column entry
for the maker_map_ids input (i.e. maker-gene-1 translates into smug1).
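
The reciprocal-best-hit step described above can be sketched roughly as
follows. This is only an illustration, assuming both BLASTP runs were saved
in the 12-column tabular format (-outfmt 6) and that the two-column map file
is tab-separated. The function names are hypothetical, and ties and e-value
cutoffs are not handled.

```python
def best_hits(blast_lines):
    """Map each query ID to the subject ID of its highest-bit-score hit.

    `blast_lines` is any iterable of BLAST tabular lines (-outfmt 6):
    field 1 = query ID, field 2 = subject ID, field 12 = bit score.
    """
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comment lines and malformed rows
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, reference_id) pairs that are mutual best hits."""
    forward = best_hits(maker_vs_ref)   # maker protein -> reference protein
    reverse = best_hits(ref_vs_maker)   # reference protein -> maker protein
    return [
        (maker_id, ref_id)
        for maker_id, ref_id in sorted(forward.items())
        if reverse.get(ref_id) == maker_id
    ]


def format_map_lines(pairs):
    """Format pairs as 'original ID <TAB> new ID' lines for the map file."""
    return [f"{maker_id}\t{ref_id}" for maker_id, ref_id in pairs]
```

The lines returned by format_map_lines() could then be written to a file and
passed to maker_map_ids as the translation table.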

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Friday, March 14, 2014 at 11:32 AM
To:  Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] associating gene names between related strains

Hi Janna, So do you have one strain that you want to use as the reference
for all the others? There's a script that comes with MAKER called
maker_map_ids that lets you use a common prefix or suffix for entries in a
fasta file from one strain and then use est_forward to use that ID in the
gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna
Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate the
gene naming so that orthologs between species can be associated by name. I
have been playing around a little with the est_forward option but can't
figure out a good system/workflow that preserves names but still uses the
strain-specific RNA-Seq EST set for the actual gene models. Thanks!
-Janna
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Fri Mar 14 12:43:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:43:41 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com>
	<9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
Message-ID: <CF48A7BD.AC29%carsonhh@gmail.com>

I'm glad you were able to fix it.  I'll check to see why it was failing as
well.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Friday, March 14, 2014 at 10:16 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: Error when running maker2zff script

Kindly ignore my previous question. I was able to manipulate the scaffold
names in the gff file to get maker2zff to work.

Thanks
dhivya

On Mar 14, 2014, at 10:55 AM, dhivya arasappan <darasappan at gmail.com> wrote:

> My message got flagged by the maker list again, so I'm forwarding this
> separately to you.  Is there a better way to send biggish files?
> 
> 
> Thank you
> Dhivya
> 
> 
> 
> Begin forwarded message:
> 
>> From: dhivya arasappan <darasappan at gmail.com>
>> Subject: Error when running maker2zff script
>> Date: March 13, 2014 at 8:35:27 PM CDT
>> To: Carson Holt <carsonhh at gmail.com>, maker-devel at yandell-lab.org
>> 
>> Hi Carson,
>> 
>> I used gff3_merge to create my gff file from maker output. I've attached it
>> here. But when I run maker2zff on it, I get the following error:
>> 
>> Can't use an undefined value as an ARRAY reference at
>> /opt/apps/maker/2.30/bin/maker2zff line 177, <GFF> line 7294251.
>> 
>> It produces an incomplete output file and it looks like it may be running
>> into problems when it encounters scaffold3%2F0.  I'm wondering if its having
>> problems with my scaffold names. There seem to be some inconsistencies
>> because it's referred to as  scaffold3%F0 and scaffold3/0 in the gff file.
>> It goes through other scaffolds like SCAFFOLD3_873, SCAFFOLD3_95 etc just
>> fine.   I did try replacing the scaffold names in the gff file, but still get
>> the same error.   Any ideas?
>> 
>> Substitution command I used, for your reference:  sed 's/3\%2F/3_/g' gfffile|
>> sed 's/\//\_/'  > mod.gfffile
>> 
>> Thanks
>> Dhivya
>> 
> <head.gff.gz>
> 



From carsonhh at gmail.com  Fri Mar 14 13:25:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 13:25:58 -0600
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
Message-ID: <CF48B2BC.AC3E%carsonhh@gmail.com>

We can look into it.

--Carson

From:  "Fields, Christopher J" <cjfields at illinois.edu>
Date:  Thursday, March 13, 2014 at 3:04 PM
To:  Sajeet Haridas <sajeet at gmail.com>
Cc:  Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see
if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints
such as BLAST HSPs (as well as other information), in the form of a GFF
"homology" file.  I assume it could take protein2genome/est2genome as well
through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

> One of the problems I have found with genemark is that it does not understand
> a soft-masked genome. Hence, the self training is incorrect. I have found
> marked improvement to genemark's prediction by running the training on a hard
> masked genome.
> 
> 
> On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Adding a new predictor can take some time.  It obviously requires some
>> coding.  It's usually not too hard just to convert results to GFF3 and
>> then pass it in.  Integrated support is really only beneficial for
>> predictors that can take "hints" from evidence alignments (for example we
>> are working on EVM integration right now -
>> http://evidencemodeler.sourceforge.net ).  If SNAP and GeneMark give
>> problems just drop them.  GeneMark really doesn't work very well on
>> genomes with complex intron/exon structure (and I really wouldn't use it
>> for anything but fungi).
>> 
>> Make sure you are also giving sufficient protein evidence.  Perhaps all
>> proteins from chicken and pigeon for example.  Then you shouldn't find
>> loss of any true genes if just using Augustus.  Also try not to use gene
>> count as an indicator of performance.  The value is very deceptive,
>> especially if the genome assembly is fragmented.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
>> 
>>> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
>>> >genome.  Augustus gives pretty decent gene model predictions based on a
>>> >custom model we have and the hints MAKER provides.  However, SNAP seems
>>> >to throw out a ton of false positives; in many cases this appears to
>>> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>>> >to a marked decrease in # models overall, which is worse.  GeneMark had a
>>> >very similar problem (high # false positives) and thus no marked
>>> >improvement, either when using with both Augustus and SNAP or with
>>> >Augustus alone.
>>> >
>>> >I have been exploring using geneid
>>> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
>>> >feedback on another project I worked with in the past.  This would be
>>> >fed into MAKER using external GFF, but I wanted to see if anyone has
>>> >tried geneid with MAKER first.
>>> >
>>> >Finally, how hard would it be to incorporate alternative callers into
>>> >MAKER?  For instance, would it be possible to add these like a "plugin"?
>>> >
>>> >chris
>> 
>> 
>> 
> 



From cjfields at illinois.edu  Fri Mar 14 20:22:55 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sat, 15 Mar 2014 02:22:55 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF48B2BC.AC3E%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
	<CF48B2BC.AC3E%carsonhh@gmail.com>
Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu>

Not an issue at the moment; I'll likely supply these via gff for now.  If needed I can work off a svn checkout and send along a patch should I ever manage to eke out time to work on it.

chris

On Mar 14, 2014, at 2:25 PM, Carson Holt <carsonhh at gmail.com> wrote:

We can look into it.

--Carson

From: "Fields, Christopher J" <cjfields at illinois.edu>
Date: Thursday, March 13, 2014 at 3:04 PM
To: Sajeet Haridas <sajeet at gmail.com>
Cc: Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome.
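
For readers who only have a soft-masked assembly, the conversion to a
hard-masked one can be sketched as follows. This is a minimal example,
assuming the usual convention that soft-masked bases are lowercase and
hard-masked bases are N. The function names are hypothetical and this is not
part of MAKER or GeneMark.

```python
def hard_mask(sequence):
    """Replace every lowercase (soft-masked) base with 'N'."""
    return "".join("N" if base.islower() else base for base in sequence)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked.

    Header lines (starting with '>') are passed through unchanged, and
    line endings are preserved because a newline is not lowercase.
    """
    for line in lines:
        yield line if line.startswith(">") else hard_mask(line)
```

Applied to an open FASTA file handle, hard_mask_fasta() yields the lines of a
hard-masked copy that could be used for the training run only, while the
soft-masked genome is kept for the annotation itself.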


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris





From carson.holt at genetics.utah.edu  Mon Mar 17 13:45:15 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Mar 2014 19:45:15 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>

I have attached 4 files for you to place in the .../maker/Widgets/
directory.

The *blast.pm files will suppress the BLAST+ failures you are getting
(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same
effect).  BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore.
In the latest release NCBI redid all their warnings and error codes so it
spits out a lot of garbage and fails with different messages than it did
before.  For example BLAST now warns you every time it encounters a fasta
header with a comment (virtually every fasta entry in existence falls in
this category), so your screen will be awash with meaningless warning
messages.

The fgenesh.pm file will fix the other failure, which only occurs if you
use fgenesh simultaneously with the est_fusion=1 option.  No other
predictors are affected.
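
To see which transcripts from an earlier run are affected, something like
the following could be used to scan the transcript FASTA for non-nucleotide
characters such as the SCALAR(0x...) strings reported in this thread. This is
only a sketch: the function names are hypothetical, and the set of accepted
characters (IUPAC nucleotide codes) is an assumption. Sequence lines are
joined per record first, because the stray text can be split across wrapped
lines.

```python
import re

# Anything that is not an IUPAC nucleotide code counts as corruption here.
NON_NUCLEOTIDE = re.compile(r"[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv]")


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def corrupted_records(lines):
    """Return headers of records whose sequence has non-nucleotide characters."""
    return [
        header
        for header, sequence in read_fasta(lines)
        if NON_NUCLEOTIDE.search(sequence)
    ]
```

Any header returned by corrupted_records() marks a transcript that should be
regenerated after the patched fgenesh.pm is in place.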

Thanks,
Carson


On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear  Carson
>
>Sorry for the late reply. I was away for a couple of days. I have uploaded
>the output files plus control and error output to the FTP site that you
>provided.
>The user ID is borhanh
>
>I used blast+ for this run.
>
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>
>>Just resending this to the correct maker-devel address.  Please when
>>replying, do not CC the incorrect maker-devel-bounce address.
>>
>>Thanks,
>>Carson
>>
>>
>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>>
>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>(either too old or too new), output might be slightly different which
>>>could cause incorrect parsing. Could you tar up your maker.output
>>>folder,
>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>(send me either your user/guest ID after you upload).
>>>
>>>For the BLAST error, use BLAST+ instead.  You are using blastall which
>>>is
>>>the old legacy version of NCBI BLAST.  You can do this by setting the
>>>blast type in maker_bopts.ctl and the location of executables in
>>>maker_exe.ctl.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>
>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>wrote:
>>>
>>>>Dear Maker users
>>>>
>>>>
>>>>I ran maker (2.31) on a fungal genome and found that it inserted
>>>>the word SCALAR followed by a hex value in parentheses, like this:
>>>>SCALAR(0x22de7020), into the nucleotide sequence of some of the genes.
>>>>This seems to be related to transcripts predicted by fgenesh_masked.
>>>>
>>>>
>>>>Here is an example for one of the genes
>>>>
>>>>
>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript
>>>>>offset:0 AE
>>>>D:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>TTTCGACAAGC
>>>>
>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>without such problem. I checked the sequence for the scaffold related
>>>>to
>>>>one of the affected transcripts and there was no error in the sequence.
>>>>I am not sure what is causing this. The only error that I could spot in
>>>>the output error file is the following
>>>>
>>>>
>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>contexts/frames of query sequences.
>>>>
>>>>
>>>>
>>>>Your help is appreciated
>>>>
>>>>
>>>>
>>>>HB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastn.pm
Type: text/x-perl-script
Size: 8112 bytes
Desc: blastn.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastx.pm
Type: text/x-perl-script
Size: 8218 bytes
Desc: blastx.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh.pm
Type: text/x-perl-script
Size: 19744 bytes
Desc: fgenesh.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tblastx.pm
Type: text/x-perl-script
Size: 9113 bytes
Desc: tblastx.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0007.bin>

From carsonhh at gmail.com  Mon Mar 17 15:14:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Mar 2014 15:14:42 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
Message-ID: <CF4CBEAF.ADA3%carsonhh@gmail.com>

Just an update on this.  I've fixed the maker2zff script to handle the
issues seen.  Looking at this actually brought to light another issue.
There is inconsistent escape character specification for GFF3 in column 1
(the seqid), column 9 (the attributes ID and Target_ID), as well as
the FASTA ID for internal sequence.  We're updating the GFF3 spec to
clarify this so that everywhere you see the same ID getting treated the
same way for character escaping.

To be safe though, only use these characters in your contig IDs for the
assembly when using any tool that reads or outputs GFF3 -->
a-zA-Z0-9.:^*$@!+_?-|

Any character not in that set has a high chance of breaking some
downstream tool.  For now just assume the strict interpretation from the
GFF3 spec for column 1, must be used on all IDs everywhere (see below).

>>Column 1: "seqid"
>>The ID of the landmark used to establish the coordinate system for the
>>current feature.
>>IDs may contain any characters, but must escape any characters not in
>>the set [a-zA-Z0-9.:^*$@!+_?-|].
>>In particular, IDs may not contain unescaped whitespace and must not
>>begin with an unescaped ">".
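
A quick way to check contig IDs against the character set quoted above, before
running anything that reads or writes GFF3, might look like this. It is a
sketch under two assumptions: the "-" and "|" in the bracket expression are
meant as literal characters, and unsafe characters can simply be replaced with
an underscore. The function names are hypothetical.

```python
import re
from urllib.parse import unquote

# Allowed unescaped characters, per the quoted GFF3 text. '-' is placed last
# so that it is a literal and does not form a range (an assumption about the
# intent of the original bracket expression).
SAFE_ID = re.compile(r"[a-zA-Z0-9.:^*$@!+_?|-]+")
UNSAFE_CHAR = re.compile(r"[^a-zA-Z0-9.:^*$@!+_?|-]")


def is_safe_id(seq_id):
    """True if the whole ID uses only characters that need no escaping."""
    return SAFE_ID.fullmatch(seq_id) is not None


def sanitize_id(seq_id, replacement="_"):
    """Decode any percent-escapes, then replace unsafe characters.

    Decoding first means 'scaffold3%2F0' and 'scaffold3/0' end up with the
    same sanitized name.
    """
    return UNSAFE_CHAR.sub(replacement, unquote(seq_id))
```

The same renaming would need to be applied to the assembly FASTA headers and
to every file that refers to those IDs, otherwise the names no longer match.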


Thanks,
Carson


On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:

>Hi Carson,
>
>I used gff3_merge to create my gff file from maker output. I've
>attached it here. But when I run maker2zff on it, I get the following
>error:
>
>Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
>2.30/bin/maker2zff line 177, <GFF> line 7294251.
>
>It produces an incomplete output file and it looks like it may be
>running into problems when it encounters scaffold3%2F0.  I'm wondering
>if its having problems with my scaffold names. There seem to be some
>inconsistencies because it's referred to as  scaffold3%F0 and
>scaffold3/0 in the gff file.  It goes through other scaffolds like
>SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
>scaffold names in the gff file, but still get the same error.   Any
>ideas?
>
>Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
>gfffile| sed 's/\//\_/'  > mod.gfffile
>
>Thanks
>Dhivya
>


From darasappan at gmail.com  Mon Mar 17 15:20:18 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 17 Mar 2014 16:20:18 -0500
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <CF4CBEAF.ADA3%carsonhh@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<CF4CBEAF.ADA3%carsonhh@gmail.com>
Message-ID: <CAGWaY_61EFs28=2dThqjgnkeisCXjad7JM72ews-fkTn0v7FCA@mail.gmail.com>

Awesome! Thanks Carson.

Dhivya


On Mon, Mar 17, 2014 at 4:14 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Just an update on this.  I've fixed the maker2zff script to handle the
> issues seen.  Looking at this actually brought to light another issue.
> There is inconsistent escape character specification for GFF3 in column 1
> (the seqid), column 9 (the attributes ID and Target_ID), as well as
> the FASTA ID for internal sequence.  We're updating the GFF3 spec to
> clarify this so that everywhere you see the same ID getting treated the
> same way for character escaping.
>
> To be safe though, only use these characters in your contig IDs for the
> assembly when using any tool that reads or outputs GFF3 -->
> a-zA-Z0-9.:^*$@!+_?-|
>
> Any character not in that set has a high chance of breaking some
> downstream tool.  For now just assume the strict interpretation from the
> GFF3 spec for column 1, must be used on all IDs everywhere (see below).
>
> >>Column 1: "seqid"
> >>The ID of the landmark used to establish the coordinate system for the
> >>current feature.
> >>IDs may contain any characters, but must escape any characters not in
> >>the set [a-zA-Z0-9.:^*$@!+_?-|].
> >>In particular, IDs may not contain unescaped whitespace and must not
> >>begin with an unescaped ">".
>
>
> Thanks,
> Carson
>
>
>
> On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:
>
> >Hi Carson,
> >
> >I used gff3_merge to create my gff file from maker output. I've
> >attached it here. But when I run maker2zff on it, I get the following
> >error:
> >
> >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
> >2.30/bin/maker2zff line 177, <GFF> line 7294251.
> >
> >It produces an incomplete output file and it looks like it may be
> >running into problems when it encounters scaffold3%2F0.  I'm wondering
> >if its having problems with my scaffold names. There seem to be some
> >inconsistencies because it's referred to as  scaffold3%F0 and
> >scaffold3/0 in the gff file.  It goes through other scaffolds like
> >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
> >scaffold names in the gff file, but still get the same error.   Any
> >ideas?
> >
> >Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
> >gfffile| sed 's/\//\_/'  > mod.gfffile
> >
> >Thanks
> >Dhivya
> >
>
>
>

From marc.hoeppner at bils.se  Tue Mar 18 05:43:43 2014
From: marc.hoeppner at bils.se (Marc Höppner)
Date: Tue, 18 Mar 2014 12:43:43 +0100
Subject: [maker-devel] Maker changes 2.30-2.31
Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>

Hi,

I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a change log somewhere to get some information on what, if anything, was changed between 2.30 and 2.31?

There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker - but I'd like to make sure, if possible. Both versions use the exact same external binaries etc, and were run on the same data. 2.30 is running along happily, 2.31 however has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run.

I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently.

Cheers,

Marc


Marc P. Hoeppner, PhD
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se


From carsonhh at gmail.com  Tue Mar 18 09:07:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 09:07:07 -0600
Subject: [maker-devel] Maker changes 2.30-2.31
In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
Message-ID: <CF4DBC09.ADE0%carsonhh@gmail.com>

Attached.  Also make sure you are using the tar ball from the lab website
and not the prerelease from the subversion repository.

Thanks,
Carson


On 3/18/14, 5:43 AM, "Marc Höppner" <marc.hoeppner at bils.se> wrote:

>Hi,
>
>I have observed a few oddities with our installation of maker 2.31 and
>was therefore wondering if there is a change log somewhere to get some
>information on what, if anything, was changed between 2.30 and 2.31?
>
>There is of course a good chance that the issues I am seeing (pipeline
>locking up) are related to our setup and not necessarily Maker - but I'd
>like to make sure, if possible. Both versions use the exact same external
>binaries etc, and were run on the same data. 2.30 is running along
>happily, 2.31 however has randomly locked up. I should perhaps also say
>that I am running on SL 6.2 and am using mpich2 for the MPI run.
>
>I haven't done any more systematic testing so far, but will probably do
>so if there is no "obvious" reason why Maker 2.31 should behave
>differently.
>
>Cheers,
>
>Marc
>
>
>
>
>Marc P. Hoeppner, PhD
>Department for Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoeppner at bils.se
>
>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: svn_log.txt
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140318/65c17518/attachment-0001.txt>

From fbarreto at ucsd.edu  Tue Mar 18 10:08:47 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:08:47 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
Message-ID: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would
like to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot (based
on query-subject coverage, bit score, etc).  I then used only these 2000
ESTs in a first Maker run using est2genome=1.  The output returned 1500
models (with the 500 "missing" models probably a result of single-exon
issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another
Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file.  The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From carsonhh at gmail.com  Tue Mar 18 10:14:29 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 10:14:29 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <CF4DCCE1.ADEE%carsonhh@gmail.com>

That sounds good.  1,500 initial models should be more than sufficient for
the first round of training.

--Carson


From:  Felipe Barreto <fbarreto at ucsd.edu>
Date:  Tuesday, March 18, 2014 at 10:08 AM
To:  MAKER group <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Size of initial EST training set for SNAP

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would like
to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot (based
on query-subject coverage, bit score, etc).  I then used only these 2000
ESTs in a first Maker run using est2genome=1.  The output returned 1500
models (with the 500 "missing" models probably a result of single-exon
issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another
Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file.  The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093 
_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140318/2cd5fce1/attachment-0001.html>

From dence at genetics.utah.edu  Tue Mar 18 10:16:20 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 18 Mar 2014 16:16:20 +0000
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6E483@mxb2.hg.genetics.utah.edu>

Hi Felipe,

I think 1500 models sounds like a good size set with which to train SNAP. I think that SNAP expects ~1000 models for training.

The only other comment on the approach is perhaps that using only one ab-initio predictor is a little bit risky. Using multiple predictors would allow MAKER to select from among their different models for the one that best fits the evidence.

Good luck and let us know if there's anything we can help with!

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From barry.utah at gmail.com  Tue Mar 18 10:26:45 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 18 Mar 2014 10:26:45 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>

Hi Felipe,

I think that plan sounds quite reasonable.  To address your primary concern, most gene prediction tools recommend a minimum of a few hundred gene models to train on.  Since you're an order of magnitude above that, I think you're in good shape.  Having said that, if you have concerns about biases in your training set, you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high-confidence genes that your set is missing.

Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from, I would suggest that you also consider including at least one other gene predictor.  Augustus works well on a wide variety of genomes, and while it is more difficult to train than SNAP, it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome.  This is one of the advantages of MAKER supervision: while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence, and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictors within MAKER.
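
The training rounds discussed in this thread, plus the extra predictor suggested above, could be tracked as per-round overrides to maker_opts.ctl. What follows is only a minimal Python sketch. It assumes the control file uses `key=value #comment` lines and the option names est, protein, est2genome, snaphmm and augustus_species. The file names, the `fly` species label and the helper `apply_overrides` are hypothetical placeholders, not taken from this thread.

```python
import re

# Per-round overrides for the plan described in this thread.
# All file names and the Augustus species label are hypothetical placeholders.
ROUNDS = {
    "round1": {"est": "high_quality_ests.fasta", "est2genome": "1"},
    "round2": {"est": "all_ests.fasta", "protein": "proteins.fasta",
               "est2genome": "0", "snaphmm": "snap_round1.hmm"},
    "final": {"est": "all_ests.fasta", "protein": "proteins.fasta",
              "est2genome": "0", "snaphmm": "snap_round2.hmm",
              "augustus_species": "fly"},
}

def apply_overrides(ctl_text, overrides):
    """Return ctl_text with matching `key=value` lines rewritten.

    Trailing `#comment` text on a rewritten line is kept; lines whose key
    is not in `overrides` (and pure comment lines) pass through unchanged.
    """
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in overrides:
            comment = " " + m.group(3) if m.group(3) else ""
            line = f"{m.group(1)}={overrides[m.group(1)]}{comment}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Calling `apply_overrides(open("maker_opts.ctl").read(), ROUNDS["round2"])` would return the text for the second round; keeping one control file per round makes it easier to see later which HMM produced which output.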

Thanks,

B


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From fbarreto at ucsd.edu  Tue Mar 18 10:59:39 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:59:39 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
	<02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
Message-ID: <CAOi0ENYUcJFJsg0nDj3-9if0E96N+UY=vPyJkfH0T4xvFYOQ3w@mail.gmail.com>

Thanks, guys, for the swift and informative response!  I will try to train
Augustus again, but at the very least, will include it with an arthropod
HMM in my final run (in addition to my custom SNAP HMM).

Cheers,

Felipe




-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From darasappan at gmail.com  Tue Mar 18 13:27:11 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 14:27:11 -0500
Subject: [maker-devel] maker snap output files
Message-ID: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>

Hello,

I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information in the tutorial about interpreting these different files. I'm hoping one of you can help.

*maker.proteins.fasta
*maker.snap_masked.proteins.fasta
*maker.non_overlapping_ab_initio.proteins.fasta

What is the difference among these? They all have different numbers of sequences.

Similarly, with transcripts:

maker.non_overlapping_ab_initio.transcripts.fasta
maker.snap_masked.transcripts.fasta
maker.transcripts.fasta

Thanks
Dhivya



From carsonhh at gmail.com  Tue Mar 18 13:34:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 13:34:05 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
Message-ID: <CF4DFA69.AE2E%carsonhh@gmail.com>

maker.proteins.fasta - these are the final filtered and modified protein
models (this is what you want)
maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab
initio predictions (for reference purposes)
maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
rejected models that do not overlap the maker.proteins.fasta entries. If you
think you are missing a gene, look for it here.  Sometimes people use
interproscan (very slow) to analyze this file for false negatives.


These files are also described in the README distributed with MAKER in the
"MAKER OUTPUT" section.

Thanks,
Carson



From darasappan at gmail.com  Tue Mar 18 14:05:39 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 15:05:39 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4DFA69.AE2E%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>

Thanks Carson.

Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually lower than the number of proteins in my pre-snap maker results?  I assumed that annotation through alignment plus annotation through prediction would yield more annotations.

The unfiltered proteins file has more proteins though.

Thanks
Dhivya



From carsonhh at gmail.com  Tue Mar 18 14:09:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 14:09:01 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID: <CF4E0363.AE3D%carsonhh@gmail.com>

There can also be hint-based predictions.  They may be similar in size, but
there is no rule.  Generally maker.snap_masked.proteins.fasta will be
larger, as gene predictors tend to overpredict (by as much as 10-fold).  You
should always review your annotations in something like Apollo to see how
the models compare to the evidence.  Counts alone don't really mean anything.

Thanks,
Carson


From chrisbioinfo at gmail.com  Wed Mar 19 05:09:57 2014
From: chrisbioinfo at gmail.com (Chris Bioinfo)
Date: Wed, 19 Mar 2014 12:09:57 +0100
Subject: [maker-devel] Annotation with maker2
Message-ID: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>

Hello,

I'm installing and using maker2 for the first time, and I get an error when
running it.

I am certainly missing something, but I don't know what.

I compiled maker with no error messages, and I have all these directories
after compilation:
bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src

Nevertheless, when I try maker2 on the test data (dpp_contig.fasta), I get
this error:

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
doing repeat masking
DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm

Can't call method "do" on an undefined value at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500
...

ideas?

Best,

Christelle

From carsonhh at gmail.com  Wed Mar 19 07:01:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Mar 2014 07:01:35 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
Message-ID: <CF4EF035.AE6F%carsonhh@gmail.com>

Your problem is one of the following: you need to reinstall the
DBD::SQLite module; you are running in a directory you don't have
permissions for; you set your TMPDIR environment variable or the TMP value
in maker_opts.ctl to an NFS-mounted or memory-mounted directory; or you are
using a self-compiled version of Perl (i.e. not /usr/bin/perl) that has
issues (probably with the DB or SQLite modules).  You can also completely
delete the output directory and start again to see if it was just a random
error.  You should look at each of those first.  If none of them seems to be
the issue, run MAKER with the --debug command line flag and send me the
output.
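
Two of the causes listed above (directory permissions, and a temporary directory on a file system where SQLite cannot create its database) can be probed outside MAKER. The following is a minimal Python sketch only. It uses Python's own sqlite3 module rather than Perl's DBD::SQLite, so it tests the directory and not the Perl installation. The function name `can_create_sqlite_db` and the probe file name are hypothetical, and TMPDIR is assumed to be the environment variable that names the temporary directory.

```python
import os
import sqlite3
import tempfile

def can_create_sqlite_db(directory):
    """Try to create, write to, and remove a small SQLite file in `directory`."""
    path = os.path.join(directory, f"sqlite_probe_{os.getpid()}.db")
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE probe (id INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
        return True
    except sqlite3.Error as err:
        print(f"SQLite failed in {directory}: {err}")
        return False
    finally:
        # Clean up the probe file whether or not the test succeeded.
        if os.path.exists(path):
            os.remove(path)

if __name__ == "__main__":
    # Probe the current working directory and the temporary directory.
    for d in (os.getcwd(), os.environ.get("TMPDIR", tempfile.gettempdir())):
        print(d, "OK" if can_create_sqlite_db(d) else "FAILED")
```

A failure here points at the directory or file system; a success leaves the Perl-side causes (the DBD::SQLite module or a self-compiled Perl) still to be checked.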

Thanks,
Carson



From rbharris at uw.edu  Wed Mar 19 19:19:27 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Wed, 19 Mar 2014 18:19:27 -0700
Subject: [maker-devel] tradeoff between run time & file number
Message-ID: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once, using the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time, probably because it has to go through BLAST
again. Is there any way of getting around this? I expect I may have to train
SNAP and rerun maker multiple times, and it is taking about 3 weeks to get
through my dataset. Is there a way to prune down my original dataset based
on maker's output?

Thanks,
Rebecca

From dence at genetics.utah.edu  Wed Mar 19 23:43:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Mar 2014 05:43:11 +0000
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>

Hi Rebecca, As far as pruning down the dataset goes, I think the biggest gains will come from trimming the number of scaffolds that you annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter than 5 or 10 kbp won't contribute much to the gene counts in the end.
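
To put numbers on the two points above (the N50 of the scaffold set, and how many scaffolds survive a length cutoff), a minimal Python sketch follows. It is not part of MAKER; the function names are illustrative, and the cutoff is whatever value you choose to test.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

def filter_by_length(records, min_len):
    """Keep only records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]
```

For example, reading the assembly with `records = list(read_fasta(open(path)))`, then comparing `n50([len(s) for _, s in records])` with `len(filter_by_length(records, 10000))`, shows how many scaffolds a 10 kbp cutoff would leave to annotate.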

Also, if you can, try to avoid using the alt_est option. It works completely fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach it to your reply (to the maker-devel list), and I'll take a look. I don't know how to force maker to create fewer files. You definitely want to be able to make use of the results from prior runs to save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com  Thu Mar 20 11:22:47 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 20 Mar 2014 12:22:47 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4E0363.AE3D%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and transcripts that don't overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mrna) all say something like "*snap-gene", so does this mean these are the genes from the snap prediction tool?


Thanks
dhivya


On Mar 18, 2014, at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:

> There can also be hint based predictions.  They may be similar in size, but there is no rule.  Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over predict (as much as 10 fold).  You should always review your annotations in something like Apollo, to see how the models compare to the evidence.  Just counts don?t really mean anything.
> 
> Thanks,
> Carson
> 
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Tuesday, March 18, 2014 at 2:05 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: <maker-devel at yandell-lab.org>
> Subject: Re: maker snap output files
> 
> Thanks Carson.
> 
> Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) Is actually less than the number of proteins in my pre-snap maker results?  I assumed that annotations through alignment+annotation through prediction would equal more annotations?
> 
> The unfiltered proteins file has more proteins though.
> 
> Thanks
> Dhivya
> 
> 
> 
> On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
>> maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here.  Sometimes people use interproscan (very slow) to analyze this file for false negatives.
>> 
>> 
>> These files are also described in the README distributed with MAKER in the ?MAKER OUTPUT? section.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> 
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Tuesday, March 18, 2014 at 1:27 PM
>> To: Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject: maker snap output files
>> 
>> Hello,
>> 
>> I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information in the tutorial about interpreting these different files. I'm hoping one of you can help.
>> 
>> *maker.proteins.fasta
>> *maker.snap_masked.proteins.fasta
>> *maker.non_overlapping_ab_initio.proteins.fasta
>> 
>> What is the difference among these? They all have different number of sequences.
>> 
>> Similarly,with transcripts:
>> 
>> maker.non_overlapping_ab_initio.transcripts.fasta
>> maker.snap_masked.transcripts.fasta
>> maker.transcripts.fasta
>> 
>> Thanks
>> Dhivya
>> 
>> 
> 


From carsonhh at gmail.com  Thu Mar 20 11:24:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:24:41 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
	<48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
Message-ID: <CF508021.AF35%carsonhh@gmail.com>

maker transcripts will be the gene/mRNA/exon/CDS features

All other transcripts from SNAP etc. will be match/match_part features in
the GFF3.

When you look at these in something like Apollo, they will be placed in
different viewing panels based on their type.
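
To see that split in a merged GFF3 without opening a browser, the features
can be tallied by source and type (columns 2 and 3). This is a minimal Python
sketch, not part of MAKER. The sample rows, IDs and coordinates are invented,
and the source names "maker" and "snap_masked" are assumptions about what
appears in column 2, so check them against your own file.

```python
from collections import Counter


def tally_gff3_features(lines):
    """Count (source, type) pairs in GFF3 lines, skipping comments and any FASTA tail."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[(cols[1], cols[2])] += 1
    return counts


# Invented example rows (hypothetical IDs and coordinates).
sample = [
    "##gff-version 3",
    "contig1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "contig1\tmaker\texon\t100\t900\t.\t+\t.\tID=exon1;Parent=mrna1",
    "contig1\tmaker\tCDS\t100\t900\t.\t+\t0\tID=cds1;Parent=mrna1",
    "contig1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=match1",
    "contig1\tsnap_masked\tmatch_part\t100\t900\t.\t+\t.\tID=part1;Parent=match1",
]

for (source, ftype), n in sorted(tally_gff3_features(sample).items()):
    print(f"{source}\t{ftype}\t{n}")
```

On a real run, pass an open file handle for the merged GFF3 instead of the
sample list; rows whose source is the annotation set should carry the
gene/mRNA/exon/CDS types, and predictor rows the match/match_part types.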

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 20, 2014 at 11:22 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  <maker-devel at yandell-lab.org>
Subject:  Re: maker snap output files

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and
transcripts that don't overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mRNA) all say something like
"*snap-gene", so does this mean these are the genes from the snap prediction
tool?


Thanks
dhivya


On Mar 18, 2014, at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:

> There can also be hint-based predictions.  They may be similar in size, but
> there is no rule.  Generally maker.snap_masked.proteins.fasta will be larger,
> as gene predictors tend to overpredict (as much as 10 fold).  You should
> always review your annotations in something like Apollo, to see how the models
> compare to the evidence.  Just counts don't really mean anything.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 18, 2014 at 2:05 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  <maker-devel at yandell-lab.org>
> Subject:  Re: maker snap output files
> 
> Thanks Carson.
> 
> Is it normal that in my maker results after running snap, the number of
> proteins (in *maker.proteins.fasta) is actually lower than the number of
> proteins in my pre-snap maker results?  I assumed that annotation through
> alignment plus annotation through prediction would yield more annotations?
> 
> The unfiltered proteins file has more proteins though.
> 
> Thanks
> Dhivya
> 
> 
> 
> On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
>> maker.proteins.fasta - these are the final filtered and modified protein
>> models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab
>> initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
>> rejected models that do not overlap the maker.proteins.fasta entries. If you
>> think you are missing a gene, look for it here.  Sometimes people use
>> interproscan (very slow) to analyze this file for false negatives.
>> 
>> 
>> These files are also described in the README distributed with MAKER in the
>> "MAKER OUTPUT" section.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Tuesday, March 18, 2014 at 1:27 PM
>> To:  Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject:  maker snap output files
>> 
>> Hello,
>> 
>> I ran maker after running SNAP ab initio prediction (following instructions
>> from the maker tutorial).  It ran successfully and when I ran fasta_merge, I
>> got several output fasta files. I'm unable to find information in the
>> tutorial about interpreting these different files. I'm hoping one of you can
>> help.
>> 
>> *maker.proteins.fasta
>> *maker.snap_masked.proteins.fasta
>> *maker.non_overlapping_ab_initio.proteins.fasta
>> 
>> What is the difference among these? They all have different number of
>> sequences.
>> 
>> Similarly,with transcripts:
>> 
>> maker.non_overlapping_ab_initio.transcripts.fasta
>> maker.snap_masked.transcripts.fasta
>> maker.transcripts.fasta
>> 
>> Thanks
>> Dhivya
>> 
>> 
> 



From carsonhh at gmail.com  Thu Mar 20 11:53:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:53:24 -0600
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
Message-ID: <CF50861A.AF65%carsonhh@gmail.com>

You may also want to try the GFF3 pass-through options.  Basically you give
the GFF3 file from your past run to maker_gff and tell MAKER what kinds of
evidence to keep from that run by setting the relevant 'pass' options to 1.
Then you can run without your FASTA file inputs for ESTs, proteins, and
repeats (blank out the RepeatMasker species as well).  The values will be
passed forward from the GFF3 file into the current run.
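
As a rough illustration of those edits, here is a small Python sketch that
rewrites selected options in control-file text. It is not part of MAKER. The
option names (maker_gff, est_pass, protein_pass, rm_pass, est, protein,
model_org) are quoted from memory of maker_opts.ctl and should be checked
against the file MAKER generates for you; the GFF3 file name and the sample
control-file lines are hypothetical.

```python
import re

# Option names as recalled from maker_opts.ctl; verify them in your own file.
PASS_THROUGH_UPDATES = {
    "maker_gff": "previous_run.all.gff",  # hypothetical file name
    "est_pass": "1",      # keep EST alignments from the GFF3
    "protein_pass": "1",  # keep protein alignments from the GFF3
    "rm_pass": "1",       # keep repeat annotations from the GFF3
    "est": "",            # blank out the FASTA inputs
    "protein": "",
    "model_org": "",      # blank out the RepeatMasker species
}

# key=value, optionally followed by a trailing "#comment"
_OPTION_LINE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the listed options replaced, keeping trailing comments."""
    out = []
    for line in ctl_text.splitlines():
        m = _OPTION_LINE.match(line)
        if m and m.group(1) in updates:
            key, comment = m.group(1), m.group(3) or ""
            line = f"{key}={updates[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Invented control-file excerpt for demonstration only.
sample_ctl = (
    "genome=genome.fasta #genome sequence\n"
    "maker_gff= #GFF3 from an earlier run\n"
    "est_pass=0 #use ESTs in maker_gff\n"
    "est=ests.fasta #ESTs in FASTA format\n"
    "model_org=all #RepeatMasker species\n"
)

print(set_ctl_options(sample_ctl, PASS_THROUGH_UPDATES), end="")
```

Options not named in the update table (genome here) are left untouched, so
the same function can be applied to a full control file.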

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Wednesday, March 19, 2014 at 11:43 PM
To:  Rebecca Harris <rbharris at uw.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] tradeoff between run time & file number

Hi Rebecca, So, as far as pruning down the dataset goes, I think that the
biggest gains will be made by trimming the number of scaffolds that you
annotate. What is the N50 of your 400,000 scaffold set? Usually, scaffolds
shorter than 5 kbp or 10 kbp won't contribute much to the gene counts in the
end.
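
A minimal Python sketch of that kind of pruning, for illustration only: it
reads FASTA records, reports the N50 of the scaffold lengths, and keeps only
scaffolds at or above a length cutoff. The function names and the toy
sequences are made up; on real data the cutoff would be something like 5000
or 10000.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def keep_long_scaffolds(records, min_len):
    """Return only the records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Toy input with scaffold lengths 8, 6, 4 and 2.
toy = [">s1", "ACGTACGT", ">s2", "ACGTAC", ">s3", "ACGT", ">s4", "AC"]
records = list(read_fasta(toy))
kept = keep_long_scaffolds(records, min_len=5)
print("N50:", n50([len(s) for _, s in records]))
print("kept:", [h for h, _ in kept])
```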

Also, if you can, try to avoid using the altest option. It works completely
fine, but aligning those sequences with tblastx takes much longer than
blastn or blastx.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got
things set up. You can attach it to your reply (to the maker-devel list),
and I'll take a look. I don't know how to force maker to create fewer files.
You definitely want to be able to make use of the results from prior runs to
save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca
Harris [rbharris at uw.edu]
Sent: Wednesday, March 19, 2014 7:19 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] tradeoff between run time & file number

Hi - 

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once - and used the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time - probably because it has to go through BLAST again.
Is there any way of getting around this? I expect I may have to train SNAP
and rerun maker multiple times and it is taking about 3 weeks to get through
my dataset. Is there a way to prune down my original dataset based on
maker's output?

Thanks,
Rebecca


From carsonhh at gmail.com  Fri Mar 21 08:23:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 08:23:18 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
	<CF4EF035.AE6F%carsonhh@gmail.com>
	<CAF+kvSasxjb7p_Wjtntmy2nht6kfL=JqaP5DfMGeC0GHkLy8Hw@mail.gmail.com>
	<CF5065FF.AEE7%carsonhh@gmail.com>
	<CAF+kvSbKs-sdFfvncqEgsAk4_XKbsB7KdB85fCdxpcWNe1rjWQ@mail.gmail.com>
	<CF506B1E.AEED%carsonhh@gmail.com>
	<CAF+kvSbmpRgneyfz6_tWsx_NS8ZWhuwnQAV0hA83qJrOVh-0hA@mail.gmail.com>
	<CAF+kvSZD7rpUeeoNGMKBGbH0zZN3bHksJFuUPP+hGoUKki34jw@mail.gmail.com>
	<CF506F04.AEF8%carsonhh@gmail.com>
	<CAF+kvSY_nWAFBH1YpKJqWV7qQ=XehHzhX9e+65miAG4f_+=ptA@mail.gmail.com>
	<CAF+kvSYYTA8pYFc0WY12+g6T_bk7P9MRUxNpzqtGkJARsA0wpg@mail.gmail.com>
	<CF50741C.AF02%carsonhh@gmail.com>
	<CAF+kvSZAhJnJdq+UcRfpWSya+6W26ecZHkRvHzGLsqk6K=fmQg@mail.gmail.com>
	<CF507AB2.AF1E%carsonhh@gmail.com>
	<CF507F90.AF30%carsonhh@gmail.com>
	<CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
Message-ID: <CF51A74A.AFA8%carsonhh@gmail.com>

Glad it's working.  Let us know if anything else comes up.

--Carson


From:  Chris Bioinfo <chrisbioinfo at gmail.com>
Date:  Friday, March 21, 2014 at 4:57 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: [maker-devel] Annotation with maker2

Dear Carson

It works!! After many difficulties:

I installed sqlite 3.8.4.1 yesterday: it was "better" (no error
message when launching sqlite3). Yet my test.db was still not created.

Today I found the trick!
The problem was simply that the path used to create the db was too long.
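
For a quick check of whether a given location can hold an SQLite database
at all, here is a minimal Python sketch using the standard sqlite3 module.
It exercises the path and filesystem only, not the Perl DBI/DBD::SQLite
bindings that MAKER uses, and the function name and temporary directory are
made up for the example.

```python
import os
import sqlite3
import tempfile


def try_sqlite(path):
    """Try to create and write an SQLite database at path; return (ok, message)."""
    path_len = len(os.path.abspath(path))
    conn = None
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE IF NOT EXISTS probe (id INTEGER)")
        conn.commit()
        return True, f"ok (absolute path length {path_len})"
    except sqlite3.Error as exc:
        return False, f"{exc} (absolute path length {path_len})"
    finally:
        if conn is not None:
            conn.close()


# Demonstration in a throwaway directory: one usable path, one that cannot exist.
workdir = tempfile.mkdtemp()
ok_good, msg_good = try_sqlite(os.path.join(workdir, "test.db"))
ok_bad, msg_bad = try_sqlite(os.path.join(workdir, "missing_dir", "test.db"))
print(ok_good, msg_good)
print(ok_bad, msg_bad)
```

Pointing try_sqlite at the database path from the MAKER error message shows
the SQLite error together with the path length, which makes a path-related
failure easier to spot.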

Thanks for your time and your help, Carson!

All the best,

Christelle


2014-03-20 18:21 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
> Also you can use this command line to test both before and after installing
> 
> perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh = DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");'
> 
> Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in
> the error.
> 
> --Carson
> 
> 
> From:  Carson Holt <carsonhh at gmail.com>
> Date:  Thursday, March 20, 2014 at 11:03 AM
> To:  Chris Bioinfo <chrisbioinfo at gmail.com>
> 
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> The failure is in SQLite.  So you have to reinstall.  I.e. 'force install
> DBD::SQLite' in CPAN.  Otherwise you are just keeping whatever module is
> installed which may have broken C bindings.
> 
> You may also have to install SQLite 3.8.4.1, and then reinstall the perl
> modules using the force option to force recompile.
> 
> --Carson
> 
> 
> 
> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
> Date:  Thursday, March 20, 2014 at 10:57 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> cpan[2]> install DBI
> DBI is up to date (1.631).
> 
> cpan[3]> install DBD::SQLite
> DBD::SQLite is up to date (1.42).
> 
> my test.db is still not actually created:
> 
> sqlite3 dpp_contig.maker.output/test.db
> SQLite version 3.8.3.1 2014-02-11 14:52:19
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> 
> 
> 
> 
> 
> 2014-03-20 17:36 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>> I'm actually checking the mount points for the disk.  SQLite won't work on
>> filesystems that don't implement locks, and 'df' is a good way to infer some
>> of that info.
>> 
>> Basically I still think this is SQLite failing on your system.  You might
>> need to reinstall SQLite and then reinstall the perl DBI and DBD::SQLite
>> modules.
>> 
>> You can also do a test command --> 'sqlite3 dpp_contig.maker.output/test.db'
>> 
>> This will work if you have sqlite3 installed.  And any error it gives may be
>> informative.
>> 
>> --Carson
>> 
>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>> Date:  Thursday, March 20, 2014 at 10:29 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: [maker-devel] Annotation with maker2
>> 
>> oh sorry
>> 
>> my disks are quite full, but there is still enough space for maker, I guess
>> 
>>  /dev/sdc1           19T     18T  934G  95% /home
>> 
>> 
>> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>> this :
>>> 
>>>  du -h dpp_contig.maker.output/
>>> 0    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0
>>> 88K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore
>>> 4.0K    dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>>> 4.0K    dpp_contig.maker.output/maker_bopts.log
>>> 4.0K    dpp_contig.maker.output/maker_exe.log
>>> 8.0K    dpp_contig.maker.output/maker_opts.log
>>> 16K    dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1
>>> 44K    dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10
>>> 32K    dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb
>>> 0    dpp_contig.maker.output/seen.dbm
>>> 
>>> 
>>> 
>>> 2014-03-20 17:10 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>> 
>>>> What does 'df -h dpp_contig.maker.output' show?
>>>> 
>>>> --Carson
>>>> 
>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>> Date:  Thursday, March 20, 2014 at 10:00 AM
>>>> 
>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>> 
>>>> sorry, mistake on the dir!
>>>> 
>>>> I have these files:
>>>> dpp_contig_datastore  dpp_contig_master_datastore_index.log
>>>> maker_bopts.log  maker_exe.log  maker_opts.log  mpi_blastdb  seen.dbm
>>>> 
>>>> 
>>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>>>> no,
>>>>> 
>>>>> I have these files in the directory:
>>>>> dpp_contig.fasta         dpp_est.fasta      hsap_contig.fasta
>>>>> hsap_protein.fasta  maker_exe.ctl
>>>>> dpp_contig.maker.output  dpp_protein.fasta  hsap_est.fasta
>>>>> maker_bopts.ctl     maker_opts.ctl  te_proteins.fasta
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>>> Did /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db exist?
>>>>>> 
>>>>>> --Carson
>>>>>> 
>>>>>> 
>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>> Date:  Thursday, March 20, 2014 at 9:50 AM
>>>>>> 
>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>> 
>>>>>> cdantec at belem:~$ /usr/bin/perl -v
>>>>>> 
>>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for
>>>>>> x86_64-linux-gnu-thread-multi
>>>>>> (with 46 registered patches, see perl -V for more detail)
>>>>>> 
>>>>>> Copyright 1987-2013, Larry Wall
>>>>>> 
>>>>>> Perl may be copied only under the terms of either the Artistic License or the
>>>>>> GNU General Public License, which may be found in the Perl 5 source kit.
>>>>>> 
>>>>>> Complete documentation for Perl, including FAQ lists, should be found on
>>>>>> this system using "man perl" or "perldoc perl".  If you have access to the
>>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> What do you get when you type --> /usr/bin/perl -v
>>>>>>> 
>>>>>>> The key to the error is this line -->
>>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file
>>>>>>> 
>>>>>>> Either the database doesn't exist, or is corrupt.  Does it exist?
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Thursday, March 20, 2014 at 9:25 AM
>>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Dear Carson,
>>>>>>> 
>>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions on my
>>>>>>> directory, and configured the TMP value in maker_opts.ctl. perl is in
>>>>>>> /usr/bin/perl.
>>>>>>> I have deleted the output directory many times, but I get the same problem.
>>>>>>> 
>>>>>>> So here is the debug output:
>>>>>>> ****MODULE VERSION INFO
>>>>>>>     0.05    Acme::Damn    /usr/local/lib/perl/5.18.1/Acme/Damn.pm
>>>>>>>     1.01    AnyDBM_File    /usr/share/perl/5.18/AnyDBM_File.pm
>>>>>>>     5.73    AutoLoader    /usr/share/perl/5.18/AutoLoader.pm
>>>>>>>     UNKNOWN    Bio::AnalysisParserI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::AnnotatableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm
>>>>>>>     UNKNOWN    Bio::Annotation::Collection
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm
>>>>>>>     UNKNOWN    Bio::Annotation::SimpleValue
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm
>>>>>>>     UNKNOWN    Bio::Annotation::TypeManager
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm
>>>>>>>     UNKNOWN    Bio::AnnotationCollectionI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm
>>>>>>>     UNKNOWN    Bio::AnnotationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm
>>>>>>>     1.006923    Bio::DB::Fasta
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm
>>>>>>>     UNKNOWN    Bio::DB::InMemoryCache
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm
>>>>>>>     UNKNOWN    Bio::DB::IndexedBase
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm
>>>>>>>     UNKNOWN    Bio::DB::RandomAccessI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm
>>>>>>>     UNKNOWN    Bio::DB::SeqI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm
>>>>>>>     UNKNOWN    Bio::DescribableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DescribableI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventGeneratorI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventHandlerI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactoryI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm
>>>>>>>     UNKNOWN    Bio::Factory::SequenceFactoryI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm
>>>>>>>     UNKNOWN    Bio::FeatureHolderI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm
>>>>>>>     UNKNOWN    Bio::IdentifiableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm
>>>>>>>     UNKNOWN    Bio::LocatableSeq
>>>>>>> /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm
>>>>>>>     UNKNOWN    Bio::Location::Atomic
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm
>>>>>>>     UNKNOWN    Bio::Location::CoordinatePolicyI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm
>>>>>>>     UNKNOWN    Bio::Location::Fuzzy
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm
>>>>>>>     UNKNOWN    Bio::Location::FuzzyLocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::Simple
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm
>>>>>>>     UNKNOWN    Bio::Location::Split
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Split.pm
>>>>>>>     UNKNOWN    Bio::Location::SplitLocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::WidestCoordPolicy
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm
>>>>>>>     UNKNOWN    Bio::LocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/LocationI.pm
>>>>>>>     UNKNOWN    Bio::PrimarySeq
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm
>>>>>>>     1.006923    Bio::PrimarySeqI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm
>>>>>>>     UNKNOWN    Bio::Range    /usr/local/share/perl/5.18.1/Bio/Range.pm
>>>>>>>     UNKNOWN    Bio::RangeI    /usr/local/share/perl/5.18.1/Bio/RangeI.pm
>>>>>>>     1.006923    Bio::Root::Exception
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm
>>>>>>>     UNKNOWN    Bio::Root::HTTPget
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm
>>>>>>>     UNKNOWN    Bio::Root::IO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/IO.pm
>>>>>>>     1.006923    Bio::Root::Root
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Root.pm
>>>>>>>     1.006923    Bio::Root::RootI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm
>>>>>>>     1.006923    Bio::Root::Version
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Version.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::GenericHSP
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/augustus.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protein2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snoscan.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnascan.pm
>>>>>>>     1.006923    Bio::Search::Hit::GenericHit
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/augustus.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protein2genome.pm
>>>>>>>     1.006923    Bio::Search::Hit::PhatHit::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snoscan.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnascan.pm
>>>>>>>     1.006923    Bio::Search::SearchUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm
>>>>>>>     UNKNOWN    Bio::SearchIO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::EventHandlerI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::SearchResultEventBuilder
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm
>>>>>>>     UNKNOWN    Bio::Seq    /usr/local/share/perl/5.18.1/Bio/Seq.pm
>>>>>>>     UNKNOWN    Bio::Seq::SeqFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm
>>>>>>>     UNKNOWN    Bio::SeqAnalysisParserI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::FeaturePair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Generic
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Similarity
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::SimilarityPair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeatureI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm
>>>>>>>     UNKNOWN    Bio::SeqI    /usr/local/share/perl/5.18.1/Bio/SeqI.pm
>>>>>>>     UNKNOWN    Bio::SeqUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm
>>>>>>>     1.006923    Bio::Tools::CodonTable
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm
>>>>>>>     UNKNOWN    Bio::Tools::GFF
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm
>>>>>>>     1.006923    Bio::Tools::IUPAC
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm
>>>>>>>     7.3    Bit::Vector    /usr/local/lib/perl/5.18.1/Bit/Vector.pm
>>>>>>>     0.01    CGL::Annotation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm
>>>>>>>     0.01    CGL::Annotation::Feature
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Contig
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Exon
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Gene
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Intron
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protein.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Sequence_variant
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequence_variant.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Transcript
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transcript.pm
>>>>>>>     0.01    CGL::Annotation::FeatureLocation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocation.pm
>>>>>>>     0.01    CGL::Annotation::FeatureRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelationship.pm
>>>>>>>     0.01    CGL::Annotation::Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm
>>>>>>>     0.01    CGL::Annotation::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm
>>>>>>>     0.01    CGL::Clone
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm
>>>>>>>     0.01    CGL::Ontology::Node
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm
>>>>>>>     0.01    CGL::Ontology::NodeRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship.pm
>>>>>>>     0.01    CGL::Ontology::Ontology
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm
>>>>>>>     0.01    CGL::Ontology::Parser::OBO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm
>>>>>>>     0.01    CGL::Ontology::SO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm
>>>>>>>     0.01    CGL::Ontology::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm
>>>>>>>     0.01    CGL::Revcomp
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm
>>>>>>>     0.01    CGL::TranslationMachine
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm
>>>>>>>     1.32    Carp    /usr/local/share/perl/5.18.1/Carp.pm
>>>>>>>     1.32    Carp::Heavy    /usr/local/share/perl/5.18.1/Carp/Heavy.pm
>>>>>>>     0.64    Class::Struct    /usr/share/perl/5.18/Class/Struct.pm
>>>>>>>     0.36    Clone    /usr/local/lib/perl/5.18.1/Clone.pm
>>>>>>>     5.018001    Config    /usr/lib/perl/5.18/Config.pm
>>>>>>>     3.40    Cwd    /usr/lib/perl/5.18/Cwd.pm
>>>>>>>     1.42    DBD::SQLite    /usr/local/lib/perl/5.18.1/DBD/SQLite.pm
>>>>>>>     1.631    DBI    /usr/local/lib/perl/5.18.1/DBI.pm
>>>>>>>     1.827    DB_File    /usr/lib/perl/5.18/DB_File.pm
>>>>>>>     2.145    Data::Dumper    /usr/lib/perl/5.18/Data/Dumper.pm
>>>>>>>     0.11    Datastore::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm
>>>>>>>     0.01    Datastore::MD5
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm
>>>>>>>     2.53    Digest::MD5    /usr/local/lib/perl/5.18.1/Digest/MD5.pm
>>>>>>>     1.16    Digest::base    /usr/share/perl/5.18/Digest/base.pm
>>>>>>>     UNKNOWN    Dumper::GFF::GFFV3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game_Xml
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm
>>>>>>>     1.18    DynaLoader    /usr/lib/perl/5.18/DynaLoader.pm
>>>>>>>     1.18    Errno    /usr/lib/perl/5.18/Errno.pm
>>>>>>>     0.17015    Error
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm
>>>>>>>     UNKNOWN    Error::Simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm
>>>>>>>     5.68    Exporter    /usr/share/perl/5.18/Exporter.pm
>>>>>>>     5.68    Exporter::Heavy    /usr/share/perl/5.18/Exporter/Heavy.pm
>>>>>>>     UNKNOWN    Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Fasta.pm
>>>>>>>     UNKNOWN    FastaChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm
>>>>>>>     UNKNOWN    FastaChunker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm
>>>>>>>     UNKNOWN    FastaDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm
>>>>>>>     UNKNOWN    FastaFile
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm
>>>>>>>     UNKNOWN    FastaSeq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm
>>>>>>>     1.11    Fcntl    /usr/lib/perl/5.18/Fcntl.pm
>>>>>>>     2.84    File::Basename    /usr/share/perl/5.18/File/Basename.pm
>>>>>>>     2.26    File::Copy    /usr/share/perl/5.18/File/Copy.pm
>>>>>>>     1.20    File::Glob    /usr/lib/perl/5.18/File/Glob.pm
>>>>>>>     1.20    File::NFSLock
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm
>>>>>>>     2.09    File::Path    /usr/share/perl/5.18/File/Path.pm
>>>>>>>     3.40    File::Spec    /usr/lib/perl/5.18/File/Spec.pm
>>>>>>>     3.40    File::Spec::Unix    /usr/lib/perl/5.18/File/Spec/Unix.pm
>>>>>>>     0.2304    File::Temp    /usr/local/share/perl/5.18.1/File/Temp.pm
>>>>>>>     1.09    File::Which
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm
>>>>>>>     2.02    FileHandle    /usr/share/perl/5.18/FileHandle.pm
>>>>>>>     1.51    FindBin    /usr/share/perl/5.18/FindBin.pm
>>>>>>>     UNKNOWN    GFFDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>>     UNKNOWN    GI    /usr/local/annotation/maker2.31/bin/../lib/GI.pm
>>>>>>>     2.42    Getopt::Long    /usr/local/share/perl/5.18.1/Getopt/Long.pm
>>>>>>>     6.02    HTTP::Date    /usr/share/perl5/HTTP/Date.pm
>>>>>>>     6.05    HTTP::Headers    /usr/share/perl5/HTTP/Headers.pm
>>>>>>>     6.06    HTTP::Message    /usr/share/perl5/HTTP/Message.pm
>>>>>>>     6.00    HTTP::Request    /usr/share/perl5/HTTP/Request.pm
>>>>>>>     6.04    HTTP::Response    /usr/share/perl5/HTTP/Response.pm
>>>>>>>     6.03    HTTP::Status    /usr/share/perl5/HTTP/Status.pm
>>>>>>>     1.28    IO    /usr/lib/perl/5.18/IO.pm
>>>>>>>     1.16    IO::File    /usr/lib/perl/5.18/IO/File.pm
>>>>>>>     1.34    IO::Handle    /usr/lib/perl/5.18/IO/Handle.pm
>>>>>>>     1.1    IO::Seekable    /usr/lib/perl/5.18/IO/Seekable.pm
>>>>>>>     1.21    IO::Select    /usr/lib/perl/5.18/IO/Select.pm
>>>>>>>     1.36    IO::Socket    /usr/lib/perl/5.18/IO/Socket.pm
>>>>>>>     1.33    IO::Socket::INET    /usr/lib/perl/5.18/IO/Socket/INET.pm
>>>>>>>     1.24    IO::Socket::UNIX    /usr/lib/perl/5.18/IO/Socket/UNIX.pm
>>>>>>>     1.13    IPC::Open3    /usr/share/perl/5.18/IPC/Open3.pm
>>>>>>>     0.53    Inline    /usr/local/share/perl/5.18.1/Inline.pm
>>>>>>>     UNKNOWN    Inline::denter
>>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm
>>>>>>>     UNKNOWN    Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm
>>>>>>>     UNKNOWN    Iterator::Any
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm
>>>>>>>     UNKNOWN    Iterator::Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm
>>>>>>>     UNKNOWN    Iterator::GFF3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm
>>>>>>>     6.05    LWP    /usr/share/perl5/LWP.pm
>>>>>>>     UNKNOWN    LWP::MemberMixin    /usr/share/perl5/LWP/MemberMixin.pm
>>>>>>>     6.00    LWP::Protocol    /usr/share/perl5/LWP/Protocol.pm
>>>>>>>     6.05    LWP::UserAgent    /usr/share/perl5/LWP/UserAgent.pm
>>>>>>>     0.33    List::MoreUtils
>>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm
>>>>>>>     1.38    List::Util    /usr/local/lib/perl/5.18.1/List/Util.pm
>>>>>>>     UNKNOWN    MAKER::ConfigData
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm
>>>>>>>     1.32    POSIX    /usr/lib/perl/5.18/POSIX.pm
>>>>>>>     0.01    Parallel::Application::MPI
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI
>>>>>>> .pm
>>>>>>>     0.02    Perl::Unsafe::Signals
>>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm
>>>>>>>     UNKNOWN    PhatHit_utils
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm
>>>>>>>     UNKNOWN    PostData
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm
>>>>>>>     1.0    Proc::ProcessTable_simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm
>>>>>>>     1.0    Proc::Signal
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm
>>>>>>>     UNKNOWN    Process::MpiChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm
>>>>>>>     UNKNOWN    Process::MpiTiers
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm
>>>>>>>     1.38    Scalar::Util    /usr/local/lib/perl/5.18.1/Scalar/Util.pm
>>>>>>>     1.02    SelectSaver    /usr/share/perl/5.18/SelectSaver.pm
>>>>>>>     UNKNOWN    Shadower
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm
>>>>>>>     UNKNOWN    SimpleCluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm
>>>>>>>     2.009    Socket    /usr/lib/perl/5.18/Socket.pm
>>>>>>>     UNKNOWN    SpaceBase
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm
>>>>>>>     2.45    Storable    /usr/local/lib/perl/5.18.1/Storable.pm
>>>>>>>     1.07    Symbol    /usr/share/perl/5.18/Symbol.pm
>>>>>>>     1.17    Sys::Hostname    /usr/lib/perl/5.18/Sys/Hostname.pm
>>>>>>>     0.21    Sys::SigAction
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm
>>>>>>>     UNKNOWN    Sys::SigAction::Alarm
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm
>>>>>>>     4.02    Term::ANSIColor    /usr/share/perl/5.18/Term/ANSIColor.pm
>>>>>>>     4.2    Tie::Handle    /usr/share/perl/5.18/Tie/Handle.pm
>>>>>>>     1.04    Tie::Hash    /usr/share/perl/5.18/Tie/Hash.pm
>>>>>>>     4.3    Tie::StdHandle    /usr/share/perl/5.18/Tie/StdHandle.pm
>>>>>>>     1.9726    Time::HiRes    /usr/local/lib/perl/5.18.1/Time/HiRes.pm
>>>>>>>     1.2300    Time::Local    /usr/share/perl/5.18/Time/Local.pm
>>>>>>>     1.60    URI    /usr/share/perl5/URI.pm
>>>>>>>     3.31    URI::Escape    /usr/share/perl5/URI/Escape.pm
>>>>>>>     UNKNOWN    Widget
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm
>>>>>>>     UNKNOWN    Widget::RepeatMasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm
>>>>>>>     UNKNOWN    Widget::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm
>>>>>>>     UNKNOWN    Widget::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm
>>>>>>>     UNKNOWN    Widget::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm
>>>>>>>     UNKNOWN    Widget::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm
>>>>>>>     UNKNOWN    Widget::exonerate::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome.
>>>>>>> pm
>>>>>>>     UNKNOWN    Widget::exonerate::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.p
>>>>>>> m
>>>>>>>     UNKNOWN    Widget::exonerate::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2geno
>>>>>>> me.pm
>>>>>>>     UNKNOWN    Widget::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm
>>>>>>>     UNKNOWN    Widget::formater
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm
>>>>>>>     UNKNOWN    Widget::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm
>>>>>>>     UNKNOWN    Widget::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm
>>>>>>>     UNKNOWN    Widget::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm
>>>>>>>     UNKNOWN    Widget::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm
>>>>>>>     UNKNOWN    Widget::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm
>>>>>>>     0.16    XSLoader    /usr/share/perl/5.18/XSLoader.pm
>>>>>>>     0.21    attributes    /usr/lib/perl/5.18/attributes.pm
>>>>>>>     2.18    base    /usr/share/perl/5.18/base.pm
>>>>>>>     1.04    bytes    /usr/share/perl/5.18/bytes.pm
>>>>>>>     UNKNOWN    clean
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm
>>>>>>>     UNKNOWN    cluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm
>>>>>>>     UNKNOWN    compare
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm
>>>>>>>     1.27    constant    /usr/share/perl/5.18/constant.pm
>>>>>>>     UNKNOWN    ds_utility
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm
>>>>>>>     UNKNOWN    exonerate::splice_info
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm
>>>>>>>     0.34    forks    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     2.08001    forks::Devel::Symdump
>>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm
>>>>>>>     0.34    forks::shared    /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     0.34    forks::signals
>>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm
>>>>>>>     1.00    integer    /usr/share/perl/5.18/integer.pm
>>>>>>>     0.63    lib    /usr/lib/perl/5.18/lib.pm
>>>>>>>     1.02    locale    /usr/share/perl/5.18/locale.pm
>>>>>>>     UNKNOWN    maker::auto_annotator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm
>>>>>>>     UNKNOWN    maker::join
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm
>>>>>>>     UNKNOWN    maker::quality_index
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm
>>>>>>>     UNKNOWN    maker::sens_spec
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm
>>>>>>>     1.22    overload    /usr/share/perl/5.18/overload.pm
>>>>>>>     0.02    overloading    /usr/share/perl/5.18/overloading.pm
>>>>>>>     0.225    parent    /usr/share/perl/5.18/parent.pm
>>>>>>>     UNKNOWN    polisher
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm
>>>>>>>     UNKNOWN    polisher::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm
>>>>>>>     UNKNOWN    polisher::exonerate::altest
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm
>>>>>>>     UNKNOWN    polisher::exonerate::est
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm
>>>>>>>     UNKNOWN    polisher::exonerate::protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm
>>>>>>>     UNKNOWN    repeat_mask_seq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm
>>>>>>>     0.1    runlog
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm
>>>>>>>     UNKNOWN    shadow_AED
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm
>>>>>>>     1.07    sigtrap    /usr/share/perl/5.18/sigtrap.pm
>>>>>>>     1.07    strict    /usr/share/perl/5.18/strict.pm
>>>>>>>     1.77    threads    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     1.33    threads::shared
>>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     1.03    vars    /usr/share/perl/5.18/vars.pm
>>>>>>>     1.18    warnings    /usr/share/perl/5.18/warnings.pm
>>>>>>>     1.02    warnings::register
>>>>>>> /usr/share/perl/5.18/warnings/register.pm
>>>>>>> STATUS: Parsing control files...
>>>>>>> Calling GI::load_control_files at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452.
>>>>>>> Calling GI::new_instance_temp at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463.
>>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 465.
>>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 483.
>>>>>>> STATUS: Processing and indexing input FASTA files...
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 529.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 574.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 623.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> STATUS: Setting up database for any GFF3 input...
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 629.
>>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 631.
>>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 635.
>>>>>>> A data structure will be created for you at:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker
>>>>>>> .output/dpp_contig_datastore
>>>>>>> 
>>>>>>> To access files for individual sequences use the datastore index:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker
>>>>>>> .output/dpp_contig_master_datastore_index.log
>>>>>>> 
>>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 636.
>>>>>>> Calling Iterator::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639.
>>>>>>> Calling Iterator::Fasta::skip_file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641.
>>>>>>> Calling Iterator::Fasta::step at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643.
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/
>>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open 
>>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> line 107.
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now retrying the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> Tries: 2!!
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/
>>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open 
>>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> line 107.
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> 
>>>>>>> 
>>>>>>> Maker is now finished!!!
>>>>>>> 
>>>>>>> Many thanks for your help
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> Your problem is one of the following.  You need to reinstall the
>>>>>>> DBD::SQLite module, you are running in a directory you don't have
>>>>>>> permissions for, you set your TMPDIR environment variable or TMP value
>>>>>>> in maker_opts.ctl to an NFS mounted or memory mounted directory, or you
>>>>>>> are using a self compiled version of Perl (i.e. not /usr/bin/perl) that
>>>>>>> has issues (probably with DB or SQLite modules).  You can also
>>>>>>> completely delete the output directory, and start again to see if it was
>>>>>>> just a random error.  You should look at each of those first.  You can
>>>>>>> also run MAKER with the --debug command line flag and send the output to
>>>>>>> me if none of those seems to be the issue.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Wednesday, March 19, 2014 at 5:09 AM
>>>>>>> To:  <maker-devel at yandell-lab.org>
>>>>>>> Subject:  [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I'm installing/using maker2 for the first time and I get an error when
>>>>>>> using it.
>>>>>>> 
>>>>>>> I am certainly missing something, but I don't know what.
>>>>>>> 
>>>>>>> I compiled maker with no error messages and I have all these directories
>>>>>>> after compilation:
>>>>>>> bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src
>>>>>>> 
>>>>>>> Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I 
>>>>>>> have this error:
>>>>>>> 
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) 
>>>>>>> failed: unable to open database file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>> 
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> ...
>>>>>>> 
>>>>>>> ideas?
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 



From jfierst at uoregon.edu  Fri Mar 21 09:43:59 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 21 Mar 2014 08:43:59 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CF489F0B.AC19%carsonhh@gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
Message-ID: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>

Hi,

I just wanted to say thanks for all your help- I did the reciprocal best
blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to
associate names between strain assemblies/annotations. Worked perfectly!
-Janna
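
For readers following the thread, the reciprocal-best-hit step can be
sketched roughly as below. This is only an illustration, not one of the
MAKER scripts: it assumes both BLASTP searches were saved as 12-column
tabular output (query ID, subject ID, ..., bitscore in the last column),
and the function names, file names and example IDs are made up for the
example. The printed pairs form the two-column list (original ID, new ID)
described for maker_map_ids in the quoted message below.

```python
import sys


def best_hits(rows):
    """Map each query ID to its highest-bitscore subject ID."""
    best = {}
    for row in rows:
        # Columns 1, 2 and 12 of tabular BLAST output: query, subject, bitscore.
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, reference_id) pairs that are mutual best hits."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted(
        (maker_id, ref_id)
        for maker_id, ref_id in forward.items()
        if reverse.get(ref_id) == maker_id
    )


def read_tabular(path):
    """Read a tab-separated BLAST table, skipping blank and comment lines."""
    with open(path) as handle:
        return [
            line.rstrip("\n").split("\t")
            for line in handle
            if line.strip() and not line.startswith("#")
        ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rbh.py maker_vs_ref.tsv ref_vs_maker.tsv
    pairs = reciprocal_best_hits(read_tabular(sys.argv[1]),
                                 read_tabular(sys.argv[2]))
    for maker_id, ref_id in pairs:
        print(f"{maker_id}\t{ref_id}")
```

Ties on bitscore are resolved in favour of the first hit seen, which is a
simplification; a stricter version might drop tied queries altogether.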


On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt <carsonhh at gmail.com> wrote:

> maker_map_ids does a translation (i.e. change gene-A to smug1), so you
> need to know which genes you want to translate names to (two column input
> file, column 1 -> original ID, column 2 -> new ID).  I'm not sure EST
> forward is the best way to do this, although I do think maker_map_ids is
> the tool to use in the end.  The question is how to make a list of IDs to
> translate as the input to maker_map_ids?
>
> I would actually just use BLASTP against the reference strain, and then
> do reciprocal best BLAST hits.  To do this you BLAST your reference
> proteins against your maker proteins.  Then do the opposite, BLAST your
> maker proteins against your reference proteins.  If they are both each
> other's best hit, then they are orthologous, and you can safely make a two
> column entry for the maker_map_ids input (i.e. maker-gene-1 translates into
> smug1).
>
> --Carson
>
>
> From: Daniel Ence <dence at genetics.utah.edu>
> Date: Friday, March 14, 2014 at 11:32 AM
> To: Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org" <
> maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] associating gene names between related strains
>
> Hi Janna, So do you have one strain that you want to use as the reference
> for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
> Let me know if that's not what you're looking for,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
> Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com  Fri Mar 21 09:54:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 09:54:15 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
	<CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
Message-ID: <CF51BCA1.AFB9%carsonhh@gmail.com>

I'm glad we could help.

--Carson

From:  Janna Fierst <jfierst at uoregon.edu>
Date:  Friday, March 21, 2014 at 9:43 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] associating gene names between related strains

Hi,

I just wanted to say thanks for all your help- I did the reciprocal best
blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to
associate names between strain assemblies/annotations. Worked perfectly!
-Janna


On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt <carsonhh at gmail.com> wrote:
> maker_map_ids does a translation (i.e. change gene-A to smug1), so you need to
> know which genes you want to translate names to (two column input file, column
> 1 -> original ID, column 2 -> new ID).  I'm not sure EST forward is the best
> way to do this, although I do think maker_map_ids is the tool to use in the
> end.  The question is how to make a list of IDs to translate as the input to
> maker_map_ids?
> 
> I would actually just use BLASTP against the reference strain, and then do
> reciprocal best BLAST hits.  To do this you BLAST your reference proteins
> against your maker proteins.  Then do the opposite, BLAST your maker proteins
> against your reference proteins.  If they are both each other's best hit, then
> they are orthologous, and you can safely make a two column entry for the
> maker_map_ids input (i.e. maker-gene-1 translates into smug1).
> 
> --Carson
> 
> 
> From:  Daniel Ence <dence at genetics.utah.edu>
> Date:  Friday, March 14, 2014 at 11:32 AM
> To:  Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] associating gene names between related strains
> 
> Hi Janna, So do you have one strain that you want to use as the reference for
> all the others? There's a script that comes with MAKER called maker_map_ids
> that lets you use a common prefix or suffix for entries in a fasta file from
> one strain and then use est_forward to use that ID in the gene models for the
> other species. 
> 
> Let me know if that's not what you're looking for,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna
> Fierst [jfierst at uoregon.edu]
> Sent: Friday, March 14, 2014 10:06 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] associating gene names between related strains
> 
> Hi,
> 
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate the
> gene naming so that orthologs between species can be associated by name. I
> have been playing around a little with the est_forward option but can't figure
> out a good system/workflow that preserves names but still uses the
> strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From Hossein.Borhan at AGR.GC.CA  Fri Mar 21 10:41:38 2014
From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein)
Date: Fri, 21 Mar 2014 16:41:38 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
	<CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
Message-ID: <E8EDFB90D92694478065C37017B3A3A6A890F2F6@SKREGIXES2.AGR.GC.CA>

Dear Carson

I ran maker and modified .pm files and it resolved the problem with the
fasta output. Thanks a lot for your help.


HB


On 14-03-17 1:45 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>I have attached 4 files for you to place in the .../maker/Widgets/
>directory.
>
>The *blast.pm files will suppress the BLAST+ failures you are getting
>(alternatively you can just downgrade to BLAST 2.2.27 to get the same
>effect).  BLAST 2.2.29 gives a lot of warnings etc., which you can ignore.
>In the latest release NCBI redid all their warnings and error codes so it
>spits out a lot of garbage and fails with different messages than it did
>before.  For example BLAST now warns you every time it encounters a fasta
>header with a comment (virtually every fasta entry in existence falls in
>this category), so your screen will be awash with meaningless warning
>messages.
>
>The fgenesh.pm file will fix the other failure, which only occurs if you
>use fgenesh simultaneously with the correct_est_fusion=1 option.  No other
>predictors are affected.
>
>Thanks,
>Carson
>
>
>On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear  Carson
>>
>>Sorry for the late reply. I was away for a couple of days. I have
>>uploaded the output files, plus control and error output, to the FTP
>>site that you provided.
>>The user ID is borhanh
>>I used blast+ for this run.
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>wrote:
>>
>>>Just resending this to the correct maker-devel address.  Please when
>>>replying, do not CC the incorrect maker-devel-bounce address.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>wrote:
>>>
>>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>>(either too old or too new), output might be slightly different which
>>>>could cause incorrect parsing. Could you tar up your maker.output
>>>>folder,
>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>(send me either your user/guest ID after you upload).
>>>>
>>>>For the BLAST error, use BLAST+ instead.  You are using blastall which
>>>>is
>>>>the old legacy version of NCBI BLAST.  You can do this by setting the
>>>>blast type in maker_bopts.ctl and the location of executables in
>>>>maker_exe.ctl.
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>
>>>>
>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Dear Maker users
>>>>>
>>>>>
>>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>>word SCALAR, followed by a bracketed value like this (0x22de7020),
>>>>>into the nucleotide sequence of some of the genes. This seems to
>>>>>be related to transcripts predicted by fgenesh_masked.
>>>>>
>>>>>
>>>>>Here is an example for one of the genes
>>>>>
>>>>>
>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>>TTTCGACAAGC
>>>>>
>>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>>without such problem. I checked the sequence for the scaffold related
>>>>>to
>>>>>one of the affected transcripts and there was no error in the
>>>>>sequence.
>>>>>I am not sure what is causing this. The only error that I could spot
>>>>>in
>>>>>the output error file is the following
>>>>>
>>>>>
>>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>>contexts/frames of query sequences.
>>>>>
>>>>>
>>>>>
>>>>>Your help is appreciated
>>>>>
>>>>>
>>>>>
>>>>>HB
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


From carsonhh at gmail.com  Fri Mar 21 10:43:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 10:43:10 -0600
Subject: [maker-devel] non-nucleotide characters in the maker generated
 transcripts
Message-ID: <CF51C832.AFC0%carsonhh@gmail.com>

Thanks for letting me know.

--Carson


On 3/21/14, 10:41 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear Carson
>
>I ran maker and modified .pm files and it resolved the problem with the
>fasta output. Thanks a lot for your help.
>
>
>
>
>HB
>
>
>
>
>
>
>
>
>On 14-03-17 1:45 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>
>>I have attached 4 files for you to place in the .../maker/Widgets/
>>directory.
>>
>>The *blast.pm files will suppress the BLAST+ failures you are getting
>>(alternatively you can just downgrade to BLAST 2.2.27 to get the same
>>effect).  BLAST 2.2.29 gives a lot of warnings etc., which you can ignore.
>>In the latest release NCBI redid all their warnings and error codes so it
>>spits out a lot of garbage and fails with different messages than it did
>>before.  For example BLAST now warns you every time it encounters a fasta
>>header with a comment (virtually every fasta entry in existence falls in
>>this category), so your screen will be awash with meaningless warning
>>messages.
>>
>>The fgenesh.pm file will fix the other failure, which only occurs if you
>>use fgenesh simultaneously with the correct_est_fusion=1 option.  No other
>>predictors are affected.
>>
>>Thanks,
>>Carson
>>
>>
>>On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>>
>>>Dear  Carson
>>>
>>>Sorry for the late reply. I was away for a couple of days. I have
>>>uploaded the output files, plus control and error output, to the FTP
>>>site that you provided.
>>>The user ID is borhanh
>>>
>>>I used blast+ for this run.
>>>
>>>
>>>
>>>
>>>Regards
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>wrote:
>>>
>>>>Just resending this to the correct maker-devel address.  Please when
>>>>replying, do not CC the incorrect maker-devel-bounce address.
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>
>>>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>>wrote:
>>>>
>>>>>FGENESH is not a heavily used tool, so depending on which version it
>>>>>is
>>>>>(either too old or too new), output might be slightly different which
>>>>>could cause incorrect parsing. Could you tar up your maker.output
>>>>>folder,
>>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>>(send me either your user/guest ID after you upload).
>>>>>
>>>>>For the BLAST error, use BLAST+ instead.  You are using blastall which
>>>>>is
>>>>>the old legacy version of NCBI BLAST.  You can do this by setting the
>>>>>blast type in maker_bopts.ctl and the location of executables in
>>>>>maker_exe.ctl.
>>>>>
>>>>>Thanks,
>>>>>Carson
>>>>>
>>>>>
>>>>>
>>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>>wrote:
>>>>>
>>>>>>Dear Maker users
>>>>>>
>>>>>>
>>>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>>>word SCALAR, followed by a bracketed value like this (0x22de7020),
>>>>>>into the nucleotide sequence of some of the genes. This seems to
>>>>>>be related to transcripts predicted by fgenesh_masked.
>>>>>>
>>>>>>
>>>>>>Here is an example for one of the genes
>>>>>>
>>>>>>
>>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>>>TTTCGACAAGC
>>>>>>
>>>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>>>without such problem. I checked the sequence for the scaffold related
>>>>>>to
>>>>>>one of the affected transcripts and there was no error in the
>>>>>>sequence.
>>>>>>I am not sure what is causing this. The only error that I could spot
>>>>>>in
>>>>>>the output error file is the following
>>>>>>
>>>>>>
>>>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>>>contexts/frames of query sequences.
>>>>>>
>>>>>>
>>>>>>
>>>>>>Your help is appreciated
>>>>>>
>>>>>>
>>>>>>
>>>>>>HB
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From marc.hoeppner at imbim.uu.se  Mon Mar 24 04:08:25 2014
From: marc.hoeppner at imbim.uu.se (=?iso-8859-1?Q?Marc_H=F6ppner?=)
Date: Mon, 24 Mar 2014 10:08:25 +0000
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se>

Hi,

I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (protein and EST alignments, mostly). I then separate these out into GFF files that I can store away and use in various combinations of settings and data in parallel.

However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional and is there a work-around? Or is the only option to run this with fasta files?

Cheers,

Marc


Marc P. Hoeppner, PhD

Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From sujaikumar at gmail.com  Mon Mar 24 08:15:16 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 14:15:16 +0000
Subject: [maker-devel] Dashes in transcript predictions
Message-ID: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.
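
In case it is useful to others, a check along these lines can flag
affected transcripts in a FASTA file. It is a rough sketch, not part of
MAKER: the function names are made up, and it treats anything other than
A, C, G, T or N as unexpected, which is an assumption that may be too
strict for files using other IUPAC codes. It also looks for stringified
Perl references of the form SCALAR(0x...), since those have been reported
on this list as well.

```python
import re
import sys

# A stringified Perl scalar reference, e.g. SCALAR(0x23418b90).
SCALAR_REF = re.compile(r"SCALAR\(0x[0-9a-fA-F]+\)")
# Assumption: only A, C, G, T and N (either case) are expected.
NON_ACGTN = re.compile(r"[^ACGTNacgtn]")


def check_record(seq):
    """Return (scalar_refs, other_chars) found in one joined sequence."""
    refs = SCALAR_REF.findall(seq)
    stripped = SCALAR_REF.sub("", seq)
    others = sorted(set(NON_ACGTN.findall(stripped)))
    return refs, others


def scan_fasta(lines):
    """Return {record_id: (scalar_refs, other_chars)} for problem records."""
    problems = {}
    record_id, chunks = None, []

    def flush():
        if record_id is None:
            return
        # Join wrapped lines first so a reference split across lines is found.
        refs, others = check_record("".join(chunks))
        if refs or others:
            problems[record_id] = (refs, others)

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            parts = line[1:].split()
            record_id, chunks = (parts[0] if parts else ""), []
        elif line:
            chunks.append(line)
    flush()
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python scan_fasta.py transcripts.fasta
    with open(sys.argv[1]) as handle:
        for rec, (refs, others) in scan_fasta(handle).items():
            print(rec, refs, others, sep="\t")
```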
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nGt.0.3.035610.maker.output.tgz
Type: application/x-gzip
Size: 45641 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140324/c626ff64/attachment-0001.tgz>

From carsonhh at gmail.com  Mon Mar 24 10:49:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 10:49:46 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
Message-ID: <CF55BD0D.B01C%carsonhh@gmail.com>

I've actually never seen that before, but looking through your output it
appears to be specifically caused by setting correct_est_fusion=1, and how
it interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace
.../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
download.

Thanks,
Carson


From:  Sujai <sujaikumar at gmail.com>
Date:  Monday, March 24, 2014 at 8:15 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Dashes in transcript predictions

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA
TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT
CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC
CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT
ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC
TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA
ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA
CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25
QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV
MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN
KKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.


_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: join.pm
Type: text/x-perl-script
Size: 18644 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140324/ebc5d81c/attachment-0001.bin>

From carsonhh at gmail.com  Mon Mar 24 11:05:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 11:05:15 -0600
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <CF55BE79.B028%carsonhh@gmail.com>

It's not so much intentional as it is a limitation of the information in
GFF3 format alignments. Right now protein2genome for eukaryotes will only
try to make exonerate-derived alignments work, because they have been
polished around splice sites and MAKER still has access to the original
protein sequence and alignment CIGAR string for additional filtering, etc.
With GFF3 pass-through the algorithm doesn't know nearly as much about
what is passed in. For example the protein sequence is gone, CIGAR
alignment strings are rarely included (Gap= attribute in GFF3), and it's
not always clear if the alignment was polished for splice sites.  Also
since protein2genome=1 is expected to be used only to generate an initial
training set, and not for final annotations, this is considered a
reasonable restriction.

If you still really want to force protein alignments from a GFF3 to be
considered as potential models, you could put them in as pred_gff.  In
which case they will always be considered as potential models.  Of course
it will be relatively ugly because you lack things I mentioned before such
as the alignment cigar string and original protein sequence that are
normally used to filter protein2genome results for inclusion as models.

--Carson


On 3/24/14, 4:08 AM, "Marc Hoeppner" <marc.hoeppner at imbim.uu.se> wrote:

>Hi,
>
>I had previously inquired about protein-based gene building (for example
>to create a training set for SNAP). This is currently possible with Maker
>(2.31), but I noticed a limitation. Specifically, I tend to run Maker
>once to generate all the raw computes (protein and EST alignments,
>mostly). I then separate these out into GFF files that I can store away
>and use in various combinations of settings and data in parallel.
>
>However, the protein2genome option does not seem to work off pre-aligned
>protein data (e.g. protein2genome.gff produced with Maker). Is that
>intentional and is there a work-around? Or is the only option to run this
>with fasta files?
>
>Cheers,
>
>Marc
>
>
>Marc P. Hoeppner, PhD
>
>Department for Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoeppner at imbim.uu.se
>
>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Mon Mar 24 12:15:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 12:15:39 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55BD0D.B01C%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com>
Message-ID: <CF55C7D4.B05A%carsonhh@gmail.com>

One more note on this.  The sequence is actually fully correct if you just
remove the '-' characters.  So if you don't want to rerun MAKER with the
patch, then you can use the attached script to just repair the transcript
file by removing the '-' characters.  Your GFF3 files and protein files
should already be correct as is.

Usage --> perl fix_dash transcript_file.fasta > new_file.fasta

You may need to place the script in the .../maker/bin/ directory so it can
detect BioPerl if you don't have BioPerl installed system wide.
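
For anyone who would rather not depend on BioPerl, the repair described
above can be sketched in a few lines of Python. This is not the attached
fix_dash script, only an illustration of the same idea. The function name
strip_dashes is made up here, and the sketch assumes each FASTA header is a
single line starting with '>'. Header lines are left alone because MAKER IDs
themselves contain hyphens.

```python
def strip_dashes(fasta_lines):
    """Yield FASTA lines with '-' removed from sequence lines only."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Headers such as ">snap_masked-...-mRNA-1" contain real hyphens,
            # so they are passed through unchanged.
            yield line
        else:
            # Sequence line: drop every gap character.
            yield line.replace("-", "")
```

It could be applied by opening transcript_file.fasta, passing the file handle
to strip_dashes, and writing each yielded line to new_file.fasta.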

Thanks,
Carson

From:  Carson Holt <carsonhh at gmail.com>
Date:  Monday, March 24, 2014 at 10:49 AM
To:  Sujai <sujaikumar at gmail.com>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Dashes in transcript predictions

I've actually never seen that before, but looking through your output it
appears to be specifically caused by setting correct_est_fusion=1, and how
it interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace
.../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
download.

Thanks,
Carson


From:  Sujai <sujaikumar at gmail.com>
Date:  Monday, March 24, 2014 at 8:15 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Dashes in transcript predictions

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA
TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT
CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC
CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT
ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC
TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA
ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA
CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25
QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV
MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN
KKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.


_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From sujaikumar at gmail.com  Mon Mar 24 12:17:02 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 18:17:02 +0000
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55C7D4.B05A%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com> <CF55C7D4.B05A%carsonhh@gmail.com>
Message-ID: <CAFADFFs6KYiZ8rmfEwYVCYbGymJOUXHVcKVShscBBjjCR3q2fA@mail.gmail.com>

Wow. That was a super quick response. Thanks very much for confirming the
problem and the fixes!


On 24 March 2014 18:15, Carson Holt <carsonhh at gmail.com> wrote:

> One more note on this.  The sequence is actually fully correct if you just
> remove the '-' characters.  So if you don't want to rerun MAKER with the
> patch, then you can use the attached script to just repair the transcript
> file by removing the '-' characters.  Your GFF3 files and proteins files
> should already be correct as is.
>
> Usage --> perl fix_dash transcript_file.fasta > new_file.fasta
>
> You may need to place the script in the .../maker/bin/ directory so it can
> detect BioPerl if you don't have BioPerl installed system wide.
>
> Thanks,
> Carson
>
> From: Carson Holt <carsonhh at gmail.com>
> Date: Monday, March 24, 2014 at 10:49 AM
> To: Sujai <sujaikumar at gmail.com>, "maker-devel at yandell-lab.org" <
> maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Dashes in transcript predictions
>
> I've actually never seen that before, but looking through your output it
> appears to be specifically caused by setting correct_est_fusion=1, and how
> it interacts with some features of your dataset.
>
> I've attached a patch in the form of a file you can use to replace
> .../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
> download.
>
> Thanks,
> Carson
>
>
> From: Sujai <sujaikumar at gmail.com>
> Date: Monday, March 24, 2014 at 8:15 AM
> To: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Dashes in transcript predictions
>
> Dear Maker Team
>
> On a recent run with maker 2.31, I noticed that a couple of the
> transcripts had dashes/hyphens in them.
>
> Example:
> >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript
> offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
> TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
> AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
> GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
> GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
> AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
> ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
> AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
> TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA
>
> The protein prediction for this transcript is ok:
>
> >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25
> eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
>
> MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
>
> DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV
>
> Is this a known bug? I tried searching for "dash|hyphen" in the email list
> but couldn't find anything else.
>
> Best wishes,
>
> - Sujai
>
> ps. I pulled out just this one contig and ran maker on it. all the
> .maker.output files are attached.
>
>
>  _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From diana.garnica at anu.edu.au  Mon Mar 24 17:11:01 2014
From: diana.garnica at anu.edu.au (Diana Garnica Moreno)
Date: Mon, 24 Mar 2014 23:11:01 +0000
Subject: [maker-devel] Problem extracting fasta from a GFF file generated
	with MAKER
Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi there,


We recently annotated a fungal genome using MAKER and obtained the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not include the stop codon, so I cannot tell which proteins are complete and which are incomplete at the 3' end. To solve that I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why this happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?


Thanks!


Diana

From carsonhh at gmail.com  Mon Mar 24 17:25:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 17:25:09 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
 generated with MAKER
Message-ID: <CF56185B.B0E1%carsonhh@gmail.com>

You are probably getting the wrong proteins from your scripts because you
are not taking into account the 5' and 3' UTR in the transcript.

For example
>snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240

The 5' UTR is 261bp and the 3' UTR is 22bp long.  Both would have to be
trimmed before translating the transcript into a protein. Once they are
trimmed you can use frame 0 for the translation.

The fasta_tool that comes with MAKER can be used to quickly trim the UTR.

Example:
fasta_tool maker_transcripts.fasta --trim_maker_utr

Then you can try your other scripts again.
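
If you want to do the same trimming inside your own script, the idea above
can be sketched roughly as follows. This is only an illustration, not the
code behind fasta_tool. It assumes the QI tag has nine '|'-separated fields
with the 5' UTR length first and the 3' UTR length eighth (as in the example
header above), that the header is on a single line, and that the standard
stop codons apply. The function names are made up for this sketch.

```python
import re

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def utr_lengths(header):
    """Return (5' UTR length, 3' UTR length) parsed from a MAKER QI tag."""
    match = re.search(r"\bQI:(\S+)", header)
    if match is None:
        raise ValueError("no QI tag found in header")
    fields = match.group(1).split("|")
    if len(fields) < 8:
        raise ValueError("QI tag has fewer fields than expected")
    return int(fields[0]), int(fields[7])


def trim_utr(header, transcript):
    """Remove both UTRs so that translation can start in frame 0."""
    five, three = utr_lengths(header)
    return transcript[five:len(transcript) - three]


def ends_with_stop(cds):
    """True if the trimmed sequence ends in a standard stop codon."""
    return cds[-3:].upper() in STOP_CODONS
```

After trimming, ends_with_stop gives a quick way to flag models that look
incomplete at the 3' end, provided the trimmed transcript retains its stop
codon.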

Thanks,
Carson


From:  Diana Garnica Moreno <diana.garnica at anu.edu.au>
Date:  Monday, March 24, 2014 at 5:11 PM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Problem extracting fasta from a GFF file generated
with MAKER

Hi there,


We recently assembled a fungal genome using MAKER and we got the gene
models. and the corresponding transcripts, predicted proteins and GFF files.
However, the predicted proteins do not have the stop codon included so I do
not know which proteins are complete and which ones are incomplete at the 3'
end. To solve that I have used different programs to extract the fasta
sequence of the CDSs given the gff file and the genome sequence. The problem
is that with the tools I have tested I get the right sequence for some of
the proteins and wrong sequences for others (with multiple stop codons for
example). I am not sure why it happens and since it happens with different
tools (different python scripts and even gffread from cufflink) I do not
know where is the problem. Could you please give me some advice on how to
extract the right sequences with the stop codons included?


Thanks!


Diana
_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From daniel.standage at gmail.com  Tue Mar 25 07:24:14 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 25 Mar 2014 09:24:14 -0400
Subject: [maker-devel] Maker iPlant image
Message-ID: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>

Greetings,

I launched an instance from the Maker-P 2.28 image
(c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location
of the installed software. All I could find was an example data set on the
Desktop, but the "maker" program was not in the path and the contents of
"/usr/local/src" are empty. Could you please advise on how to run Maker in
iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From ernesto at ebi.ac.uk  Tue Mar 25 04:10:59 2014
From: ernesto at ebi.ac.uk (ernesto lowy gallego)
Date: Tue, 25 Mar 2014 10:10:59 +0000
Subject: [maker-devel] Incorrect translation start codon
Message-ID: <53315633.2070702@ebi.ac.uk>

Hi,

I have been inspecting the MAKER predictions and I detected a situation 
which appears with a certain frequency.
(See attached Apollo screenshot illustrating the situation I am going to 
describe):

When there is est2genome evidence supporting the prediction of a 5' UTR,
I have noticed that in some of these transcripts MAKER does not identify
the correct downstream ATG start codon and instead takes a TTG codon
(coding for L) as the protein start. The proper ATG start codon is
further downstream, as the ab initio predictors (SNAP and AUGUSTUS)
correctly predict in this case (see the attached screenshot).

Any comments on this?

Thanks!

ernesto

-- 
Developer

VectorBase | Ensembl Genomes

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2014-03-25 at 09.34.16.png
Type: image/png
Size: 32220 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140325/f9ae69ec/attachment-0001.png>

From carsonhh at gmail.com  Tue Mar 25 08:19:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:19:22 -0600
Subject: [maker-devel] Incorrect translation start codon
In-Reply-To: <53315633.2070702@ebi.ac.uk>
References: <53315633.2070702@ebi.ac.uk>
Message-ID: <CF56EBF0.B109%carsonhh@gmail.com>

This is caused by BioPerl's is_start_codon method and default codon table
returning true for non-canonical start codons.  It was resolved some time
ago (See previous discussion -->
https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY ).  Make
sure you are using the most recent version of MAKER (currently 2.31).
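
To illustrate the kind of behaviour described above (this is neither
BioPerl's nor MAKER's code, just a sketch): NCBI translation table 1 lists
TTG and CTG as alternative initiation codons alongside ATG, so a permissive
start-codon test accepts a TTG that appears before the real ATG. The names
below are made up for the example, and only frame 0 is scanned to keep it
short.

```python
CANONICAL_STARTS = {"ATG"}
# Initiation codons listed for NCBI translation table 1 (standard code).
TABLE1_STARTS = {"ATG", "TTG", "CTG"}


def is_start_codon(codon, canonical_only=True):
    """Test a codon against the strict or the permissive start set."""
    allowed = CANONICAL_STARTS if canonical_only else TABLE1_STARTS
    return codon.upper() in allowed


def first_start(seq, canonical_only=True):
    """Return the 0-based offset of the first start codon in frame 0, or -1."""
    for i in range(0, len(seq) - 2, 3):
        if is_start_codon(seq[i:i + 3], canonical_only):
            return i
    return -1
```

With the sequence TTGAAAATGCCC the permissive test picks the TTG at offset 0,
while the strict test picks the ATG at offset 6.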

Thanks,
Carson


On 3/25/14, 4:10 AM, "ernesto lowy gallego" <ernesto at ebi.ac.uk> wrote:

>Hi,
>
>I have been inspecting the MAKER predictions and I detected a situation
>which appears with a certain frequency.
>(See attached Apollo screenshot illustrating the situation I am going to
>describe):
>
>Let's say that there is est2genome evidence supporting the prediction of
>the 5' UTR region, I have realized that in some of these transcripts
>with 5'UTR, MAKER is not capable of identifying the right downstream ATG
>protein start codon and considers a TTG codon (coding for L) as the
>incorrect protein start. The proper ATG codon start is further
>downstream, as the Ab-initio predictors (SNAP+AUGUSTUS) correctly
>predict in this case (see the attached screenshot)
>
>Any comments on this?
>
>Thanks!
>
>ernesto
>
>-- 
>Developer
>
>VectorBase | Ensembl Genomes
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Tue Mar 25 08:24:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:24:36 -0600
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
Message-ID: <CF56ED91.B119%carsonhh@gmail.com>

--> /opt/maker/bin/maker

It looks like most preinstalled software is under /opt on the image.

Thanks,
Carson


From:  Daniel Standage <daniel.standage at gmail.com>
Date:  Tuesday, March 25, 2014 at 7:24 AM
To:  Maker Mailing List <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Maker iPlant image

Greetings,

I launched an instance from the Maker-P 2.28 image
(c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location
of the installed software. All I could find was an example data set on the
Desktop, but the "maker" program was not in the path and the contents of
"/usr/local/src" are empty. Could you please advise on how to run Maker in
iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University
_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From darasappan at gmail.com  Tue Mar 25 10:33:59 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 25 Mar 2014 11:33:59 -0500
Subject: [maker-devel] maker to EvidenceModeler
Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>

Hi Carson and others,

Is there an easy tool/pipeline available as part of maker utilities to convert maker and SNAP output to files acceptable by EvidenceModeler?

It looks like it also needs just gff files, but with a few tweaks. EvidenceModeler seems better equipped to handle PASA annotation results than maker results.

Thanks
Dhivya


From barry.utah at gmail.com  Tue Mar 25 11:51:38 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 25 Mar 2014 11:51:38 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
	generated	with MAKER
In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
Message-ID: <B283D045-3B8D-4A0C-82F8-7C2DB291B065@genetics.utah.edu>

Hi Diana,

There is a Perl library - The Genome Annotation Library - that is designed to make writing code like this easy.  I just added a script to this library called gal_CDS_sequence which you would run like this:

gal_CDS_sequence --translate genes.gff3 genome.fasta

The focus of GAL is to try to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files.
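
For readers who do not use Perl, the core of what a CDS extractor has to do can be sketched in Python. This is not how gal_CDS_sequence is implemented; it is a minimal illustration under some assumptions: the GFF3 is tab-delimited with nine columns, every CDS feature carries a Parent attribute, the genome is already loaded into a dict of seqid to sequence, and phase is ignored. The function names are made up for the sketch. The two steps that scripts most often get wrong are ordering the segments and reverse-complementing minus-strand genes.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_sequences(gff3_lines, genome):
    """Return {transcript ID: spliced CDS} from GFF3 lines and a genome dict."""
    segments = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        # A CDS segment may be shared by several transcripts.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                segments[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for parent, parts in segments.items():
        parts.sort(key=lambda part: part[1])  # genomic order, ascending start
        seqid, strand = parts[0][0], parts[0][3]
        # GFF3 coordinates are 1-based and inclusive.
        spliced = "".join(genome[seqid][start - 1:end] for _, start, end, _ in parts)
        result[parent] = reverse_complement(spliced) if strand == "-" else spliced
    return result
```

The spliced sequence includes the stop codon whenever the CDS features in the GFF3 cover it, so checking the last three bases is one way to spot models that are incomplete at the 3' end.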

You can access the library here:

http://www.sequenceontology.org/software/GAL.html

Support for GAL is available via the SO mailing list:

https://lists.sourceforge.net/lists/listinfo/song-devel

Hope that helps,

Barry

On Mar 24, 2014, at 5:11 PM, Diana Garnica Moreno wrote:

> Hi there,
> 
> We recently assembled a fungal genome using MAKER and we got the gene models. and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that I have used different programs to extract the fasta sequence of the CDSs given the gff file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons for example). I am not sure why it happens and since it happens with different tools (different python scripts and even gffread from cufflink) I do not know where is the problem. Could you please give me some advice on how to extract the right sequences with the stop codons included?
> 
> Thanks!
> 
> Diana
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From kchilds at plantbiology.msu.edu  Wed Mar 26 08:21:36 2014
From: kchilds at plantbiology.msu.edu (Childs, Kevin)
Date: Wed, 26 Mar 2014 14:21:36 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CF56ED91.B119%carsonhh@gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
Message-ID: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>

Daniel,

There are a few small issues with the MAKER-P_2.28 image at iPlant, but I have been using the image successfully for more than a month.  I typically set several environment variables immediately after starting an ssh session.

export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp

The image will allow you to train SNAP, but training Augustus is not possible with the current image.  Augustus training requires blat which was not installed in this image.  There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking.  I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.
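
One possible way around a read-only config directory (an assumption on my part, not necessarily what was done on the forked image) is to copy the Augustus config tree somewhere writable and point AUGUSTUS_CONFIG_PATH at the copy before training. A minimal sketch in Python, with a made-up function name and caller-supplied paths:

```python
import os
import shutil


def writable_augustus_config(source_dir, target_dir):
    """Copy the Augustus config tree to target_dir and return an environment
    mapping in which AUGUSTUS_CONFIG_PATH points at the writable copy."""
    if not os.path.isdir(target_dir):
        # copytree creates target_dir itself; it must not already exist.
        shutil.copytree(source_dir, target_dir)
    env = dict(os.environ)
    env["AUGUSTUS_CONFIG_PATH"] = target_dir
    return env
```

The returned mapping can be passed as env= to subprocess.run when launching the training scripts, so new species directories are written under the copy rather than under /opt/maker/exe/augustus/config/species/.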

Augustus should work with a stock hmm on this image.

I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.

Kevin Childs


---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Plant Biology Department
Michigan State University

kchilds at plantbiology.msu.edu
517-775-2844 (m)
517-353-5969 (l)

On Mar 25, 2014, at 10:24 AM, Carson Holt wrote:

> --> /opt/maker/bin/maker
> 
> It looks like most preinstalled software is under /opt on the image.
> 
> Thanks,
> Carson
> 
> 
> From: Daniel Standage <daniel.standage at gmail.com>
> Date: Tuesday, March 25, 2014 at 7:24 AM
> To: Maker Mailing List <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Maker iPlant image
> 
> Greetings,
> 
> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.
> 
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From steinj at cshl.edu  Wed Mar 26 12:41:37 2014
From: steinj at cshl.edu (Stein, Joshua)
Date: Wed, 26 Mar 2014 18:41:37 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
	<BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
Message-ID: <A6505FF9-06C4-4EB2-949B-EDA9113F64E3@cshl.edu>

Also please note that there is a tutorial available here, which is particularly important if you want to run MAKER in MPI mode.
https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial

Josh

Joshua Stein, PhD
Manager, Sci. Informatics III
Cold Spring Harbor Laboratory
steinj at cshl.edu
http://ware.cshl.org/


On Mar 26, 2014, at 10:20 AM, "Childs, Kevin" <kchilds at plantbiology.msu.edu>
 wrote:

> Daniel,
> 
> There are a few small issues with the MAKER-P_2.28 image at iPlant.  I have been using the image successfully for more than a month.  I typically set several environmental variables immediately after starting an ssh session.
> 
> export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
> export ZOE=/opt/maker/exe/snap
> export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
> export TMP=/tmp
> 
> The image will allow you to train SNAP, but training Augustus is not possible with the current image.  Augustus training requires blat which was not installed in this image.  There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking.  I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.
> 
> Augustus should work with a stock hmm on this image.
> 
> I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.
> 
> Kevin Childs
> 
> 
> ---
> Kevin Childs, PhD
> 
> Assistant Professor - Fixed Term
> Plant Biology Department
> Michigan State University
> 
> kchilds at plantbiology.msu.edu
> 517-775-2844 (m)
> 517-353-5969 (l)
> 
> On Mar 25, 2014, at 10:24 AM, Carson Holt wrote:
> 
>> --> /opt/maker/bin/maker
>> 
>> It looks like most preinstalled software is under /opt on the image.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> From: Daniel Standage <daniel.standage at gmail.com>
>> Date: Tuesday, March 25, 2014 at 7:24 AM
>> To: Maker Mailing List <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Maker iPlant image
>> 
>> Greetings,
>> 
>> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.
>> 
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University
>> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org


From brubin at fieldmuseum.org  Sat Mar 29 10:24:05 2014
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sat, 29 Mar 2014 11:24:05 -0500
Subject: [maker-devel] Missing UTRs in GFF
Message-ID: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
A non-text attachment was scrubbed...
Name: missing_utr.gff
Type: application/octet-stream
Size: 2933 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140329/0f93b3b2/attachment-0001.obj>

From mhinsley at ebi.ac.uk  Mon Mar 31 04:20:10 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Mon, 31 Mar 2014 11:20:10 +0100
Subject: [maker-devel] putative preponderance of short exons??
Message-ID: <5339415A.1020509@ebi.ac.uk>

Hi

I've run Maker on a de novo assembly of a species of fly and then ran 
some simple statistics (intron/ exon/ CDS length, exons per gene)  over 
the GFF output and compared with a couple of other species.
It all looks good except that there is a surprising number of very short
exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, out of 87k total; see the
attached pdf, where black is Drosophila, red is A. gambiae, and green is
with 5' and 3' exons removed).

I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.  
I'm using maker 2.31 (unpatched).

Anecdotally, these short exons appear without EST or protein evidence
and they all line up with canonical splice sequences (GT----AG),
but I've only looked at a few using Apollo.

While there's no requirement that exons should be longer I'm suspicious 
of this as there must be some evolutionary relationship between these 
species.
I've compared with another species annotated with Maker (using SNAP
and Augustus) which is more distant (not yet publicly available), and
the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop 
codons, but this does not appear to be the case (mostly they are mid-gene).


Is there some way to adjust the predictors, e.g. to require external
evidence? Or anything else you could suggest? I can see the
following in the tutorial but I'm not sure how they could help:

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no


thanks

-- 
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: exon_53.pdf
Type: application/pdf
Size: 10618 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140331/edd22fe9/attachment-0001.pdf>

From carsonhh at gmail.com  Mon Mar 31 07:52:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 07:52:15 -0600
Subject: [maker-devel] putative preponderance of short exons??
In-Reply-To: <5339415A.1020509@ebi.ac.uk>
References: <5339415A.1020509@ebi.ac.uk>
Message-ID: <CF5ECE08.B30C%carsonhh@gmail.com>

The intron/exon structure is determined by SNAP, Augustus, etc.  It is not
affected by any of the maker parameters.  Only evidence alignments are
affected by the maker settings.  You can try retraining or manually
editing the HMMs, but they might also be regions where your assembly is
incorrect and those algorithms make short exons in order to make a
structure work without getting stop codons mid gene.

Thanks,
Carson
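
For anyone who wants to reproduce the kind of exon-length tally described
in the question, here is a minimal Python sketch. It assumes a standard
tab-separated GFF3 file in which the final models carry "maker" in the
source column; the function names and thresholds are illustrative, not
part of MAKER. Exons shared by several transcripts are counted once per
GFF line, so alternative isoforms would inflate the totals.

```python
def exon_lengths(gff_lines, source="maker"):
    """Yield the length of every exon feature written by the given source."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        if cols[1] == source and cols[2] == "exon":
            # GFF3 coordinates are 1-based and inclusive
            yield int(cols[4]) - int(cols[3]) + 1


def tally_short(lengths, thresholds=(10, 30, 50)):
    """Return (total exon count, {threshold: number of exons shorter than it})."""
    lengths = list(lengths)
    counts = {t: sum(1 for n in lengths if n < t) for t in thresholds}
    return len(lengths), counts
```

Usage would be along the lines of opening the merged GFF file and calling
tally_short(exon_lengths(handle)). Running the same tally on the raw
predictor output (a different source column value, which is an assumption
to check against your own file) may help show which predictor contributes
the short exons.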


On 3/31/14, 4:20 AM, "Malcolm Hinsley" <mhinsley at ebi.ac.uk> wrote:

>Hi
>
>I've run Maker on a de novo assembly of a species of fly and then ran
>some simple statistics (intron/ exon/ CDS length, exons per gene)  over
>the GFF output and compared with a couple of other species.
>It all looks good except that there is a surprising number of very short
>exons (6000 < 50 bp, 3500 < 30 bp, 878< 10 bp, 87k total - see attached
>pdf), black is drosophilia, red is A.gambiae, green is with 5' and 3'
>exons removed).
>
>I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.
>I'm using maker 2.31 (unpatched).
>
>Anecdotally, these short exons appear without EST or protein evidence
>and they all line up with canonical splice sequences (GT----AG).
>(but i've only looked at a few using Apollo).
>
>While there's no requirement that exons should be longer I'm suspicious
>of this as there must be some evolutionary relationship between these
>species.
>I've compared with a another species annotated with Maker (using SNAP
>and Augustus)  which is more distant (not yet publicly available), and
>the same pattern of short exons is present.
>I wondered if they were created to fulfil the need for start/stop
>codons, but this does not appear to be the case (mostly they are
>mid-gene).
>
>
>Is there some way to adjust the predictors eg to require external
>evidence? or anything else you could suggest? ... I can see the
>following in the tutorial but I'm not sure how they could help:
>
>pred_flank=200 #flank for extending evidence clusters sent to gene
>predictors
>pred_stats=0 #report AED and QI statistics for all predictions as well as
>models
>AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and
>1)
>min_protein=0 #require at least this many amino acids in predicted
>proteins
>alt_splice=0 #Take extra steps to try and find alternative splicing, 1 =
>yes, 0 = no
>always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0
>= no
>
>
>thanks
>
>-- 
>malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
>European Bioinformatics Institute (EMBL-EBI)
>European Molecular Biology Laboratory
>Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
>United Kingdom
>


From carsonhh at gmail.com  Mon Mar 31 08:37:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 08:37:15 -0600
Subject: [maker-devel] Missing UTRs in GFF
In-Reply-To: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
References: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
Message-ID: <CF5ED8D3.B31A%carsonhh@gmail.com>

Not something I've seen before, but there was a patch for another issue that
was caused by the use of avoid_est_fusion=1, and it may be related.  Try the
current stable release, 2.31, and let me know if it still happens.

You can also upload the contig folder from one of the regions in question
here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Then I could verify the bug, and see if it is something that happens in the
current release.

--Carson
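
To locate candidate transcripts of the kind described in the original
report, a minimal Python sketch such as the following could be used. It
assumes standard GFF3 with ID and Parent attributes and the feature types
five_prime_UTR, three_prime_UTR, exon and CDS; the function names are
illustrative and not part of MAKER. It only lists mRNAs that have no UTR
lines, with their summed exon and CDS lengths; many genuine models have
no UTR at all, so the output is a list to inspect, not a list of bugs.

```python
from collections import defaultdict


def parse_attrs(field):
    """Parse the GFF3 attribute column into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def mrnas_without_utr(gff_lines):
    """Return [(mRNA ID, total exon length, total CDS length)] for mRNAs lacking UTR features."""
    mrna_ids = []
    child_types = defaultdict(set)
    lengths = defaultdict(lambda: {"exon": 0, "CDS": 0})
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        ftype = cols[2]
        attrs = parse_attrs(cols[8])
        if ftype == "mRNA":
            mrna_ids.append(attrs.get("ID"))
            continue
        span = int(cols[4]) - int(cols[3]) + 1  # 1-based, inclusive
        for parent in attrs.get("Parent", "").split(","):
            if not parent:
                continue
            child_types[parent].add(ftype)
            if ftype in ("exon", "CDS"):
                lengths[parent][ftype] += span
    utr_types = {"five_prime_UTR", "three_prime_UTR"}
    return [
        (mid, lengths[mid]["exon"], lengths[mid]["CDS"])
        for mid in mrna_ids
        if mid and not (child_types[mid] & utr_types)
    ]
```

Comparing the reported CDS length with three times the protein length
(plus the stop codon) for each listed mRNA is one way to confirm whether
the CDS lines really include untranslated sequence.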


From:  Benjamin Rubin <brubin at fieldmuseum.org>
Date:  Saturday, March 29, 2014 at 10:24 AM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Missing UTRs in GFF

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org <http://benrubin.org>

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From pushplata.singh at teri.res.in  Sun Mar  2 22:29:37 2014
From: pushplata.singh at teri.res.in (Pushplata Singh)
Date: Mon, 3 Mar 2014 10:59:37 +0530
Subject: [maker-devel] Query on Hardware requirement
Message-ID: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>


Hi,

I am trying to assemble and analyse (bioinformatics) the genome sequence of a
35 GB fungal genome. The raw data generated from Illumina sequencing is
~15 GB. Could you please suggest the system (hardware) requirements for
installing and running Maker and ALLPATHS-LG software for the job?

Thank you
Pushplata Singh, PhD
Nanobiotechnology Centre
Biotechnology and Management of Bioresources Division
The Energy and Resources Institute
Darbari Seth Block , India Habitat Centre,Lodhi Road
New Delhi 110003 India
Phone +91 11 24682100 ext 2611
Fax +91 11 24682145



From dence at genetics.utah.edu  Mon Mar  3 07:11:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 3 Mar 2014 14:11:34 +0000
Subject: [maker-devel] Query on Hardware requirement
In-Reply-To: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
References: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D68BF9@mxb2.hg.genetics.utah.edu>

Hi Pradeep, 

I think Allpaths is developed by the Broad Institute, so you'd have to check their documentation for their system requirements. MAKER is installable on Linux and Mac OS X computers. The throughput you'll be able to achieve with MAKER depends on how many processors and how much RAM the machine has. To take advantage of MAKER's ability to parallelize the annotation process, you need some version of MPI installed on your machine. MAKER can try to install MPI for you, but a manual installation is usually required.

I hope that helps. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Pushplata Singh [pushplata.singh at teri.res.in]
Sent: Sunday, March 02, 2014 10:29 PM
To: maker-devel at yandell-lab.org
Cc: Pradeep Dahiya
Subject: [maker-devel] Query on Hardware requirement

Hi,

I am trying to assemble and analyse(bio-informatics) genome sequence of a
35 GB fungal genome. The raw data that has been generated from Illumina
sequencing is of  ~15 GB. Could you please suggest me the system (hardware)
requirement for installing and running Maker and ALLPATHS-LG sofrware for
the job?

Thank you
Pushplata Singh, PhD
Nanobiotechnology Centre
Biotechnology and Management of Bioresources Division
The Energy and Resources Institute
Darbari Seth Block , India Habitat Centre,Lodhi Road
New Delhi 110003 India
Phone +91 11 24682100 ext 2611
Fax +91 11 24682145




From carson.holt at genetics.utah.edu  Mon Mar  3 12:08:49 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 3 Mar 2014 19:08:49 +0000
Subject: [maker-devel] FW: error runinig agustus
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3A2120.A782%carson.holt@genetics.utah.edu>

Forwarding this to the maker-devel list.


On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>I encountered the following error while running maker (2nd annotation
>using gff file of the first maker run and trinity assembled RNA seq as
>EST)
>
>ERROR: Augustus failed
>--> rank=NA, hostname=rapa.agr.gc.ca
>
>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>am using 2.31
>
>Your help is appreciated
>
>
>HB
>
>
>
>
>


From carsonhh at gmail.com  Mon Mar  3 12:11:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Mar 2014 12:11:08 -0700
Subject: [maker-devel] FW: error runinig agustus
Message-ID: <CF3A21A5.A788%carsonhh@gmail.com>

You will need to provide more detail.  Probably the entire error log and
the maker control files.

Thanks,
Carson


On 3/3/14, 12:08 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>Forwarding this to the maker-devel list.
>
>
>On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>I encountered the following error while running maker (2nd annotation
>>using gff file of the first maker run and trinity assembled RNA seq as
>>EST)
>>
>>ERROR: Augustus failed
>>--> rank=NA, hostname=rapa.agr.gc.ca
>>
>>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>>am using 2.31
>>
>>Your help is appreciated
>>
>>
>>HB
>>
>>
>>
>>
>>
>


From sjackman at gmail.com  Tue Mar  4 19:10:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 4 Mar 2014 18:10:42 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
Message-ID: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
the tip.

The rRNA genes that are found with est2genome have the feature type set to
*mRNA* and have corresponding *five_prime_UTR*, *CDS* and
*three_prime_UTR* features. Ideally the feature type would be set to
*rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features.
Is that a feature that you would be interested in adding to MAKER? The rRNA
gene names all start with "rrn" and the tRNA gene names with "trn", as is
standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this. Cheers,
Shaun
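
As a stopgap, the retyping described above can be sketched as a
post-processing step on the GFF3 output. This is a minimal Python sketch,
not a MAKER feature: it assumes the carried-forward gene name is stored in
the Name attribute of each mRNA line (check your own file, since the name
could sit in a different attribute), and the function names are
illustrative. The parent gene lines are left untouched.

```python
DROP_TYPES = {"five_prime_UTR", "three_prime_UTR", "CDS"}
PREFIX_TO_TYPE = {"rrn": "rRNA", "trn": "tRNA"}  # naming convention from the message above


def parse_attrs(field):
    """Parse the GFF3 attribute column into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def retype_noncoding(gff_lines):
    """Retype mRNA features named rrn*/trn* and drop their UTR and CDS children."""
    lines = [line.rstrip("\n") for line in gff_lines]

    # First pass: decide which mRNA IDs should become rRNA or tRNA.
    retyped = {}
    for line in lines:
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attrs(cols[8])
        for prefix, new_type in PREFIX_TO_TYPE.items():
            if attrs.get("Name", "").startswith(prefix):
                retyped[attrs.get("ID")] = new_type

    # Second pass: rewrite the type column and skip UTR/CDS children.
    out = []
    for line in lines:
        cols = line.split("\t")
        if len(cols) == 9:
            attrs = parse_attrs(cols[8])
            if cols[2] == "mRNA" and attrs.get("ID") in retyped:
                cols[2] = retyped[attrs.get("ID")]
                line = "\t".join(cols)
            elif cols[2] in DROP_TYPES:
                parents = attrs.get("Parent", "").split(",")
                if any(p in retyped for p in parents):
                    continue  # child of a retyped transcript
        out.append(line)
    return out
```

Comment lines and anything that is not a nine-column feature line pass
through unchanged, so the result can be written straight back to a file.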


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:

> Set single_exon=1, and the minimum size to a smaller value.  I think it's
> set to 250 right now.  Also est2genome is looking for ORF, so if there is
> none (as with tRNAs) they probably won't get picked up.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>
> Sorry, ignore my previous question. est_forward also carries forward the
> names of protein evidence and works like a charm. Thank you!
>
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
> these hits?
>
> organism_type=prokaryotic
> est2genome=1
> protein2genome=1
> est_forward=1
>
> Cheers,
> Shaun
>
>
> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>
>> Is there a corresponding protein_forward=1 option to map forward protein
>> names from protein2genome?
>>
>> Cheers,
>> Shaun
>>
>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>> wrote:
>>
>> Sorry I meant to say prefilter on the score in the mRNA column before
>> passing the gff3 to model_gff.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>  What you can do is run it once with just est_forward=1 and
>> est2genome/protein2genome set to 1.  Then take those results, pass them in
>> as model_gff and use the map_forward option to then filter the results
>> based on mRNA score and that would copy names onto new gene under the
>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>> separate tool that will map genes onto new assemblies (but under the hood
>> the tool will just be calling MAKER with certain parameters restricted).  I
>> do this because if people commonly use it mixed with things like SNAP I can
>> start to get some very weird behaviors.
>>
>> Thanks,
>> Carson
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 3:04 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  It seems that this could be a very useful option in those cases where
>> you have firm a priori knowledge of the placement of ESTs. However, while
>> trying it I note that est_forward implies that the est2genome predictor is
>> turned on, implicitly. Is this necessary for this to work? I'm after the
>> behavior you describe below where exonerate is made to try really hard
>> within a limited region to align an est, but I would not like maker to
>> produce est2genome predictions.
>>
>> In general, I think this maker_coor and est_forward is a feature set that
>> is worthy to be promoted into a documented feature.
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  It will still work without est_forward.  It just works a little
>> differently.  Keep in mind this was a hidden feature I used to find
>> stubborn or hard to find missing genes after reassembly of a genome.
>>
>> If est_forward is provided, MAKER will parse the database to look for the
>> maker_coor tags early in the pipeline.  Then it will create a list of
>> locations to search, and it will search them even if there are no BLAST
>> results to seed the search (normally MAKER gets a BLAST result first and
>> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
>> look for a match using all of chr1 as the input to exonerate even when
>> BLAST finds nothing (this is a very very slow search, but can help pick up
>> one or two stubborn genes that don't remap well).  To allow this, MAKER
>> gives exonerate looser matching parameters (i.e. allows for single base
>> pair introns perhaps caused by assembly errors).  The logic here is that
>> given the fact that I already told MAKER that with some degree of
>> confidence I expect sequence A to map to location X, it will try its
>> hardest to make it match.
>>
>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>> line 1563, but only after a BLAST alignment has already seeded it to the
>> region (that BLAST result has the information in its description
>> parameter).  MAKER will then ignore seeds completely outside of maker_coor.
>> In addition any BLAST seeds that overlap maker_coor will get the search
>> space for alignment polishing adjusted to match maker_coor exactly.  Also
>> match parameters for exonerate will not be relaxed as they were with
>> est_forward.
>>
>> As you can see, the behavior is slightly different (because it's an
>> accidental feature).
>>
>> Thanks,
>> Carson
>>
>>
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 6:37 AM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  That might be a useful and time saving accidental feature. But, reading
>> the code, it seems that I need to supply maker_coor but not gene_id, as
>> well as the configuration option est_forward for this to work. Any
>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>> right?
>>
>> Mikael
>>
>>  26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  Yes.  That should work as well as an accidental feature.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>> mikael.durling at slu.se> wrote:
>>
>> Can this use of maker_coor be used only to hint about the placement of
>> the ests, without affecting the naming of the final genes? Ie if I have a
>> database of EST where I have a priori knowledge of their rough placement,
>> can this placement be given to maker without providing est_forward=1?
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  There is a way.  It's not a standard option and it's undocumented, but
>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>> that.  The option won't already be there so you'll have to type it in.
>>
>> There is also a feature designed to work with this option.  If you add
>> tags to your fasta headers, those can be used to guide the mapping and
>> naming.  For example, gene_id=<some_gene>  will ensure different isoforms
>> that share a common gene_id get clustered into the same gene,
>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>
>> This is an undocumented way to remap genes onto new assemblies using
>> blast alignments of earlier transcript or protein annotations as a guide.
>>
>> --Carson
>>
>>
>>
>>
>>  From: Shaun Jackman <sjackman at gmail.com>
>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>> Date: Tuesday, February 25, 2014 at 5:06 PM
>> To: <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Mapping gene names
>>
>>  Hi,
>>
>> I'm annotating a genome using a closely related genome from GenBank,
>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence to
>> annotate my genome. I've run Maker, and the annotation seems to have worked
>> well. Is it possible to map the names of the genes from the related species
>> to my annotation? I see the *map_forward* option, which applies to the
>> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>
>> *maker_opts.ctl*
>>
>> est=NC_123456.frn
>> protein=NC_123456.faa
>> est2genome=1
>> protein2genome=1
>>
>> Thanks,
>> Shaun
>

From carsonhh at gmail.com  Tue Mar  4 19:33:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Mar 2014 19:33:12 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
Message-ID: <CF3BD88C.A7D5%carsonhh@gmail.com>

Trying to call non-coding RNA from ESTs or even sequence homology is
extremely messy (a non-trivial problem in most organisms, with a high false
positive rate), so MAKER for the most part doesn't even try to do that.  It
focuses only on the coding genes.  You can now use tRNAscan and snoscan in
the newest version for some non-coding RNA support (those features were only
added a couple of months ago).  So just like other prediction tools (snap,
augustus etc.), the primary focus has always been the coding genes.  We've
only started adding non-coding RNA support recently for iPlant, so it's
still relatively immature.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Reply-To:  Shaun Jackman <sjackman at gmail.com>
Date:  Tuesday, March 4, 2014 at 7:10 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
the tip.

The rRNA genes that are found with est2genome have the feature type set to
mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR
features. Ideally the feature type would be set to rRNA or tRNA as
appropriate, and would omit the UTR and CDS features. Is that a feature that
you would be interested in adding to MAKER? The rRNA gene names all start
with "rrn" and the tRNA gene names with "trn", as is standard, so
determining the appropriate type should be straightforward.

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
> Set single_exon=1, and the minimum size to a smaller value.  I think it's set
> to 250 right now.  Also est2genome is looking for ORF, so if there is none (as
> with tRNAs) they probably won't get picked up.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
> 
>> Sorry, ignore my previous question. est_forward also carries forward the
>> names of protein evidence and works like a charm. Thank you!
>> 
>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5
>> and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the
>> blastn output, and in the evidence_0.gff. rrn5 has perfect identity,
>> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 <
>> eval_blastn=1e-10). How should I debug which filter is removing these hits?
>> organism_type=prokaryotic
>> est2genome=1
>> protein2genome=1
>> est_forward=1
>> Cheers,
>> Shaun
>> 
>> 
>> 
>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>> Is there a corresponding protein_forward=1 option to map forward protein
>>> names from protein2genome?
>>>  
>>> 
>>> Cheers,
>>> Shaun
>>> 
>>> 
>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
>>>  
>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>> passing the gff3 to model_gff.
>>>> 
>>>> --Carson 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>> 
>>>>> What you can do is run it once with just est_forward=1 and
>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them in
>>>>> as model_gff and use the map_forward option to then filter the results
>>>>> based on mRNA score and that would copy names onto new gene under the
>>>>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>>> the tool will just be calling MAKER with certain parameters restricted).
>>>>> I do this because if people commonly use it mixed with things like SNAP I
>>>>> can start to get some very weird behaviors.
>>>>> 
>>>>> Thanks,
>>>>> Carson
>>>>> 
>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>> 
>>>>> It seems that this could be a very useful option in those cases where you
>>>>> have firm a priori knowledge of the placement of ESTs. However, while
>>>>> trying it I note that est_forward implies that the est2genome predictor is
>>>>> turned on, implicitly. Is this necessary for this to work? I'm after the
>>>>> behavior you describe below where exonerate is made to try really hard
>>>>> within a limited region to align an est, but I would not like maker to
>>>>> produce est2genome predictions.
>>>>> 
>>>>> In general, I think this maker_coor and est_forward is a feature set that
>>>>> is worthy to be promoted into a documented feature.
>>>>> 
>>>>> Thanks,
>>>>> Mikael
>>>>> 
>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>>> It will still work without est_forward.  It just works a little
>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>> 
>>>>>> If est_forward is provided, MAKER will parse the database to look for the
>>>>>> maker_coor tags early in the pipeline.  Then it will create a list of
>>>>>> locations to search, and it will search them even if there are no BLAST
>>>>>> results to seed the search (normally MAKER gets a BLAST result first and
>>>>>> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
>>>>>> look for a match using all of chr1 as the input to exonerate even when
>>>>>> BLAST finds nothing (this is a very very slow search, but can help pick
>>>>>> up one or two stubborn genes that don't remap well).  To allow this,
>>>>>> MAKER gives exonerate looser matching parameters (i.e. allows for single
>>>>>> base pair introns perhaps caused by assembly errors).  The logic here is
>>>>>> that given the fact that I already told MAKER that with some degree of
>>>>>> confidence I expect sequence A to map to location X, it will try its
>>>>>> hardest to make it match.
>>>>>> 
>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>>>>>> line 1563, but only after a BLAST alignment has already seeded it to the
>>>>>> region (that BLAST result has the information in its description
>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as they
>>>>>> were with est_forward.
>>>>>> 
>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>> accidental feature).
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>> 
>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>> est_forward=1, right?
>>>>>> 
>>>>>> Mikael
>>>>>> 
>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ests, without affecting the naming of the final genes? Ie if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> There is a way.  It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl:
>>>>>>> 
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Shaun



From felix.bemm at uni-wuerzburg.de  Wed Mar  5 09:35:33 2014
From: felix.bemm at uni-wuerzburg.de (Felix Bemm)
Date: Wed, 05 Mar 2014 17:35:33 +0100
Subject: [maker-devel] Build Issues - v2.31
Message-ID: <53175255.4050102@uni-wuerzburg.de>

Hi,

I am trying to build maker version 2.31. Got the following error:

Configuring MAKER with MPI support
'CCFLAGSEX' is not a valid config option for Inline::C
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 236
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 256
	Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', 
'/software/mpich2-1.5rc3/include', 'blib', '') called at 
/storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277
	MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') 
called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469
	MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at 
./Build line 70

Same procedure worked with 2.29-beta!

Any ideas?

Felix

-- 
Felix Bemm
Department of Bioinformatics
University of Würzburg, Germany
Tel: +49 931 - 31 83696
Fax: +49 931 - 31 84552
felix.bemm at uni-wuerzburg.de


From carsonhh at gmail.com  Wed Mar  5 09:40:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 09:40:05 -0700
Subject: [maker-devel] Build Issues - v2.31
In-Reply-To: <53175255.4050102@uni-wuerzburg.de>
References: <53175255.4050102@uni-wuerzburg.de>
Message-ID: <CF3CA125.A7FA%carsonhh@gmail.com>

You need to update your Inline::C module.  The CCFLAGSEX option was added
to Inline::C a couple of years ago to allow users to pass in flags to the
compiler.

Thanks,
Carson




From carson.holt at genetics.utah.edu  Wed Mar  5 12:02:26 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 5 Mar 2014 19:02:26 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3CC2C6.A802%carson.holt@genetics.utah.edu>


On 3/5/14, 11:59 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear Maker users
>
>I want to run maker on a fungal genome of about 45 Mb with about 1/3 of
>the genome being repeat rich. But most of the virulence genes are located
>within the repeat regions, flanked by stretches of repeats. I am not sure
>whether, if I use the repeat masker option, I am going to miss out on the
>prediction of these virulence genes located within the repeats.
>
>Other concerns with the settings in the maker-opts file for fungal genomes are:
>
>single_exon = 0     should this get changed to 1 since single exon genes
>are quite common in fungi, and what is the consequence of this on using EST
>and assembled RNA as evidence for gene prediction?
>
>correct_est_fusion=0                  #limits use of ESTs in annotation
>to avoid fusion genes         as I understand this option will remove the
>overlapping UTRs but what is the consequence of setting this option on
>the use of EST for predicting ORFs?
>
>
>Thanks
>
>
>
>HB
>
>
>
>


From carsonhh at gmail.com  Wed Mar  5 12:17:57 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 12:17:57 -0700
Subject: [maker-devel] FW: maker-control file
Message-ID: <CF3CC300.A805%carsonhh@gmail.com>

Not using repeat masking will cause many problems.  Besides, a gene being
flanked by repeats does not mean it will be lost: any evidence/alignments
that can seed in non-repetitive regions (gene/exon) are still allowed to
extend into repetitive regions during the polishing stage (aligners have
two stages - seed and extend).  So transposons should never seed, but
genes will because their sequence will contain non-repetitive regions
(even if they are near repeats).

single_exon should be set to 1 for fungi, just make sure to set the
minimum length of single exon evidence to something reasonable like 250bp.

correct_est_fusion should not be used together with est2genome.  It won't
fail, you just get odd results.  Actually est2genome should not ever be
used to generate the final annotation set.  It is a convenience method
that allows you to generate rough models for training gene predictors like
SNAP and Augustus.  But once they are trained it should be turned off,
because the models it produces will be partial (ESTs rarely cover the
whole transcript) and the results will have many false positives from
background transcription events in your EST data.  These models are good
enough to train with, but make very poor final annotations. So in the end
you should be using correct_est_fusion=1 with the SNAP or Augustus set and
not est2genome (which should already have been turned off by then).
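
For anyone scripting the switch between the training pass and the final
pass, here is a minimal Python sketch (not part of MAKER). It rewrites
key=value lines in a maker_opts.ctl-style file and keeps trailing comments.
The helper name set_options and the two dictionaries are hypothetical, and
single_length is assumed to be the option that holds the minimum length of
single exon evidence mentioned above; check your own control file first.

```python
import re

# Hypothetical groupings of the settings discussed above.
# Training pass: rough est2genome models for training SNAP/Augustus.
TRAINING_PASS = {"est2genome": "1", "correct_est_fusion": "0"}
# Final pass: est2genome off, fusion correction on, single exon genes allowed.
# single_length=250 is an assumption about which option holds the 250 bp minimum.
FINAL_PASS = {"est2genome": "0", "correct_est_fusion": "1",
              "single_exon": "1", "single_length": "250"}

def set_options(ctl_text, overrides):
    """Return ctl_text with every key in overrides set to its new value.

    Trailing '#...' comments are preserved; keys missing from the file
    are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)\s*=\s*([^#]*?)(\s*#.*)?$", line)
        if m and m.group(1) in remaining:
            comment = (m.group(3) or "").strip()
            new_line = f"{m.group(1)}={remaining.pop(m.group(1))} {comment}"
            out.append(new_line.rstrip())
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    example = "est2genome=1 #infer gene predictions directly from ESTs\nsingle_exon=0\n"
    print(set_options(example, FINAL_PASS), end="")
```

Writing the result to a separate control file per pass keeps the training
run and the final run reproducible.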


Thanks,
Carson




From marc.hoeppner at imbim.uu.se  Thu Mar  6 00:26:29 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Thu, 6 Mar 2014 07:26:29 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3CC300.A805%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>

Hi,

I think this is an interesting comment that I would like a bit more information on:


> correct_est_fusion should not be used together with est2genome.  It won't
> fail, you just get odd results.  Actually est2genome should not ever be
> used to generate the final annotation set.  It is a convenience method
> that allows you to generate rough models for training gene predictors like
> SNAP and Augustus.  But once they are trained it should be turned off,
> because the models it produces will be partial (ESTs rarely cover the
> whole transcript) and the results will have many false positives from
> background transcription events in your EST data.  These models are good
> enough to train with, but make very poor final annotations. So in the end
> you should be using correct_est_fusion=1 with the SNAP or Augustus set and
> not est2genome (which should already have been turned off by then).


My experience has been that the process of training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~ 10% false exon calls.

As an example, using SNAP and Augustus on a bird genome - with Augustus achieving nucleotide and exon sensitivities in the 70-90% range - gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. Not sure what to think about that to be honest. Is it possible to get some more details on how Maker uses ab-initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab-initio predictions, which to me seems problematic.


/Marc

From carsonhh at gmail.com  Thu Mar  6 07:29:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 07:29:35 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
Message-ID: <CF3DCCB0.A85C%carsonhh@gmail.com>

> Wouldn't it be more sensible to rely on the evidence over probabilistic
> models?

Yes.  In fact that is the backbone of MAKER.  The evidence is used to derive
hints that are passed back into the predictors and reviewed in light of the
evidence to decide on final models (no longer strictly probabilistic).  Take
a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even
when you use the wrong species parameters in the predictor (e.g. A. thaliana
to annotate C. elegans) you get as much as a 3-fold increase in exon level
accuracy by using the hint feedback from MAKER.  With the est2genome option you
don't get that hint feedback (normally probabilistic models, EST evidence,
and protein evidence would all work together), and the models are overall
poorer and contain more false positives (we have looked at this a lot).


> The annotation would be partial, but on the other hand the chance of
> incorporating false signals are smaller (assuming I can generate a clean set
> of transcripts from RNA-seq data)?

False signals are abundant.  It's just the nature of how ESTs and especially
mRNAseq reads are generated and anchored back to the assembly.  By letting
there be feedback between the probabilistic model and the evidence (both
protein and EST/mRNAseq) a lot of this is eliminated.


> As an example, using SNAP and Augustus on a bird genome - with augustus
> achieving nucleotide and exon sensitivities in the 70-90% range gave a host of
> false exons that were simply not supported by the RNAseq data, yet made it
> into the final gene build.

You will get false positives from an est2genome-alone approach as well.  Models
will be more partial, and the false negative rate will be very high (often
30-70% false negative rate).  Also look at the MAKER2 paper Figure 1.  The
false positive rate from ab initio alone can be quite high, but with the
evidence feedback it is substantially reduced (especially for poorly trained
predictors).


> Is it possible to get some more details on how Maker uses ab-initio predictions
> and reconciles them with evidence alignments? At the moment it seems to me
> that maker gives higher weight to the ab-initio predictions, which to me seems
> problematic. 

Take a look at the MAKER, MAKER2, and MAKER-P papers.  Final genes are
chosen based on evidence overlap using AED (completely evidence based).
It is the model generation that leverages the hint based feedback.  The
names of MAKER genes can let you know what the source of the model is.  Any
time hint based models match the evidence better the name will look
like this -->
maker-<contig>-<predictor>-gene-<ID> (e.g. maker-chr1-snap-gene-0.4)

When the ab initio model matches better than the hint based model the name
is like this -->
<predictor>-<contig>-abinit-gene-<ID> (e.g. snap-chr1-abinit-gene-0.2)


In summary, using est2genome alone (while good for generating training sets)
undercuts the power of the evidence feedback together with the probabilistic
models.
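
To tally how many final genes came from each source, the two naming schemes
above can be matched with regular expressions. This is only a sketch in
Python, not MAKER code: the function name classify_gene_name is hypothetical,
and it assumes predictor names contain no hyphens and that IDs consist of
digits and dots, as in the two examples.

```python
import re

# Patterns follow the two naming schemes described above.
# Assumptions of this sketch: predictor names have no hyphens (contig names
# may), and the trailing ID is made of digits and dots.
HINT_BASED = re.compile(
    r"^maker-(?P<contig>.+)-(?P<predictor>[A-Za-z0-9_]+)-gene-(?P<id>[0-9.]+)$")
AB_INITIO = re.compile(
    r"^(?P<predictor>[A-Za-z0-9_]+)-(?P<contig>.+)-abinit-gene-(?P<id>[0-9.]+)$")

def classify_gene_name(name):
    """Return (source, contig, predictor, id), or None if neither scheme fits."""
    for source, pattern in (("hint", HINT_BASED), ("ab_initio", AB_INITIO)):
        m = pattern.match(name)
        if m:
            return (source, m.group("contig"), m.group("predictor"), m.group("id"))
    return None

if __name__ == "__main__":
    for example in ("maker-chr1-snap-gene-0.4", "snap-chr1-abinit-gene-0.2"):
        print(example, "->", classify_gene_name(example))
```

Names that match neither pattern return None, so genes carried over from
other sources can be counted separately.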


Thanks,
Carson




From marc.hoeppner at imbim.uu.se  Thu Mar  6 07:40:48 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Thu, 6 Mar 2014 14:40:48 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3DCCB0.A85C%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>

Hi Carson,

Thanks for the detailed feedback, this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data - especially with newer protocols offering near-perfect strandedness. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by trying to let Maker work with a semi-refined ab-initio model. But it can be a bit tricky to hit that sweet spot (we did validate > 4000 models manually in order to make that sort of assessment, though).

But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me, I think it would be fantastic if you guys could put together a Wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that people should get a proper explanation of that have only ever been discussed on this list - best practices, experimental features etc.

Regards,

Marc




From carsonhh at gmail.com  Thu Mar  6 08:03:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 08:03:10 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
	<1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
Message-ID: <CF3DDC22.A8AF%carsonhh@gmail.com>

MAKER wiki -->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page

Thanks,
Carson


From:  Marc H?ppner <marc.hoeppner at imbim.uu.se>
Date:  Thursday, March 6, 2014 at 7:40 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] FW: maker-control file

Hi Carson, 

Thanks for the detailed feedback, this has cleared up a few things. I don?t
necessarily share your view on the problematic nature of RNA-seq data -
especially with newer protocols near-perfect strandedness. We work a lot on
transcriptome assembly and with a stringent approach to transcript assembly
I think I got better results with est2genome than trying to let Maker work
with a semi-refined ab-initio model. But it can be a bit tricky to hit that
sweet spot (we did validate > 4000 models manually in order to make that
sort of assessment tho).

But I will have another look at this and see if I can get Maker to do what I
need with the approach you describe. That reminds me, I think it would be
fantastic if you guys could put together a Wiki for Maker. This is such a
useful and powerful tool, but clearly there are many things that people
should get a proper explanation on that has only ever been discussed on this
list here - best practices, experimental features etc.

Regards,

Marc


On 06 Mar 2014, at 15:29, Carson Holt <carsonhh at gmail.com> wrote:

>> Wouldn?t it be more sensible to rely on the evidence over probabilistic
>> models?
> 
> Yes.  Infact that is the backbone of MAKER.  The evidence is used to derive
> hints that are passed back into the predictors and reviewed in light of the
> evidence to decide on final models (no longer strictly probabalistic).  Take a
> look at the MAKER2 paper (Table 2 and Figure 1) and you will see that eve when
> you use the wrong species parameters in the predictor (I.e. A. thaliana to
> annotate C. elegant) you get as much as a 3 fold increase in exon level
> accuracy by using the hint feedback from MAKER.  With est2genome option you
> don?t get that hint feedback (normally probabilistic models, EST evidence, and
> protein evidence would all work together), and the models are overall poorer
> and contain more false positives (we have looked at this a lot).
> 
> 
>> The annotation would be partial, but on the other hand the chance of
>> incorporating false signals are smaller (assuming I can generate a clean set
>> of transcripts from RNA-seq data)?
> 
> False signals are abundant.  It's just the nature of how ESTs and especially
> mRNAseq reads are generated and anchored back to the assembly.  By letting
> there be feedback between the probabilistic model and the evidence (both
> protein and EST/mRNAseq) a lot of this is eliminated.
> 
> 
>> As an example, using SNAP and Augustus on a bird genome - with augustus
>> achieving nucleotide and exon sensitivities in the 70-90% range - gave a host
>> of false exons that were simply not supported by the RNAseq data, yet made it
>> into the final gene build.
> 
> You will get false positives from an est2genome-only approach as well.  Models
> will be more partial, and the false negative rate will be very high (often
> 30-70%).  Also look at the MAKER2 paper Figure 1.  The false
> positive rate from ab initio alone can be quite high, but with the evidence
> feedback it is substantially reduced (especially for poorly trained
> predictors).
> 
> 
>> Is it possible to get some more details on how Maker uses ab-initio
>> predictions and reconciles them with evidence alignments? At the moment it
>> seems to me that maker gives higher weight to the ab-initio predictions,
>> which to me seems problematic.
> 
> Take a look at the MAKER, MAKER2, and MAKER-P papers.  Final genes are chosen
> based off of evidence overlap using AED (completely evidence based).  It is
> the model generation that leverages the hint based feedback.  The names of
> MAKER genes tell you the source of the model.  Whenever the hint-based
> model matches the evidence better, the name will look like this ->
> maker-<contig>-<predictor>-gene-<ID> (e.g. maker-chr1-snap-gene-0.4)
> 
> When the ab initio model matches better than the hint-based model, the name
> looks like this ->
> <predictor>-<contig>-abinit-gene-<ID> (e.g. snap-chr1-abinit-gene-0.2)
> 
> 
> In summary, using est2genome alone (while good for generating training sets)
> undercuts the power of the evidence feedback together with the probabilistic
> models.
> 
> 
> Thanks,
> Carson
> 
> From: Marc Höppner <marc.hoeppner at imbim.uu.se>
> Date: Thursday, March 6, 2014 at 12:26 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] FW: maker-control file
> 
> Hi,
> 
> I think this is an interesting comment that I would like a bit more
> information on:
> 
>> 
>> correct_est_fusion should not be used together with est2genome.  It won't
>> fail, you just get odd results.  Actually est2genome should not ever be
>> used to generate the final annotation set.  It is a convenience method
>> that allows you to generate rough models for training gene predictors like
>> SNAP and Augustus.  But once they are trained it should be turned off,
>> because the models it produces will be partial (ESTs rarely cover the
>> whole transcript) and the results will have many false positives from
>> background transcription events in your EST data.  These models are good
>> enough to train with, but make very poor final annotations. So in the end
>> you should be using correct_est_fusion=1 with the SNAP or Augustus set and
>> not est2genome (which should already have been turned off by then).
>> 
> 
> My experience has been that the process of training gene finders, especially
> for complex genomes like vertebrates, is a very slow and painful process. And
> ultimately, the results are far from accurate, even with a sizeable, manually
> curated training set. Wouldn't it be more sensible to rely on the evidence
> over probabilistic models? The annotation would be partial, but on the other
> hand the chance of incorporating false signals is smaller (assuming I can
> generate a clean set of transcripts from RNA-seq data)? And I'd rather
> underestimate the exon inventory slightly than put out an annotation with
> ~10% false exon calls.
> 
> As an example, using SNAP and Augustus on a bird genome - with augustus
> achieving nucleotide and exon sensitivities in the 70-90% range - gave a host
> of false exons that were simply not supported by the RNAseq data, yet made it
> into the final gene build. Not sure what to think about that to be honest. Is
> it possible to get some more details on how Maker uses ab-initio predictions
> and reconciles them with evidence alignments? At the moment it seems to me
> that maker gives higher weight to the ab-initio predictions, which to me seems
> problematic. 
> 
> 
> /Marc
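
The two name layouts described in the quoted reply (hint-based versus pure ab initio) can be told apart mechanically when reviewing a gene build. Below is a minimal Python sketch, not anything shipped with MAKER. The function name, the regular expressions, and the assumption that the trailing ID consists only of digits and dots are mine, inferred from the two examples in the thread; other name layouts fall through as "other".

```python
import re

# Patterns follow the two layouts quoted in the thread. The ID format
# (digits and dots only) is an assumption based on the examples given.
HINT_RE = re.compile(
    r"^maker-(?P<contig>.+)-(?P<predictor>[^-]+)-gene-(?P<id>[\d.]+)$")
ABINIT_RE = re.compile(
    r"^(?P<predictor>[^-]+)-(?P<contig>.+)-abinit-gene-(?P<id>[\d.]+)$")


def classify_gene_name(name):
    """Return (source, predictor, contig) for a MAKER-style gene name.

    source is "hint" for maker-<contig>-<predictor>-gene-<ID>,
    "abinit" for <predictor>-<contig>-abinit-gene-<ID>,
    and "other" for anything that matches neither layout.
    """
    m = HINT_RE.match(name)
    if m:
        return ("hint", m.group("predictor"), m.group("contig"))
    m = ABINIT_RE.match(name)
    if m:
        return ("abinit", m.group("predictor"), m.group("contig"))
    return ("other", None, None)


if __name__ == "__main__":
    for n in ("maker-chr1-snap-gene-0.4", "snap-chr1-abinit-gene-0.2"):
        print(n, classify_gene_name(n))
```

Counting the "hint" and "abinit" labels over a whole annotation gives a rough view of how often the evidence feedback changed the chosen model.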



From sjackman at gmail.com  Thu Mar  6 13:56:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 12:56:34 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3BD88C.A7D5%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
Message-ID: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with 'rrn' and the tRNA gene names with 'trn', as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
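
The prefiltering step mentioned in the follow-up above (filter on the score in the mRNA column before passing the GFF3 to model_gff) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not a MAKER utility: the function names are made up, the score is read from GFF3 column 6 of mRNA rows, and "higher is better" with a user-chosen cutoff is an assumption, since the thread does not say how the score is scaled. Only gene, mRNA, and child features of passing mRNAs are kept.

```python
def parse_attrs(col9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_by_mrna_score(lines, min_score):
    """Keep gene models whose mRNA score (column 6) is >= min_score.

    Assumption: a higher score means a better match. Returns the kept
    feature lines (gene, mRNA, and their child features) as strings.
    """
    rows = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            rows.append(cols)

    # First pass: find mRNAs that pass, and remember their parent genes.
    keep_mrna, keep_gene = set(), set()
    for cols in rows:
        if cols[2] == "mRNA" and cols[5] != "." and float(cols[5]) >= min_score:
            attrs = parse_attrs(cols[8])
            keep_mrna.add(attrs.get("ID"))
            keep_gene.update(attrs.get("Parent", "").split(","))

    # Second pass: emit passing genes, mRNAs, and their children.
    kept = []
    for cols in rows:
        attrs = parse_attrs(cols[8])
        parents = set(attrs.get("Parent", "").split(","))
        if cols[2] == "gene":
            ok = attrs.get("ID") in keep_gene
        elif cols[2] == "mRNA":
            ok = attrs.get("ID") in keep_mrna
        else:
            ok = bool(parents & keep_mrna)
        if ok:
            kept.append("\t".join(cols))
    return kept
```

The kept lines, written out with a `##gff-version 3` header, would then be the file named in model_gff for the second run.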

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implicitly turns on the est2genome predictor. Is this necessary for it to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions.

In general, I think maker_coor and est_forward are a feature set worth promoting to a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
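
Adding the gene_id= and maker_coor= tags by hand gets tedious for a whole transcript set, so here is a minimal Python sketch of one way to do it. It is only an illustration: the function names and the `placements` mapping are hypothetical, and it assumes the tags can simply be appended as space-separated tokens after the existing header text, which the thread does not spell out.

```python
def tag_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and/or maker_coor= tags to one FASTA header line."""
    header = header.rstrip("\n")
    if gene_id is not None:
        header += f" gene_id={gene_id}"
    if maker_coor is not None:
        header += f" maker_coor={maker_coor}"
    return header


def tag_fasta(lines, placements):
    """Tag every header whose sequence ID appears in placements.

    placements maps sequence ID -> (gene_id, maker_coor); either element
    may be None. Sequence lines and unknown IDs pass through unchanged.
    """
    out = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            gene_id, coor = placements.get(seq_id, (None, None))
            line = tag_header(line, gene_id, coor)
        out.append(line.rstrip("\n"))
    return out


if __name__ == "__main__":
    fasta = [">t1 isoform A", "ACGT", ">t2", "GGCC"]
    hints = {"t1": ("geneA", "chr1:1-10000"), "t2": ("geneA", "chr1")}
    print("\n".join(tag_fasta(fasta, hints)))
```

Giving both isoforms the same gene_id, as in the usage lines, is what the description above says should cluster them into one gene.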

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org




From carsonhh at gmail.com  Thu Mar  6 13:58:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 13:58:41 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E2F7A.A911%carsonhh@gmail.com>

Yes.  I'll fix the naming.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 1:56 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general
is a non-trivial problem. In my particular case, I have a well-annotated
reference species that is very closely related (99.2% sequence identity), so
lifting over the annotations from that reference species to my species
should be pretty straightforward. It would be great if MAKER had an option
for RNA sequence homology similar to est2genome that does not imply the
sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified
genes are named e.g. `trnascan-205522-processed-gene-0.38`.  tRNA genes are
conventionally named according to the amino acid and anticodon, such as
`trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the
names with that convention?

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:
 
> Trying to call non-coding RNA from ESTs or even sequence homology is extremely
> messy (non-trivial problem in most organisms with high false positive rate),
> so MAKER for the most part doesn't even try to do that.  It focuses only on
> the coding genes.  You can now use tRNAscan and snoscan in the newest version
> for some non-coding RNA support (those features were only added a couple of
> months ago).  So just like other prediction tools (snap, augustus etc.), the
> primary focus has always been the coding genes.  We've only started adding
> non-coding RNA support recently for iPlant, so it's still relatively immature.
> 
> Thanks,
> Carson
> 
> 
> From: Shaun Jackman <sjackman at gmail.com>
> Reply-To: Shaun Jackman <sjackman at gmail.com>
> Date: Tuesday, March 4, 2014 at 7:10 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the
> tip.
> 
> The rRNA genes that are found with est2genome have the feature type set to
> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features.
> Ideally the feature type would be set to rRNA or tRNA as appropriate, and
> would omit the UTR and CDS features. Is that a feature that you would be
> interested in adding to MAKER? The rRNA gene names all start with 'rrn' and
> the tRNA gene names with 'trn', as is standard, so determining the appropriate
> type should be straightforward.
> 
> Thanks again for your help with this. Cheers,
> Shaun
> 
> 
> 
> On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
>> Set single_exon=1, and the minimum size to a smaller value.  I think it's set
>> to 250 right now.  Also est2genome is looking for an ORF, so if there is none
>> (as with tRNAs) they probably won't get picked up.
>> 
>> --Carson 
>> 
>> Sent from my iPhone
>> 
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>> 
>>> Sorry, ignore my previous question. est_forward also carries forward the
>>> names of protein evidence and works like a charm. Thank you!
>>> 
>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5
>>> and rrn5 and tRNA genes didn't make it into the all.gff file. They are in
>>> the blastn output, and in the evidence_0.gff. rrn5 has perfect identity,
>>> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 <
>>> eval_blastn=1e-10). How should I debug which filter is removing these hits?
>>> organism_type=prokaryotic
>>> est2genome=1
>>> protein2genome=1
>>> est_forward=1
>>> Cheers,
>>> Shaun
>>> 
>>> 
>>> 
>>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>>> Is there a corresponding protein_forward=1 option to map forward protein
>>>> names from protein2genome?
>>>> 
>>>> Cheers, 
>>>> Shaun
>>>> 
>>>> 
>>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
>>>>> 
>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>> passing the gff3 to model_gff.
>>>>> 
>>>>> --Carson 
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>> 
>>>>>> What you can do is run it once with just est_forward=1 and
>>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them
>>>>>> in as model_gff and use the map_forward option to then filter the results
>>>>>> based on mRNA score and that would copy names onto new gene under the
>>>>>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>>>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>>>> the tool will just be calling MAKER with certain parameters restricted).
>>>>>> I do this because if people commonly use it mixed with things like SNAP I
>>>>>> can start to get some very weird behaviors.
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>> 
>>>>>> It seems that this could be a very useful option in those cases where you
>>>>>> have firm a priori knowledge of the placement of ESTs. However, while
>>>>>> trying it I note that est_forward implicitly turns on the est2genome
>>>>>> predictor. Is this necessary for it to work? I'm after
>>>>>> the behavior you describe below where exonerate is made to try really
>>>>>> hard within a limited region to align an EST, but I would not like maker
>>>>>> to produce est2genome predictions.
>>>>>> 
>>>>>> In general, I think maker_coor and est_forward are a feature set worth
>>>>>> promoting to a documented feature.
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikael
>>>>>> 
>>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>>> It will still work without est_forward.  It just works a little
>>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>>> 
>>>>>>> If est_forward is provided, MAKER will parse the database to look for
>>>>>>> the maker_coor tags early in the pipeline.  Then it will create a list
>>>>>>> of locations to search, and it will search them even if there are no
>>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result
>>>>>>> first and then polishes it with exonerate).  So maker_coor=chr1 will
>>>>>>> cause MAKER to look for a match using all of chr1 as the input to
>>>>>>> exonerate even when BLAST finds nothing (this is a very very slow
>>>>>>> search, but can help pick up one or two stubborn genes that don't remap
>>>>>>> well).  To allow this, MAKER gives exonerate looser matching parameters
>>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly
>>>>>>> errors).  The logic here is that given the fact that I already told
>>>>>>> MAKER that with some degree of confidence I expect sequence A to map to
>>>>>>> location X, it will try its hardest to make it match.
>>>>>>> 
>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to
>>>>>>> the region (that BLAST result has the information in its description
>>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as
>>>>>>> they were with est_forward.
>>>>>>> 
>>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>>> accidental feature).
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>>> est_forward=1, right?
>>>>>>> 
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> There is a way.  It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> 
>>>>>>> 
>>>>>> 
>>> 
> 



From carson.holt at genetics.utah.edu  Thu Mar  6 16:00:40 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 6 Mar 2014 23:00:40 +0000
Subject: [maker-devel] maker problem with running blast
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3E4A6E.A92B%carson.holt@genetics.utah.edu>

Your blast_type parameter in maker_bopts.ctl is set to 'wublast' but the
executables for wublast are blank in maker_exe.ctl.

See, they're blank ->
xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable


You either need to provide executables or set your blast_type parameter to
something else. For example, you could set it to 'NCBI+', but you will need
to fix the location of makeblastdb.

makeblastdb is set incorrectly here (it points at a directory, not at the executable) ->
makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
NCBI+ makeblastdb executable


Alternatively you can set blast_type to 'NCBI', but you will need to
uncomment the executables.

Here ->
formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
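
The same checks can be run before starting MAKER. Below is a minimal Python sketch, not part of MAKER itself, that parses control-file lines of the `key=value #comment` form shown in this message and reports blank or non-executable BLAST paths. The function names are hypothetical, and the mapping of blast_type values to required keys is an assumption read off the comments in the pasted maker_exe.ctl; MAKER's own validation may differ.

```python
import os

# Assumed mapping, taken from the "#location of ..." comments in the
# pasted control file. Not checked against MAKER's source.
REQUIRED = {
    "wublast": ["xdformat", "blasta"],
    "ncbi+": ["makeblastdb", "blastn", "blastx", "tblastx"],
    "ncbi": ["formatdb", "blastall"],
}


def parse_ctl(lines):
    """Parse key=value lines; everything after '#' is treated as a comment."""
    opts = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, section header, or wrapped comment text
        key, rest = line.split("=", 1)
        opts[key.strip()] = rest.split("#", 1)[0].strip()
    return opts


def blast_problems(exe_opts, blast_type):
    """List problems with the executables needed for the given blast_type."""
    problems = []
    for key in REQUIRED[blast_type.lower()]:
        path = exe_opts.get(key, "")
        if not path:
            problems.append(f"{key}: blank")
        elif not os.path.isfile(path):
            problems.append(f"{key}: not a file ({path})")
        elif not os.access(path, os.X_OK):
            problems.append(f"{key}: not executable ({path})")
    return problems
```

With the file from this thread, `blast_problems(opts, "wublast")` would report both WU-BLAST entries as blank, and `blast_problems(opts, "NCBI+")` would flag makeblastdb because its value is a directory rather than a file.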


--Carson


On 3/6/14, 3:51 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Hi
>
>I have installed the latest version of blast+ and provided the executable path
>to maker_exe.ctl as follows
>
>#-----Location of Executables Used by MAKER/EVALUATOR
>makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
>NCBI+ makeblastdb executable
>blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location
>of NCBI+ blastn executable
>blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location
>of NCBI+ blastx executable
>tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx
>#location of NCBI+ tblastx executable
>formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
>blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
>xdformat=#location of WUBLAST xdformat executable
>blasta=#location of WUBLAST blasta executable
>RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of
>RepeatMasker executable
>exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate
>#location of exonerate executable
>
>#-----Ab-initio Gene Prediction Algorithms
>snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable
>gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3
>#location of eukaryotic genemark executable
>gmhmmp= #location of prokaryotic genemark executable
>augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus
>executable
>fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable
>
>#-----Other Algorithms
>fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom
>executable (experimental)
>probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild
>#location of probuild executable (required for genemark)
>
>
>
>
>
>But when running maker I get this error
>
>
>STATUS: Parsing control files...
>WARNING: blast_type is set to 'wublast' but executables cannot be located
>ERROR: Please provide a valid locaction for a BLAST algorithm in the
>control files.
>
>
>
>
>
>
>


From sjackman at gmail.com  Thu Mar  6 16:33:04 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 15:33:04 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3E2F7A.A911%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
Message-ID: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models?

Cheers,
Shaun
On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote:

Yes. I'll fix the naming.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Date: Thursday, March 6, 2014 at 1:56 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straight forward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?
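
To make the requested convention concrete, here is a minimal sketch of how such a name could be built from an amino acid and an anticodon. The function name `trna_gene_name` is hypothetical and this is not how MAKER or tRNAscan names genes; it covers only the 20 standard amino acids and ignores special cases such as initiator or suppressor tRNAs.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def trna_gene_name(amino_acid, anticodon):
    """Build a name like 'trnW-CCA' from a three-letter amino acid and an anticodon."""
    return f"trn{AA3_TO_1[amino_acid]}-{anticodon.upper()}"
```

For example, `trna_gene_name("Trp", "CCA")` gives `trnW-CCA`, matching the example above.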

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago). So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.
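
The prefix rule described above could be sketched as follows. This is only an illustration of the proposal, not existing MAKER behavior; the function name `rna_feature_type` is hypothetical, and anything that does not match a prefix is simply left as mRNA.

```python
# Child feature types that only make sense for coding transcripts.
CODING_ONLY = {"five_prime_UTR", "CDS", "three_prime_UTR"}


def rna_feature_type(gene_name):
    """Guess the transcript feature type from the conventional gene-name prefix."""
    if gene_name.startswith("rrn"):
        return "rRNA"
    if gene_name.startswith("trn"):
        return "tRNA"
    return "mRNA"


def keep_child(gene_name, child_type):
    """Drop UTR and CDS children when the transcript is an rRNA or a tRNA."""
    if rna_feature_type(gene_name) == "mRNA":
        return True
    return child_type not in CODING_ONLY
```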

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and set the minimum size (single_length) to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.
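
To see why a hit can pass the BLAST thresholds and still be dropped, the checks discussed in this thread can be lined up as in the sketch below. This is not MAKER's actual filtering code: the function name, the dictionary layout, and the order of the checks are assumptions, and the defaults are simply the values quoted in this thread (bit_blastn=40, eval_blastn=1e-10, single_length of about 250).

```python
def failed_filters(hit, bit_min=40, eval_max=1e-10, single_exon=0, single_length=250):
    """Return the names of the thresholds that a hit does not meet."""
    failed = []
    if hit["bits"] < bit_min:
        failed.append("bit_blastn")
    if hit["evalue"] > eval_max:
        failed.append("eval_blastn")
    if hit["exons"] == 1:
        if not single_exon:
            # Single-exon evidence is ignored unless single_exon=1.
            failed.append("single_exon")
        elif hit["length"] < single_length:
            # Single-exon evidence shorter than single_length is ignored.
            failed.append("single_length")
    return failed
```

With a hypothetical short single-exon hit such as `{"bits": 242, "evalue": 2e-66, "exons": 1, "length": 120}`, the BLAST thresholds pass, and only the single-exon settings stand in the way.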

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
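
The prefiltering step (dropping low-scoring mRNA features from the first-pass GFF3 before handing it to model_gff) might look roughly like this. It is a sketch under stated assumptions rather than a MAKER utility: the function name `filter_mrna_by_score` is hypothetical, the score is read from GFF3 column 6, a higher score is assumed to be better, and the threshold is yours to choose. Genes left without any mRNA are not removed here.

```python
def _attrs(col9):
    """Turn a GFF3 attribute column into a dict."""
    return dict(kv.split("=", 1) for kv in col9.strip().split(";") if "=" in kv)


def filter_mrna_by_score(gff_lines, min_score):
    """Drop mRNA features scoring below min_score, plus their child features."""
    rows = [line.rstrip("\n") for line in gff_lines]

    # First pass: collect the IDs of the mRNA features to drop.
    dropped = set()
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        if cols[5] != "." and float(cols[5]) < min_score:
            mrna_id = _attrs(cols[8]).get("ID")
            if mrna_id:
                dropped.add(mrna_id)

    # Second pass: keep everything except dropped mRNAs and their children.
    kept = []
    for row in rows:
        cols = row.split("\t")
        if not row.startswith("#") and len(cols) == 9:
            attrs = _attrs(cols[8])
            parents = [p for p in attrs.get("Parent", "").split(",") if p]
            if attrs.get("ID") in dropped or any(p in dropped for p in parents):
                continue
        kept.append(row)
    return kept
```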

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.

In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.
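
The seed handling described for the case without est_forward can be pictured as simple interval logic. The sketch below is not the code in GI.pm; the function names and the tuple layout are assumptions made for illustration, and coordinates are treated as 1-based and inclusive.

```python
import math


def parse_maker_coor(tag):
    """Parse 'chr1:1-10000' or 'chr1' into (contig, start, end)."""
    contig, _, span = tag.partition(":")
    if not span:
        # No range given: the whole contig is allowed.
        return contig, 1, math.inf
    start, end = span.split("-")
    return contig, int(start), int(end)


def adjust_seed(seed, coor):
    """Return the polishing window for a BLAST seed, or None if the seed is ignored.

    seed and coor are (contig, start, end) tuples.
    """
    s_contig, s_start, s_end = seed
    c_contig, c_start, c_end = coor
    if s_contig != c_contig or s_end < c_start or s_start > c_end:
        return None  # seed lies completely outside maker_coor
    return coor  # overlapping seed: search space is set to maker_coor exactly
```

For instance, a seed at chr1:9500-12000 checked against `maker_coor=chr1:1-10000` overlaps the region, so the window becomes chr1:1-10000, while a seed at chr1:20000-21000 is ignored.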

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
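
Adding the tags by hand gets tedious for a large evidence file, so something like the following could be used to append them. It is only a sketch: the function name `tag_fasta` is hypothetical, and placing the tags at the end of the header, separated by spaces, is an assumption, since the thread only says the tags go in the fasta header.

```python
def tag_fasta(lines, tags):
    """Append gene_id= and maker_coor= tags to matching FASTA header lines.

    tags maps a sequence ID (the first word after '>') to a dict such as
    {"gene_id": "geneA", "maker_coor": "chr1:1-10000"}.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            extra = tags.get(seq_id, {})
            for key in ("gene_id", "maker_coor"):
                if key in extra:
                    line += f" {key}={extra[key]}"
        out.append(line)
    return out
```

Sequences without an entry in `tags` pass through unchanged, so the same file can mix tagged and untagged evidence.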

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From carsonhh at gmail.com  Thu Mar  6 16:38:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:38:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
	<etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E5408.A93F%carsonhh@gmail.com>

Well... not really.  I have no plans to add est2genome support for noncoding
genes (non-trivial), so you would either have to remove the ncRNA from your
input, or filter it out downstream.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 4:33 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to
identify tRNA, I was hoping that both forms of evidence would be used to
create a single gene model, which doesn't seem to be the case. I get
duplicate overlapping gene models (one mRNA from est and one tRNA from
tRNAscan). Could MAKER merge these models?

Cheers,
Shaun



From sbrubaker at solazyme.com  Thu Mar  6 16:41:55 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Thu, 6 Mar 2014 23:41:55 +0000
Subject: [maker-devel] Long introns from Augustus
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com>

Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus.  I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created?

I tried a couple of things, some settings in a probabilities file and also changing a long list of probabilities to another file that someone had suggested on a forum.  So far I don't really see any changes though.

Any advice would be greatly appreciated.  

Thanks,
Shane


From carsonhh at gmail.com  Thu Mar  6 16:46:53 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:46:53 -0700
Subject: [maker-devel] Long introns from Augustus
Message-ID: <CF3E5643.A94C%carsonhh@gmail.com>

Are these the ab initio calls that are merged, or the final MAKER models?

--Carson


On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:

>Hi, we have a very compact genome and we are getting a lot of fused gene
>models from running Augustus.  I am wondering if anyone has any advice
>about how to prevent introns above a certain cutoff from being created?
>
>I tried a couple of things, some settings in a probabilities file and
>also changing a long list of probabilities to another file that someone
>had suggested on a forum.  So far I don't really see any changes though.
>
>Any advice would be greatly appreciated.
>
>Thanks,
>Shane


From sbrubaker at solazyme.com  Thu Mar  6 17:48:15 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 7 Mar 2014 00:48:15 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <CF3E5643.A94C%carsonhh@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>

Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.

I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
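
Independently of which tool ends up enforcing a cutoff, it can help to list the models that exceed it. The sketch below computes intron lengths from the exon coordinates of one transcript and reports those above a chosen maximum; it does not change what Augustus or MAKER predict. The function name `long_introns` is hypothetical, and coordinates are assumed to be 1-based and inclusive, as in GFF3.

```python
def long_introns(exons, max_intron):
    """Return (start, end, length) for each intron longer than max_intron.

    exons is a list of (start, end) pairs for a single transcript.
    """
    ordered = sorted(exons)
    found = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        length = next_start - prev_end - 1
        if length > max_intron:
            found.append((prev_end + 1, next_start - 1, length))
    return found
```

A transcript with exons at 1-100, 201-300 and 5301-5400 and a cutoff of 1000 bp would be flagged for the 5000 bp intron at 301-5300, which is the kind of model worth inspecting as a possible fusion.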


-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com] 
Sent: Thursday, March 06, 2014 3:47 PM
To: Shane Brubaker; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Long introns from Augustus

Are these the ab initio calls that are merged, or the final MAKER models?

--Carson


On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:

>Hi, we have a very compact genome and we are getting a lot of fused 
>gene models from running Augustus.  I am wondering if anyone has any 
>advice about how to prevent introns above a certain cutoff from being created?
>
>I tried a couple of things, some settings in a probabilities file and 
>also changing a long list of probabilities to another file that someone 
>had suggested on a forum.  So far I don't really see any changes though.
>
>Any advice would be greatly appreciated.
>
>Thanks,
>Shane
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From mikael.durling at slu.se  Mon Mar 10 04:27:25 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 10:27:25 +0000
Subject: [maker-devel] keep_preds values
Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>

Hi,

Can someone please explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either.

Thanks,
Mikael


From robert.king at rothamsted.ac.uk  Mon Mar 10 06:17:07 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Mon, 10 Mar 2014 12:17:07 +0000
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>

Dear Maker Developers,

I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA Trinity assembly (PASAfly) of a different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference transcripts?

I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned in a post last year that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to GTF format using the provided script; I'm just waiting on some Perl modules to be installed by our administrator to test it, along with the "Evaluator" and "compare" programs. What do those two programs do?

Best Wishes
Rob


From dence at genetics.utah.edu  Mon Mar 10 08:47:42 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 10 Mar 2014 14:47:42 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6BA90@mxb2.hg.genetics.utah.edu>

Hi Mikael, 

The keep_preds parameter is often used the same as a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 - 1. A prediction with an AED of 0 matches the evidence exactly while a prediction with an AED of 1 isn't overlapped by any evidence. 
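
As a rough illustration of the AED described above, here is a minimal Python sketch. It assumes the nucleotide-level definition given in the Eilbeck et al. paper, AED = 1 - (SN + SP)/2, where SN is the fraction of evidence bases covered by the prediction and SP is the fraction of prediction bases covered by evidence. The function names and the interval representation are hypothetical; this is not MAKER's own code.

```python
def bases(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def aed(prediction_exons, evidence_intervals):
    """Nucleotide-level AED: 0 = exact match to evidence, 1 = no overlap."""
    pred = bases(prediction_exons)
    evid = bases(evidence_intervals)
    shared = len(pred & evid)
    if shared == 0:
        return 1.0
    sensitivity = shared / len(evid)  # evidence bases recovered by the prediction
    specificity = shared / len(pred)  # prediction bases backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


exons = [(100, 200), (300, 400)]
print(aed(exons, exons))            # prediction identical to the evidence
print(aed(exons, [(1000, 1100)]))   # evidence does not overlap the prediction
print(aed(exons, [(100, 200)]))     # only the first exon is supported
```

The three calls cover the two extremes mentioned above (exact match and no overlap) plus one partially supported prediction.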

The default behavior for MAKER is to make a gene model out of a prediction with any AED <1. When you change the keep_preds option from 0 to 1, then MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc). Setting the keep_preds option to somewhere in between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model. 

From a user standpoint, you will almost certainly lose gene models when you set keep_preds to an intermediate value, but you might benefit from knowing that all your models will now have an AED at or below a certain value.

I hope that helps; let me know if it didn't. 

~Daniel

PS The original paper that described the AED is Eilbeck et al in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012. 

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Mon Mar 10 09:51:21 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:51:21 -0700
Subject: [maker-devel] keep_preds values
Message-ID: <CF432CF3.A9C7%carsonhh@gmail.com>

Actually that is false. The keep_preds option is still binary.  Any value
other than 0 sets it to true.  There was discussion about making it a
non-binary value, but that has not been implemented.
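
In other words, the option currently behaves like a flag. A tiny Python sketch of that behaviour follows; the function name is made up for illustration, and since MAKER itself is written in Perl this is not its actual parsing code.

```python
def keep_preds_enabled(raw_value):
    """Treat keep_preds as a flag: any numerically non-zero value turns it on."""
    return float(raw_value) != 0.0


for setting in ("0", "0.5", "1"):
    # An intermediate value such as 0.5 acts the same as 1.
    print(setting, keep_preds_enabled(setting))
```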

--Carson




From mikael.durling at slu.se  Mon Mar 10 08:57:23 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 14:57:23 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432CF3.A9C7%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
Message-ID: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>

Hi Carson and Daniel,

That sounds more logical to me.  Then it would be appropriate to change the comment of keep_preds in the generated config files.

Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that it is less likely to be a false positive when two or more predictors suggest the same unsupported model?

Mikael




From carsonhh at gmail.com  Mon Mar 10 09:59:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:59:43 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
Message-ID: <CF432F23.A9D4%carsonhh@gmail.com>

Yes.  It will eventually perform an AED-like calculation between multiple
predictors (e.g. if you use 3 predictors, then you require support by
at least 2 predictors across all exons to get a value of 0.33).  A value
of 0 would be perfect concordance across all 3 predictors.

--Carson




From mikael.durling at slu.se  Mon Mar 10 09:08:16 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 15:08:16 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432F23.A9D4%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>

Ok. But that is not implemented now as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models?

Mikael



From carsonhh at gmail.com  Mon Mar 10 10:16:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:16:59 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
Message-ID: <CF4331A9.A9E0%carsonhh@gmail.com>

There is a value called abAED being calculated, which somewhat captures
the concordance among the predictors.  It is not currently printed in the
GFF3, but it is used to identify the best non-overlapping ab initio
predictor to put in the non-overlapping fasta file.  There are a couple of
things I still need to do with it though.  It's not yet normalized to
take into account the absence of a predictor in the cluster of overlapping
predictions. For example, if I have 2 predictors and 2 make perfectly
matching calls and 1 makes no call, they get a score of 0 because I have
perfect concordance between what's there, but I really should make it 0.33
because the absence of the third predictor is meaningful.  The
unnormalized concordance value is fine for deciding which overlapping
model to keep in the file, but not for global comparison.
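
To make the normalization point concrete, here is a toy Python sketch for the case of three predictors run, two making identical calls and one making none. It is only an illustration under a simplifying assumption (calls either match exactly or not at all); the function name, the scoring formula, and the data layout are hypothetical and are not how MAKER computes abAED.

```python
from collections import Counter


def concordance_distance(calls, n_predictors_run=None):
    """Return 1 - (size of the largest group of identical calls / denominator).

    calls maps predictor name -> tuple of exon coordinates, listing only the
    predictors that made a call in this cluster. If n_predictors_run is None
    the score is unnormalized (silent predictors are ignored).
    """
    if not calls:
        return 1.0
    largest_group = max(Counter(calls.values()).values())
    denominator = len(calls) if n_predictors_run is None else n_predictors_run
    return 1.0 - largest_group / denominator


model = ((100, 200), (300, 400))
# Example predictor names; the third predictor made no call in this cluster.
cluster = {"augustus": model, "snap": model}

print(concordance_distance(cluster))                               # unnormalized
print(round(concordance_distance(cluster, n_predictors_run=3), 2))  # normalized
```

The unnormalized score only compares the calls that are present, which is enough to choose among overlapping models; dividing by the number of predictors run is what makes scores comparable across clusters.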

--Carson




From carsonhh at gmail.com  Mon Mar 10 10:18:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:18:14 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF4331A9.A9E0%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
	<CF4331A9.A9E0%carsonhh@gmail.com>
Message-ID: <CF4333C1.AA06%carsonhh@gmail.com>

Sorry, I meant to say "3 predictors and 2 make perfectly
matching calls and 1 makes no call."


On 3/10/14, 9:16 AM, "Carson Holt" <carsonhh at gmail.com> wrote:

>There is a value called abAED being calculated, which somewhat captures
>the concordance among the predictors.  It is not currently printed in the
>GFF3, but it is used to identify the best non-overlapping ab initio
>predictor to put in the non-overlapping fasta file.  There are a couple of
>things I still need to do with it to though.  It?s not yet normalized to
>take into account the absence of a predictor in the cluster of overlapping
>predictions. For example, if I have 2 predictors and 2 make perfectly
>matching calls and 1 makes no call, they get a score of 0 before I have
>perfect concordance between what?s there, but I really should make it 0.33
>because the abscence of the third predictor is meaningful.  The
>unnormalized concordance value is fine for deciding which overlapping
>model to keep in the file, but not for global comparison.
>
>?Carson
>
>
>
>On 3/10/14, 8:08 AM, "Mikael Brandstr?m Durling" <mikael.durling at slu.se>
>wrote:
>
>>Ok. But that is not implemented no as far as I can tell from the source,
>>right? Or is it reflected in the AED for the unsupported models?
>>
>>Mikael
>>
>>10 mar 2014 kl. 16:59 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>> Yes.  It will eventually perform an AED like calculation between
>>>multiple
>>> predictors (i.e. if you use 3 predictors it, then you require support
>>>by
>>> at least 2 predictors across all exons to get a value of 0.33).  A
>>>value
>>> of 0 would be perfect concordance across all 3 predictors.
>>> 
>>> ?Carson
>>> 
>>> 
>>> 
>>> 
>>> On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling"
>>><mikael.durling at slu.se>
>>> wrote:
>>> 
>>>> Hi Carson and Daniel,
>>>> 
>>>> That sounds more logical to me.  Then it would be appropriate to
>>>>change
>>>> the comment of keep_preds in the generated config files.
>>>> 
>>>> Would it make sense to make keep_preds a non-binary value to evaluate
>>>>the
>>>> concordance between ab initio models obtained from different
>>>>predictors?
>>>> That would assume that it is less likely to be a false positive when
>>>>two
>>>> or more predictors suggest the same unsported model?
>>>> 
>>>> Mikael
>>>> 
>>>> 
>>>> 10 mar 2014 kl. 16:51 skrev Carson Holt <carsonhh at gmail.com>:
>>>> 
>>>>> Actually that is false. The keep_preds option is still binary.  Any
>>>>> value
>>>>> other than 0 sets it to true.  There was discussion about making it a
>>>>> non-binary value, but that has not been implemented.
>>>>> 
>>>>> ?Carson
>>>>> 
>>>>> 
>>>>> On 3/10/14, 7:47 AM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>>> 
>>>>>> Hi Mikael, 
>>>>>> 
>>>>>> The keep_preds parameter is often used the same as a binary
>>>>>>parameter,
>>>>>> but it doesn't have to be. The concordance that is mentioned in the
>>>>>> comment line is the AED for that prediction. AED is a measurement of
>>>>>> how
>>>>>> well a prediction is supported by the evidence and ranges from 0 -
>>>>>>1. A
>>>>>> prediction with an AED of 0 matches the evidence exactly while a
>>>>>> prediction with an AED of 1 isn't overlapped by any evidence.
>>>>>> 
>>>>>> The default behavior for MAKER is to make a gene model out of a
>>>>>> prediction with any AED <1. When you change the keep_preds option
>>>>>>from
>>>>>> 0
>>>>>> to 1, then MAKER will make a gene model out of any prediction that
>>>>>> matches the other parameters (like single_exon, min_exon, etc).
>>>>>>Setting
>>>>>> the keep_preds option to somewhere in between 0 and 1 will set a
>>>>>> ceiling
>>>>>> on the AED required for promoting a prediction to a gene model.
>>>>>> 
>>>>>> From a user standpoint, when you will almost certainly lose gene
>>>>>>models
>>>>>> when you set AED at an intermediate value, but you might benefit by
>>>>>> knowing that all your models will now have an AED of at least a
>>>>>>certain
>>>>>> value. 
>>>>>> 
>>>>>> I hope that helps; let me know if it didn't.
>>>>>> 
>>>>>> ~Daniel
>>>>>> 
>>>>>> PS The original paper that described the AED is Eilbeck et al in BMC
>>>>>> Bioinformatics 2009. It's also discussed in more detail in the
>>>>>>MAKER2
>>>>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews
>>>>>> Genetics paper from 2012.
>>>>>> 
>>>>>> Daniel Ence
>>>>>> Graduate Student
>>>>>> Eccles Institute of Human Genetics
>>>>>> University of Utah
>>>>>> 15 North 2030 East, Room 2100
>>>>>> Salt Lake City, UT 84112-5330
>>>>>> ________________________________________
>>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>>>>> Mikael Brandström Durling [mikael.durling at slu.se]
>>>>>> Sent: Monday, March 10, 2014 4:27 AM
>>>>>> To: maker-devel at yandell-lab.org
>>>>>> Subject: [maker-devel] keep_preds values
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Can someone, please, explain the keep_preds parameter, as it works
>>>>>> now with a value between 1 and 0? It used to be binary, but now it
>>>>>> seems to test concordance towards something. The maker wiki doesn't
>>>>>> explain it any further either.
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikael
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> maker-devel mailing list
>>>>>> maker-devel at box290.bluehost.com
>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>
>
>


From carsonhh at gmail.com  Mon Mar 10 10:25:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:25:50 -0700
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <CF4330EC.A9DA%carsonhh@gmail.com>

I don't know about Michael's script, but I've always used eval.  It
produces sensitivity/specificity metrics.  It assumes the first models are
100% correct, and then tells you the sensitivity/specificity value for the
second models.

It is therefore not a quality metric.  Instead you should view it as a change
metric. Lower sensitivity tells you that models/exons have been lost between
versions, and lower specificity tells you models/exons have been gained.
There will also be a lot of generic statistics on exon/intron distribution
and UTR length.  Then the AED values from the MAKER run can be used
independently to evaluate how well models match the evidence.
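
To make the "change metric" reading concrete, here is a minimal Python
sketch. It is not eval itself: it assumes exons are compared by exact
coordinates and uses the convention common in gene-prediction evaluation
(sensitivity = shared / reference, specificity = shared / new). The function
name and the tuple layout are illustrative assumptions.

```python
def exon_level_change(reference_exons, new_exons):
    """Return (sensitivity, specificity) of new_exons relative to reference_exons.

    Each exon is a hashable tuple such as (seqid, start, end, strand).
    The reference set is treated as 100% correct, so the two numbers only
    describe what changed between the sets, not which set is better.
    """
    ref = set(reference_exons)
    new = set(new_exons)
    shared = len(ref & new)
    # Exons lost from the reference lower sensitivity.
    sensitivity = shared / len(ref) if ref else 0.0
    # Exons gained in the new set lower specificity.
    specificity = shared / len(new) if new else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    old = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
           ("chr1", 500, 600, "+"), ("chr1", 700, 800, "+")]
    new = old[:3] + [("chr1", 900, 950, "+"), ("chr1", 1000, 1100, "+"),
                     ("chr1", 1200, 1300, "+")]
    print(exon_level_change(old, new))
```

Swapping the two arguments swaps the two values, which is one way to see
that neither annotation is being judged as "better" by these numbers.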

--Carson


From:  "Robert King (RRes-Roth)" <robert.king at rothamsted.ac.uk>
Date:  Monday, March 10, 2014 at 5:17 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] annotation comparison aed plots

Dear Maker Developers,
 
I've updated a reference that had errors and was a little incomplete, and
am now trying to produce an annotation for it. Please note the reference has
not changed dramatically. I've produced two annotations using as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference
transcripts?

I want to compare these two annotations and have produced AED scores. How do
I generate summary stats/figures to compare annotations? You mentioned last
year in a post that Mike Campbell has a script to produce these; do you know
if he will post it? I've got the Eval program and converted to gtf format
using the provided script, and am just waiting on some perl modules to be
installed by our administrator to test it and to try out the "Evaluator" and
"compare" programs too. What do they do?
 
Best Wishes
Rob



From michael.s.campbell1 at gmail.com  Mon Mar 10 09:50:53 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:50:53 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
	<CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
Message-ID: <CAAi6vWUSY6UgyyXAJ5=-aUA_39FBwFREVX3xmeHSZaE264AKGw@mail.gmail.com>

One more point. The sensitivity, specificity, and accuracy produced by the
compare_annotations_3.2.pl script are gene level, and overlap between
annotation sets is defined very liberally: at least one nucleotide of
exon overlap.
Mike
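
The liberal overlap rule described above can be sketched in Python as
follows. This is not code from compare_annotations_3.2.pl; it assumes
inclusive coordinates, genes already known to be on the same sequence and
strand, and hypothetical function names.

```python
def exons_overlap(exon_a, exon_b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return exon_a[0] <= exon_b[1] and exon_b[0] <= exon_a[1]


def genes_overlap(gene_a, gene_b):
    """True if any exon of gene_a shares at least one base with any exon of gene_b.

    Each gene is a list of (start, end) exon intervals. Overlap that falls
    only inside an intron of the other gene does not count.
    """
    return any(exons_overlap(a, b) for a in gene_a for b in gene_b)


if __name__ == "__main__":
    gene_1 = [(100, 200), (300, 400)]
    print(genes_overlap(gene_1, [(200, 250)]))  # shares base 200 with an exon
    print(genes_overlap(gene_1, [(201, 299)]))  # lies entirely in the intron
```

With a rule this permissive, two models that barely touch are counted as
the same gene, so gene-level numbers say little about exon structure.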


On Mon, Mar 10, 2014 at 9:47 AM, Michael Campbell <
michael.s.campbell1 at gmail.com> wrote:

> Hi Robert,
>
> Here are the scripts that were mentioned before.
>
> The AED_cdf_generator.pl script is for making cumulative distribution
> function plots based on annotation edit distance. This script is quite
> simple and straightforward in its internals.
>
> The compare_annotations_3.2.pl script is for generating summary stats for
> annotations and will compare two annotations of the same assembly.
>
> You can run either script without arguments to get a usage statement.
>
> Thanks,
> Mike
>
>
> On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) <
> robert.king at rothamsted.ac.uk> wrote:
>
>>  Dear Maker Developers,
>>
>>
>>
>> I've updated a reference that had errors and was a little incomplete,
>> and am now trying to produce an annotation for it. Please note the reference
>> has not changed dramatically. I've produced two annotations using as
>> evidence:
>>
>>
>>
>> Annotation 1:
>>
>> Uniprot proteins search using species keyword "fusarium"
>>
>> Pubmed mRNA for the name of the organism
>>
>> Prior annotation reference transcripts
>>
>>
>>
>> Annotation 2:
>>
>> Uniprot proteins search using species keyword "fusarium"
>>
>> Pubmed mRNA for the name of the organism
>>
>> Prior annotation reference transcripts
>>
>> mRNA trinity assembly pasafly of different strain (only RNA-seq available)
>>
>>
>>
>> I'm not sure if it was a smart move to use the prior annotation reference
>> transcripts?
>>
>>
>>
>> I want to compare these two annotations and have produced AED scores. How
>> do I generate summary stats/figures to compare annotations. You mentioned
>> last year in a post Mike Campbell has a script to produce these, do you
>> know if he will post it? I've got the Eval program and converted to gtf
>> format using the provided script, just waiting on some perl modules to be
>> installed by admin to test it. I'm waiting on some perl modules to be
>> installed by our administrator to test out the "Evaluator" and "compare"
>> programs too, what do they do?
>>
>>
>>
>> Best Wishes
>>
>> Rob
>>
>>
>>
>
>
> --
> Michael Campbell MS, RD.
> Doctoral Candidate
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:585-3543
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From cjfields at illinois.edu  Mon Mar 10 09:52:50 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 10 Mar 2014 15:52:50 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CEB024AC-5E08-4827-9EC4-17D09F06E1FA@illinois.edu>

I have been running MAKER 2.31 using Augustus and SNAP on an avian genome.  Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides.  However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions.  Leaving out SNAP altogether however leads to a marked decrease in # models overall, which is worse.  GeneMark had a very similar problem (high # false positives) and thus no marked improvement, either when using with both Augustus and SNAP or with Augustus alone.

I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked with in the past.  This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first.  

Finally, how hard would it be to incorporate alternative callers into MAKER?  For instance, would it be possible to add these like a "plugin"?  

chris


From carsonhh at gmail.com  Mon Mar 10 11:05:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 10:05:24 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CF433C40.AA26%carsonhh@gmail.com>

Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).
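
As a rough illustration of the "convert results to GFF3" route, the Python
sketch below writes one prediction as nine-column GFF3 lines. The parsing
of any particular predictor's output is left out; the function name, the
IDs, and the choice of match/match_part feature types are assumptions, so
check the MAKER documentation for the layout it expects for external
predictions.

```python
def prediction_to_gff3(seqid, source, pred_id, strand, exons):
    """Format one predicted transcript as tab-separated GFF3 lines.

    exons is a list of 1-based inclusive (start, end) tuples. A parent
    feature spans the whole prediction and one child line is written per
    exon. The feature types used here are an assumption for illustration.
    """
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]
    parent = [seqid, source, "match", str(start), str(end), ".", strand, ".",
              f"ID={pred_id};Name={pred_id}"]
    lines = ["\t".join(parent)]
    for i, (exon_start, exon_end) in enumerate(exons, 1):
        child = [seqid, source, "match_part", str(exon_start), str(exon_end),
                 ".", strand, ".", f"ID={pred_id}-part{i};Parent={pred_id}"]
        lines.append("\t".join(child))
    return lines


if __name__ == "__main__":
    print("##gff-version 3")
    for line in prediction_to_gff3("scaffold1", "geneid", "pred1", "+",
                                   [(300, 400), (100, 200)]):
        print(line)
```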

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris


From michael.s.campbell1 at gmail.com  Mon Mar 10 09:47:50 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:47:50 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
Message-ID: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>

Hi Robert,

Here are the scripts that were mentioned before.

The AED_cdf_generator.pl script is for making cumulative distribution
function plots based on annotation edit distance. This script is quite
simple and straightforward in its internals.
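
For readers without the attachment, the idea of a cumulative distribution
over AED can be sketched in a few lines of Python. This is not the attached
Perl script; the function name, the bin step, and the assumption that the
AED values have already been extracted into a list of floats are all
illustrative.

```python
def aed_cdf(aed_values, step=0.25):
    """Return [(threshold, fraction of models with AED <= threshold), ...].

    Thresholds run from 0 to 1 in increments of `step`. Plotting fraction
    against threshold for two annotations gives comparable CDF curves.
    """
    if not aed_values:
        raise ValueError("need at least one AED value")
    total = len(aed_values)
    n_bins = round(1 / step)
    cdf = []
    for i in range(n_bins + 1):
        threshold = round(i * step, 10)  # avoid accumulating float error
        count = sum(1 for value in aed_values if value <= threshold)
        cdf.append((threshold, count / total))
    return cdf


if __name__ == "__main__":
    for threshold, fraction in aed_cdf([0.0, 0.1, 0.3, 0.6]):
        print(f"{threshold:.2f}\t{fraction:.2f}")
```

A curve that rises faster at low AED means a larger share of the models
agree closely with the evidence.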

The compare_annotations_3.2.pl script is for generating summary stats for
annotations and will compare two annotations of the same assembly.

You can run either script without arguments to get a usage statement.

Thanks,
Mike


On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) <
robert.king at rothamsted.ac.uk> wrote:

>  Dear Maker Developers,
>
>
>
> I've updated a reference that had errors and was a little incomplete,
> and am now trying to produce an annotation for it. Please note the reference
> has not changed dramatically. I've produced two annotations using as
> evidence:
>
>
>
> Annotation 1:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
>
>
> Annotation 2:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
> mRNA trinity assembly pasafly of different strain (only RNA-seq available)
>
>
>
> I'm not sure if it was a smart move to use the prior annotation reference
> transcripts?
>
>
>
> I want to compare these two annotations and have produced AED scores. How
> do I generate summary stats/figures to compare annotations. You mentioned
> last year in a post Mike Campbell has a script to produce these, do you
> know if he will post it? I've got the Eval program and converted to gtf
> format using the provided script, just waiting on some perl modules to be
> installed by admin to test it. I'm waiting on some perl modules to be
> installed by our administrator to test out the "Evaluator" and "compare"
> programs too, what do they do?
>
>
>
> Best Wishes
>
> Rob
>
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl
Type: text/x-perl-script
Size: 2580 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl
Type: text/x-perl-script
Size: 29155 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment-0005.bin>

From sajeet at gmail.com  Mon Mar 10 12:31:40 2014
From: sajeet at gmail.com (Sajeet Haridas)
Date: Mon, 10 Mar 2014 11:31:40 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF433C40.AA26%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
Message-ID: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>

One of the problems I have found with GeneMark is that it does not
understand a soft-masked genome. Hence, the self-training is incorrect. I
have found marked improvement to GeneMark's predictions by running the
training on a hard-masked genome.
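
A soft-masked FASTA file (repeats in lowercase) can be turned into a
hard-masked one (repeats as N) with a small filter. The Python sketch below
assumes the usual lowercase convention for soft-masking; the function name
and file names are hypothetical.

```python
def hard_mask_fasta(lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by N.

    Header lines starting with '>' are passed through unchanged, and
    uppercase bases, existing N characters and newlines are preserved.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield "".join("N" if base.islower() else base for base in line)


if __name__ == "__main__":
    import sys
    # Example: python hard_mask.py < genome.softmasked.fa > genome.hardmasked.fa
    sys.stdout.writelines(hard_mask_fasta(sys.stdin))
```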


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:

> Adding a new predictor can take some time.  It obviously requires some
> coding.  It's usually not too hard just to convert results to GFF3 and
> then pass it in.  Integrated support is really only beneficial for
> predictors that can take "hints" from evidence alignments (for example we
> are working on EVM integration right now -
> http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
> problems just drop them.  GeneMark really doesn't work very good on
> genomes with complex intron/exon structure (and I really wouldn't use it
> for anything but fungi).
>
> Make sure you are also giving sufficient protein evidence.  Perhaps all
> proteins from chicken and pigeon for example.  Then you shouldn't find
> loss of any true genes if just using Augustus.  Also try not to use gene
> count as an indicator of performance.  The value is very deceptive,
> especially if the genome assembly is fragmented.
>
> Thanks,
> Carson
>
>
>
> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu>
> wrote:
>
> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
> >genome.  Augustus gives pretty decent gene model predictions based on a
> >custom model we have and the hints MAKER provides.  However, SNAP seems
> >to throw out a ton of false positives; in many cases this appears to
> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
> >to a marked decrease in # models overall, which is worse.  GeneMark had a
> >very similar problem (high # false positives) and thus no marked
> >improvement, either when using with both Augustus and SNAP or with
> >Augustus alone.
> >
> >I have been exploring using geneid
> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
> >feedback on another project I worked with in the past.  This would be
> >fed into MAKER using external GFF, but I wanted to see if anyone has
> >tried geneid with MAKER first.
> >
> >Finally, how hard would it be to incorporate alternative callers into
> >MAKER?  For instance, would it be possible to add these like a 'plugin'?
> >
> >chris
>

From carsonhh at gmail.com  Mon Mar 10 22:13:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 22:13:43 -0600
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.
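
Independently of any predictor setting, fused models can be screened after
the fact by flagging those with an intron longer than a chosen cutoff. The
Python sketch below assumes 1-based inclusive exon coordinates; the
function names and the cutoff are illustrative.

```python
def intron_lengths(exons):
    """Return intron lengths for a list of inclusive (start, end) exons."""
    exons = sorted(exons)
    # The intron between two consecutive exons covers the bases strictly
    # between the end of one exon and the start of the next.
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(exons, exons[1:])]


def has_long_intron(exons, max_intron):
    """True if any intron of the model is longer than max_intron bases."""
    return any(length > max_intron for length in intron_lengths(exons))


if __name__ == "__main__":
    model = [(1, 100), (151, 200), (5201, 5300)]
    print(intron_lengths(model))
    print(has_long_intron(model, max_intron=1000))
```

Models flagged this way are candidates for manual review as possible gene
fusions; the cutoff itself has to come from knowledge of the organism.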

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com] 
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that someone 
>> had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
> 
> 


From darasappan at gmail.com  Mon Mar 10 14:14:03 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 10 Mar 2014 15:14:03 -0500
Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta
	files missing
Message-ID: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>

Hello,

I've been running maker with different assembly files, reference files  
etc  and I check the output by:

1. concatenating the gff files
2. concatenating the *transcripts.fasta files
3. concatenating the *proteins.fasta files

I'm noticing that when I ran maker twice with same parameters, the  
second time around, many of the output subdirectories  do not have a  
*transcripts.fasta or *proteins.fasta file in it.
There are 251 subdirectories and only 97 of them have all 3 output  
files.  Maker log looks ok to me, but I've attached it here as well.

What could be the reason for this?

Thanks
dhivya

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.o1813247.gz
Type: application/x-gzip
Size: 13857217 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/34f3a118/attachment.tgz>
-------------- next part --------------


From sbrubaker at solazyme.com  Tue Mar 11 11:06:57 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Tue, 11 Mar 2014 17:06:57 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
	<99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com>

Ok thank you.

-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com] 
Sent: Monday, March 10, 2014 9:14 PM
To: Shane Brubaker
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Long introns from Augustus

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that 
>> someone had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
> 
> 

From carson.holt at genetics.utah.edu  Thu Mar 13 10:00:06 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 13 Mar 2014 16:00:06 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF47300B.AB4F%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
Message-ID: <CF4731CC.AB5E%carson.holt@genetics.utah.edu>

Just resending this to the correct maker-devel address.  When replying,
please do not CC the incorrect maker-devel-bounce address.

Thanks,
Carson


On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>FGENESH is not a heavily used tool, so depending on which version it is
>(either too old or too new), output might be slightly different, which
>could cause incorrect parsing. Could you tar up your maker.output folder
>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>(then send me your user/guest ID after you upload)?
>
>For the BLAST error, use BLAST+ instead.  You are using blastall which is
>the old legacy version of NCBI BLAST.  You can do this by setting the
>blast type in maker_bopts.ctl and the location of executables in
>maker_exe.ctl.
>
>Thanks,
>Carson
>
>
>
>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear Maker users
>>
>>
>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>word SCALAR followed by a bracketed value, like this: (0x22de7020), into
>>the nucleotide sequence of some of the genes. This seems to be related
>>to transcripts predicted by fgenesh_masked.
>>
>>
>>Here is an example for one of the genes
>>
>>
>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>TTTCGACAAGC
>>
>>The same genome sequence was used for the first round of maker (2.10)
>>without such problem. I checked the sequence for the scaffold related to
>>one of the affected transcripts and there was no error in the sequence.
>>I am not sure what is causing this. The only error that I could spot in
>>the output error file is the following
>>
>>
>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>contexts/frames of query sequences.
>>
>>
>>
>>Your help is appreciated
>>
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>
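
The SCALAR(0x...) strings in the quoted transcript look like stringified
Perl references. To find every affected record in a transcripts FASTA file,
a scan along the following lines may help. It is a minimal Python sketch:
the function name is hypothetical, and sequence lines are joined per record
first because the inserted text can be split across wrapped lines, as in
the example above.

```python
import re

# Perl prints a reference as TYPE(0xADDRESS), e.g. SCALAR(0x23418b90).
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE)\(0x[0-9a-f]+\)")


def find_perl_refs(fasta_lines):
    """Return {record_name: [matched strings]} for records containing them."""
    hits = {}
    name, chunks = None, []
    for line in list(fasta_lines) + [">"]:  # sentinel header flushes the last record
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                found = PERL_REF.findall("".join(chunks))
                if found:
                    hits[name] = found
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line:
            chunks.append(line)
    return hits


if __name__ == "__main__":
    example = [">t1 transcript", "ATGGCTCTGSCALAR(0x23",
               "418b90)SCALAR(0x244c8ca0)GCTT", ">t2", "ACGT"]
    print(find_perl_refs(example))
```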


From carsonhh at gmail.com  Thu Mar 13 10:14:54 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:14:54 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
Message-ID: <CF4733ED.AB63%carsonhh@gmail.com>

Note that protein/transcript FASTA files are only created when there are gene
models to output to those files (so their absence means there were no gene
models for that contig). Most sequences without protein/transcript FASTA
files in your sample are very short and thus don't contain anything.  The
rest either have no est2genome results or the est2genome alignments do not
have sufficient open reading frame to be turned into a gene model (false
merging of regions by trinity can cause this, so make sure you use the
jaccard index option when assembling reads with trinity to avoid this).

You are using only the est2genome=1 option.  This will result in a limited
set of genes that can be used for training SNAP/Augustus (so not getting
results on all contigs is expected).  You really won't get much as far as
results until you have one of the ab initio predictors turned on.
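
To see which contigs ended up without gene models, one can list the output
directories that hold a GFF file but no transcript or protein FASTA. The
Python sketch below assumes only the file name patterns mentioned in this
thread (*.gff, *transcripts.fasta, *proteins.fasta); the function name and
the example directory are hypothetical.

```python
from pathlib import Path


def contigs_without_models(root):
    """Return names of directories under root with a GFF but no model FASTA.

    A directory counts as having models only if it contains both a
    *transcripts.fasta and a *proteins.fasta file next to its GFF.
    """
    gff_dirs = sorted({gff.parent for gff in Path(root).rglob("*.gff")})
    missing = []
    for directory in gff_dirs:
        has_transcripts = any(directory.glob("*transcripts.fasta"))
        has_proteins = any(directory.glob("*proteins.fasta"))
        if not (has_transcripts and has_proteins):
            missing.append(directory.name)
    return missing


if __name__ == "__main__":
    import sys
    # Example: python missing_models.py path/to/output_datastore
    for name in contigs_without_models(sys.argv[1]):
        print(name)
```

Comparing the lengths of the listed contigs against the rest is a quick
check of the "very short sequences" explanation.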

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, March 11, 2014 at 8:52 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Alright done. My username is daras

Thanks
Dhivya

On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:

> Input and compressed file of output.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Monday, March 10, 2014 at 2:09 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Do you mean the whole maker output?
> 
> Thanks
> dhivya
> 
> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
> 
>> Could you upload everything here:
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>> 
>> Then send us the link generated or your user ID.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, March 10, 2014 at 1:50 PM
>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson and Daniel,
>> 
>> I'm sending this across to you separately since maker list is blocking my
>> email due to attachment size.
>> 
>> As always, thanks for any guidance you can provide.
>> Dhivya
>> 
>> 
>> Begin forwarded message:
>> 
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: March 10, 2014 3:14:03 PM CDT
>>> To: maker-devel at yandell-lab.org
>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing
>>> 
>>>  
>>> Hello,
>>> 
>>> I've been running maker with different assembly files, reference files etc
>>> and I check the output by:
>>> 
>>> 1. concatenating the gff files
>>> 2. concatenating the *transcripts.fasta files
>>> 3. concatenating the *proteins.fasta files
>>> 
>>> I'm noticing that when I ran maker twice with same parameters, the second
>>> time around, many of the output subdirectories  do not have a
>>> *transcripts.fasta or *proteins.fasta file in it.
>>> There are 251 subdirectories and only 97 of them have all 3 output files.
>>> Maker log looks ok to me, but I've attached it here as well.
>>> 
>>> What could be the reason for this?
>>> 
>>> Thanks
>>> dhivya
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 



From carsonhh at gmail.com  Thu Mar 13 10:55:40 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:55:40 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
Message-ID: <CF473DBA.AB9F%carsonhh@gmail.com>

The second time, it should have just started where it left off, so it would
run faster (because the processing from the previous job counted towards the
second one).  The archived output you sent me had 21,183 proteins and
transcripts.  If you are using fasta_merge to collect them, just make
sure the datastore.index file is not truncated or corrupt; otherwise it won't
collect all the fastas from every contig.  You can rebuild the
datastore.index using the -dsindex flag with MAKER, if you want to check
that.  Also, you can have maker just regenerate results without rerunning
BLAST etc. by using the -a flag if you want to recalculate all results
quickly (rebuilds all FASTA and GFF3 without redoing most analysis).
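
A minimal sketch of the index check described above; it is not part of MAKER
itself. It assumes the master datastore index log is tab-separated, with the
contig name in the first column and a status such as STARTED or FINISHED in
the third. The function name unfinished_contigs is hypothetical. It lists
contigs that were started but never marked finished, which is one sign of an
interrupted run or a truncated index.

```shell
#!/bin/sh
# List contigs that have a STARTED entry but no FINISHED entry in a
# MAKER master datastore index log.
# Assumption: each line is tab-separated as <contig> <directory> <status>.
unfinished_contigs() {
    awk -F '\t' '
        $3 == "STARTED"  { started[$1] = 1 }
        $3 == "FINISHED" { finished[$1] = 1 }
        END {
            # Report every contig that started but never finished.
            for (name in started)
                if (!(name in finished)) print name
        }
    ' "$1" | sort
}

# Usage (the log file name here is a placeholder):
#   unfinished_contigs my_master_datastore_index.log
```

If the list is not empty, rebuilding the index with -dsindex or regenerating
results with -a, as described above, would be the natural next step.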

--Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 10:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Thanks Carson for the response.  I understand that est2genome=1 does not use
any ab initio gene predictions, but simply identifies ests based on
alignment.  I'm a little confused because I ran maker on my assembly before,
using the same parameters ( including est2genome=1).  I got a very good
result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds were
combined into superscaffolds. So I reran maker on this assembly.   Same
parameters, same transcriptome and proteins files.  Now, I see such
drastically different results:  Only 500+ genes and transcripts.  My
scaffolds are now bigger than before, so I'm not sure how this is happening.
These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because
perhaps it is due to some sort of error on my part):  I ran maker on the
improved assembly the first time and maker did not complete in the 48 hours
I allocated.  But I had  19,000+ transcripts in the unfinished output.  When
I reran maker, just changing the time allocated, it completed much faster,
but is giving much fewer transcripts and proteins as output.  Could
something like this happen? If not, then I'm guessing I must have changed
something although I'm pretty sure that I did not change anything other than
the time allocated. I've attached the transcripts and proteins files from the
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya


On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:

> Note protein/transcript fastas are only created when there are gene models to
> output to those files (so their absence means there were no gene models for
> that contig). Most sequences without protein/transcript fastas in your sample
> are very short and thus don't contain anything.  What is left either have no
> est2genome results or the est2genome alignments do not have sufficient open
> reading frame to be turned into a gene model (false merging of regions by
> trinity can cause this, so make sure you use the jaccard index option when
> assembling reads with trinity to avoid this).
> 
> You are using only the est2genome=1 option.  This will result in a limited set
> of genes that can be used for training SNAP/Augustus (so not getting results
> on all contigs is expected).  You really won't get much as far as results
> until you have one of the ab initio predictors turned on.
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 11, 2014 at 8:52 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Alright done. My username is daras
> 
> Thanks
> Dhivya
> 
> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
> 
>> Input and compressed file of output.
>> 
>> Thanks,
>> Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, March 10, 2014 at 2:09 PM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Daniel Ence <dence at genetics.utah.edu>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson,
>> 
>> Do you mean the whole maker output?
>> 
>> Thanks
>> dhivya
>> 
>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>> 
>>> Could you upload everything here -->
>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>> 
>>> Then send us the link generated or your user ID.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> 
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Monday, March 10, 2014 at 1:50 PM
>>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>>> missing
>>> 
>>> Hi Carson and Daniel,
>>> 
>>> I'm sending this across to you separately since maker list is blocking my
>>> email due to attachment size.
>>> 
>>> As always, thanks for any guidance you can provide.
>>> Dhivya
>>> 
>>> 
>>> Begin forwarded message:
>>> 
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>> To: maker-devel at yandell-lab.org
>>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing
>>>> 
>>>>  
>>>> Hello,
>>>> 
>>>> I've been running maker with different assembly files, reference files etc
>>>> and I check the output by:
>>>> 
>>>> 1. concatenating the gff files
>>>> 2. concatenating the *transcripts.fasta files
>>>> 3. concatenating the *proteins.fasta files
>>>> 
>>>> I'm noticing that when I ran maker twice with same parameters, the second
>>>> time around, many of the output subdirectories  do not have a
>>>> *transcripts.fasta or *proteins.fasta file in it.
>>>> There are 251 subdirectories and only 97 of them have all 3 output files.
>>>> Maker log looks ok to me, but I've attached it here as well.
>>>> 
>>>> What could be the reason for this?
>>>> 
>>>> Thanks
>>>> dhivya
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 



From darasappan at gmail.com  Thu Mar 13 10:47:25 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 13 Mar 2014 11:47:25 -0500
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <CF4733ED.AB63%carsonhh@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>

Thanks Carson for the response.  I understand that est2genome=1 does  
not use any ab initio gene predictions, but simply identifies ests  
based on alignment.  I'm a little confused because I ran maker on my  
assembly before, using the same parameters ( including est2genome=1).   
I got a very good result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds  
were combined into superscaffolds. So I reran maker on this  
assembly.   Same parameters, same transcriptome and proteins files.   
Now, I see such drastically different results:  Only 500+ genes and  
transcripts.  My scaffolds are now bigger than before, so I'm not sure  
how this is happening.   These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because  
perhaps it is due to some sort of error on my part):  I ran maker on  
the improved assembly the first time and maker did not complete in the  
48 hours I allocated.  But I had  19,000+ transcripts in the  
unfinished output.  When I reran maker, just changing the time  
allocated, it completed much faster, but is giving much fewer  
transcripts and proteins as output.  Could something like this happen?  
If not, then I'm guessing I must have changed something although I'm  
pretty sure that I did not change anything other than the time  
allocated. I've attached the transcripts and proteins files from the  
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya


On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:

> Note protein/transcript fastas are only created when there are gene  
> models to output to those files (so their absence means there were  
> no gene models for that contig). Most sequences without protein/ 
> transcript fastas in your sample are very short and thus don't  
> contain anything.  What is left either have no est2genome results or  
> the est2genome alignments do not have sufficient open reading frame  
> to be turned into a gene model (false merging of regions by trinity  
> can cause this, so make sure you use the jaccard index option when  
> assembling reads with trinity to avoid this).
>
> You are using only the est2genome=1 option.  This will result in a  
> limited set of genes that can be used for training SNAP/Augustus (so  
> not getting results on all contigs is expected).  You really won't  
> get much as far as results until you have one of the ab initio  
> predictors turned on.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Tuesday, March 11, 2014 at 8:52 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>
> Subject: Re: maker output- transcripts.fasta and proteins.fasta  
> files missing
>
> Alright done. My username is daras
>
> Thanks
> Dhivya
>
> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
>
>> Input and compressed file of output.
>>
>> Thanks,
>> Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, March 10, 2014 at 2:09 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Daniel Ence <dence at genetics.utah.edu>
>> Subject: Re: maker output- transcripts.fasta and proteins.fasta  
>> files missing
>>
>> Hi Carson,
>>
>> Do you mean the whole maker output?
>>
>> Thanks
>> dhivya
>>
>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>>
>>> Could you upload everything here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> Then send us the link generated or your user ID.
>>>
>>> Thanks,
>>> Carson
>>>
>>>
>>>
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: Monday, March 10, 2014 at 1:50 PM
>>> To: Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>>> Subject: Fwd: maker output- transcripts.fasta and proteins.fasta  
>>> files missing
>>>
>>> Hi Carson and Daniel,
>>>
>>> I'm sending this across to you separately since maker list is  
>>> blocking my email due to attachment size.
>>>
>>> As always, thanks for any guidance you can provide.
>>> Dhivya
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>> To: maker-devel at yandell-lab.org
>>>> Subject: maker output- transcripts.fasta and proteins.fasta files  
>>>> missing
>>>>
>>>> Hello,
>>>>
>>>> I've been running maker with different assembly files, reference  
>>>> files etc  and I check the output by:
>>>>
>>>> 1. concatenating the gff files
>>>> 2. concatenating the *transcripts.fasta files
>>>> 3. concatenating the *proteins.fasta files
>>>>
>>>> I'm noticing that when I ran maker twice with same parameters,  
>>>> the second time around, many of the output subdirectories  do not  
>>>> have a *transcripts.fasta or *proteins.fasta file in it.
>>>> There are 251 subdirectories and only 97 of them have all 3  
>>>> output files.  Maker log looks ok to me, but I've attached it  
>>>> here as well.
>>>>
>>>> What could be the reason for this?
>>>>
>>>> Thanks
>>>> dhivya
>>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: transcripts.cat.fasta.old.gz
Type: application/x-gzip
Size: 7927581 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment.tgz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: proteins.cat.fasta.old.gz
Type: application/x-gzip
Size: 3668381 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment-0001.tgz>

From carsonhh at gmail.com  Thu Mar 13 12:53:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 12:53:05 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
	<CF473DBA.AB9F%carsonhh@gmail.com>
	<672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com>
	<CF474291.ABC0%carsonhh@gmail.com>
	<CF4744C6.ABC9%carsonhh@gmail.com>
	<5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com>
	<CF474BE5.ABDA%carsonhh@gmail.com>
	<C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
Message-ID: <CF4759BA.ABE2%carsonhh@gmail.com>

For future reference, I suggest using the .../maker/bin/fasta_merge tool to
merge based on the datastore.index rather than other command line based
methods.  It will handle the multiple fasta types that are produced in the
results, and will validate with the datastore.index file.

Example:
fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log

The same is also true when merging gff3 files.
gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log
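
As a cross-check on merged output, the sketch below counts the per-contig
transcript FASTA files in a datastore directory and the records in them. It
uses find rather than a shell glob on the contig name, so it does not depend
on what the contigs are called. The function name count_transcripts is
hypothetical, and the *transcripts.fasta file pattern is taken from the file
names mentioned in this thread.

```shell
#!/bin/sh
# Count per-contig transcript FASTA files under a MAKER datastore directory
# and the FASTA records they contain, regardless of contig naming.
count_transcripts() {
    datastore=$1
    # Number of per-contig transcript FASTA files.
    files=$(find "$datastore" -type f -name '*transcripts.fasta' | wc -l)
    # Number of FASTA header lines across all of those files.
    records=$(find "$datastore" -type f -name '*transcripts.fasta' \
        -exec cat {} + | grep -c '^>')
    # Arithmetic expansion strips any padding that wc may add.
    echo "files=$((files)) records=$((records))"
}

# Usage:
#   count_transcripts opgenResult+scaffoldsLengthsLess200_datastore
```

The record count should agree with the number of sequences in the file
produced by fasta_merge; a mismatch suggests the index is incomplete.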

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 12:48 PM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

ah  I forgot that some were called superscaffolds.  That is a difference
between the old and new assembly. This was definitely the issue. Thanks and
sorry for the mix up.

Dhivya
On Mar 13, 2014, at 12:51 PM, Carson Holt wrote:

> Note that your command does not capture everything because not all scaffolds
> start with the name "scaffold".
> 
> This works though -->
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, March 13, 2014 at 11:34 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Am I looking in the wrong place for my fasta files?  I looked here:
> 
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc -l
> 
> I see only 97 such files- so 97 contigs with transcripts.fasta files?
> 
> When I count the number of sequences in all these files, I get 514 sequences.
> 
> grep -c '^>' opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':' -f 2|awk '{total+=$0}END{print total}'
> 
> Could you tell how and where you are getting the 21,183 transcripts?
> 
> thanks
> dhivya
> 
> On Mar 13, 2014, at 12:21 PM, Carson Holt wrote:
> 
>> This is what I see in your uploaded data.  There are 21,183 transcripts from
>> 201 contigs.  Then there are 707 contigs with no gene models.
>> 
>> --Carson
>> 
>> 
>> From:  Carson Holt <carsonhh at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:11 AM
>> To:  dhivya arasappan <darasappan at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> "as you saw from the output I uploaded before, the output certainly was much
>> less than 20,000 transcripts"
>> 
>> Actually there were 21,183 in the output you uploaded.  I saw no loss of
>> entries.
>> 
>> --Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:09 AM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson,
>> 
>> The datastore.index file looks fine- it has a started and finished status for
>> my 980 scaffolds.  I reran with increased time twice. Second time around, I
>> actually deleted the entire output directory to make sure it runs all over
>> again.  It still seemed to complete within a day. As you saw from the output
>> I uploaded before, the output certainly was much less than 20,000
>> transcripts. Given that I was seeing great results for an older version of my
>> assembly, I'm puzzled as to why my results are worse this time around. Any
>> suggestions of what to check or what I can do to see improved results would
>> be really helpful.
>> 
>> I do know that I went from ~4% gaps to ~6% gaps in my new assembly- other
>> than that, it's better in every way. Could this cause such a dramatic
>> difference in results?
>> 
>> Thanks
>> dhivya
>> 
>> On Mar 13, 2014, at 11:55 AM, Carson Holt wrote:
>> 
>>> The second time, it should have just started where it left off, so it would
>>> run faster (because the processing from the previous job counted towards the
>>> second one).  The archived output you sent me had 21,183 proteins and
>>> transcripts.  If you are using fasta_merge to collect them, just make
>>> sure the datastore.index file is not truncated or corrupt; otherwise it won't
>>> collect all the fastas from every contig.  You can rebuild the
>>> datastore.index using the -dsindex flag with MAKER, if you want to check
>>> that.  Also, you can have maker just regenerate results without rerunning
>>> BLAST etc. by using the -a flag if you want to recalculate all results
>>> quickly (rebuilds all FASTA and GFF3 without redoing most analysis).
>>> 
>>> --Carson
>>> 
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Thursday, March 13, 2014 at 10:47 AM
>>> To:  Carson Holt <carsonhh at gmail.com>
>>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>>> missing
>>> 
>>> Thanks Carson for the response.  I understand that est2genome=1 does not use
>>> any ab initio gene predictions, but simply identifies ests based on
>>> alignment.  I'm a little confused because I ran maker on my assembly before,
>>> using the same parameters ( including est2genome=1).  I got a very good
>>> result with > 20,000 transcripts and proteins.
>>> 
>>> Then I  was able to get an improved assembly, where many scaffolds were
>>> combined into superscaffolds. So I reran maker on this assembly.   Same
>>> parameters, same transcriptome and proteins files.  Now, I see such
>>> drastically different results:  Only 500+ genes and transcripts.  My
>>> scaffolds are now bigger than before, so I'm not sure how this is happening.
>>> These were the results I sent you.
>>> 
>>> Another odd thing I noticed (and I am hesitant to report this because
>>> perhaps it is due to some sort of error on my part):  I ran maker on the
>>> improved assembly the first time and maker did not complete in the 48 hours
>>> I allocated.  But I had  19,000+ transcripts in the unfinished output.  When
>>> I reran maker, just changing the time allocated, it completed much faster,
>>> but is giving much fewer transcripts and proteins as output.  Could
>>> something like this happen? If not, then I'm guessing I must have changed
>>> something although I'm pretty sure that I did not change anything other than
>>> the time allocated. I've attached the transcripts and proteins files from the
>>> first time I ran maker on my improved assembly.
>>> 
>>> Thanks again for your help
>>> Dhivya
>>> 
>>> 
>>> 
>>> On Mar 13, 2014, at 11:14 AM, Carson Holt wrote:
>>> 
>>>> Note protein/transcript fastas are only created when there are gene models
>>>> to output to those files (so their absence means there were no gene models
>>>> for that contig). Most sequences without protein/transcript fastas in your
>>>> sample are very short and thus don't contain anything.  What is left either
>>>> have no est2genome results or the est2genome alignments do not have
>>>> sufficient open reading frame to be turned into a gene model (false merging
>>>> of regions by trinity can cause this, so make sure you use the jaccard
>>>> index option when assembling reads with trinity to avoid this).
>>>> 
>>>> You are using only the est2genome=1 option.  This will result in a limited
>>>> set of genes that can be used for training SNAP/Augustus (so not getting
>>>> results on all contigs is expected).  You really won't get much as far as
>>>> results until you have one of the ab initio predictors turned on.
>>>> 
>>>> Thanks,
>>>> Carson
>>>> 
>>>> 
>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>> Date:  Tuesday, March 11, 2014 at 8:52 AM
>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>> Cc:  Daniel Ence <dence at genetics.utah.edu>
>>>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>>>> missing
>>>> 
>>>> Alright done. My username is daras
>>>> 
>>>> Thanks
>>>> Dhivya
>>>> 
>>>> On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:
>>>> 
>>>>> Input and compressed file of output.
>>>>> 
>>>>> Thanks,
>>>>> Carson
>>>>> 
>>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>>> Date:  Monday, March 10, 2014 at 2:09 PM
>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>> Cc:  Daniel Ence <dence at genetics.utah.edu>
>>>>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>>>>> missing
>>>>> 
>>>>> Hi Carson,
>>>>> 
>>>>> Do you mean the whole maker output?
>>>>> 
>>>>> Thanks
>>>>> dhivya
>>>>> 
>>>>> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
>>>>> 
>>>>>> Could you upload everything here -->
>>>>>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>>> 
>>>>>> Then send us the link generated or your user ID.
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>>>> Date:  Monday, March 10, 2014 at 1:50 PM
>>>>>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence
>>>>>> <dence at genetics.utah.edu>
>>>>>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>>>>>> missing
>>>>>> 
>>>>>> Hi Carson and Daniel,
>>>>>> 
>>>>>> I'm sending this across to you separately since maker list is blocking my
>>>>>> email due to attachment size.
>>>>>> 
>>>>>> As always, thanks for any guidance you can provide.
>>>>>> Dhivya
>>>>>> 
>>>>>> 
>>>>>> Begin forwarded message:
>>>>>> 
>>>>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>>>>> Date: March 10, 2014 3:14:03 PM CDT
>>>>>>> To: maker-devel at yandell-lab.org
>>>>>>> Subject: maker output- transcripts.fasta and proteins.fasta files
>>>>>>> missing
>>>>>>> 
>>>>>>>  
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I've been running maker with different assembly files, reference files
>>>>>>> etc  and I check the output by:
>>>>>>> 
>>>>>>> 1. concatenating the gff files
>>>>>>> 2. concatenating the *transcripts.fasta files
>>>>>>> 3. concatenating the *proteins.fasta files
>>>>>>> 
>>>>>>> I'm noticing that when I ran maker twice with same parameters, the
>>>>>>> second time around, many of the output subdirectories  do not have a
>>>>>>> *transcripts.fasta or *proteins.fasta file in it.
>>>>>>> There are 251 subdirectories and only 97 of them have all 3 output
>>>>>>> files.  Maker log looks ok to me, but I've attached it here as well.
>>>>>>> 
>>>>>>> What could be the reason for this?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> dhivya
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 



From cjfields at illinois.edu  Thu Mar 13 15:04:23 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 13 Mar 2014 21:04:23 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
Message-ID: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome.
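
A minimal sketch of the hard-masking step described above, assuming the
usual convention that a soft-masked FASTA marks repeats in lowercase and a
hard-masked one replaces them with N. The function name hard_mask is
hypothetical, and this is only one simple way to do the conversion.

```shell
#!/bin/sh
# Turn a soft-masked FASTA (repeats in lowercase) into a hard-masked one
# (repeats as N). Header lines starting with ">" are left untouched.
hard_mask() {
    # LC_ALL=C keeps the character class limited to ASCII lowercase.
    LC_ALL=C sed '/^>/!s/[[:lower:]]/N/g' "$1"
}

# Usage (file names are placeholders):
#   hard_mask genome.softmasked.fasta > genome.hardmasked.fasta
```

The hard-masked copy would be used only for the training run; the original
soft-masked assembly can still be used elsewhere.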


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn?t find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





From jfierst at uoregon.edu  Fri Mar 14 10:06:26 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 09:06:26 -0700
Subject: [maker-devel] associating gene names between related strains
Message-ID: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate
the gene naming so that orthologs between species can be associated by
name. I have been playing around a little with the est_forward option but
can't figure out a good system/workflow that preserves names but still uses
the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
-Janna

From dence at genetics.utah.edu  Fri Mar 14 11:32:02 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Mar 2014 17:32:02 +0000
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>

Hi Janna, so do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a FASTA file from one strain; you can then use est_forward to carry that ID into the gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna

From jfierst at uoregon.edu  Fri Mar 14 12:01:16 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 11:01:16 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CAGoyuracbDO5pcWU7wThnnnGbfoKo2xEn+trPPUaUJx9t+8_Lg@mail.gmail.com>

I will try it today. Thanks for the quick reply!


On Fri, Mar 14, 2014 at 10:32 AM, Daniel Ence <dence at genetics.utah.edu>wrote:

>  Hi Janna, So do you have one strain that you want to use as the
> reference for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
>  Let me know if that's not what you're looking for,
> Daniel
>
>  Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>   ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
>   Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
>

From carsonhh at gmail.com  Fri Mar 14 12:02:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:02:48 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CF489F0B.AC19%carsonhh@gmail.com>

maker_map_ids does a translation (i.e. change gene-A to smug1), so you need
to know which genes you want to translate names to (two-column input file:
column 1 -> original ID, column 2 -> new ID).  I'm not sure est_forward is
the best way to do this, although I do think maker_map_ids is the tool to
use in the end.  The question is how to make the list of IDs to translate
as the input to maker_map_ids.

I would actually just use BLASTP against the reference strain and take
reciprocal best BLAST hits.  To do this, BLAST your reference proteins
against your MAKER proteins, then do the opposite and BLAST your MAKER
proteins against your reference proteins.  If two proteins are each other's
best hit, they are considered orthologous, and you can safely make a
two-column entry for the maker_map_ids input (i.e. maker-gene-1 translates
into smug1).
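
As a rough illustration of the reciprocal-best-hit step described above,
here is a minimal Python sketch.  It assumes both BLASTP runs were saved in
tabular format (-outfmt 6, where column 1 is the query ID, column 2 the
subject ID and column 12 the bit score).  The function names and the toy IDs
are hypothetical, and ties and score cutoffs are not handled.

```python
import io


def best_hits(blast_tab):
    """Map each query ID to its highest-bit-score subject ID.

    blast_tab is any iterable of lines in BLAST tabular format
    (-outfmt 6): query ID, subject ID, ..., bit score in column 12.
    """
    best = {}
    for line in blast_tab:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or line.startswith("#"):
            continue  # skip blank, truncated, or comment lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, ref_id) pairs that are each other's best hit."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return [(m, r) for m, r in sorted(forward.items()) if reverse.get(r) == m]


def hit(query, subject, bitscore):
    """Build one toy -outfmt 6 line; only columns 1, 2 and 12 matter here."""
    filler = ["99.0", "100", "0", "0", "1", "100", "1", "100", "1e-50"]
    return "\t".join([query, subject] + filler + [str(bitscore)]) + "\n"


# Toy data standing in for the two real BLASTP result files.
maker_vs_ref = io.StringIO(
    hit("maker-gene-1", "smug1", 200)
    + hit("maker-gene-1", "abc2", 50)
    + hit("maker-gene-2", "abc2", 180)
)
ref_vs_maker = io.StringIO(
    hit("smug1", "maker-gene-1", 200)
    + hit("abc2", "maker-gene-3", 190)
    + hit("abc2", "maker-gene-2", 180)
)

# Two-column output: original MAKER ID, then the reference name.
pairs = reciprocal_best_hits(maker_vs_ref, ref_vs_maker)
for maker_id, ref_id in pairs:
    print(f"{maker_id}\t{ref_id}")
```

In the toy data only maker-gene-1 and smug1 are each other's best hit, so
that is the only pair printed.  Each printed line is one "original ID, new
ID" entry in the two-column layout described above.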

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Friday, March 14, 2014 at 11:32 AM
To:  Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] associating gene names between related strains

Hi Janna, So do you have one strain that you want to use as the reference
for all the others? There's a script that comes with MAKER called
maker_map_ids that lets you use a common prefix or suffix for entries in a
fasta file from one strain and then use est_forward to use that ID in the
gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna
Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate the
gene naming so that orthologs between species can be associated by name. I
have been playing around a little with the est_forward option but can't
figure out a good system/workflow that preserves names but still uses the
strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna

From carsonhh at gmail.com  Fri Mar 14 12:43:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:43:41 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com>
	<9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
Message-ID: <CF48A7BD.AC29%carsonhh@gmail.com>

I'm glad you were able to fix it.  I'll check to see why it was failing as
well.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Friday, March 14, 2014 at 10:16 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: Error when running maker2zff script

Kindly ignore my previous question. I was able to manipulate the scaffold
names in the gff file to get maker2zff to work.

Thanks
dhivya

On Mar 14, 2014, at 10:55 AM, dhivya arasappan <darasappan at gmail.com> wrote:

> My message got flagged by the maker list again, so I'm forwarding this
> separately to you.  Is there a better way to send biggish files?
> 
> 
> Thank you
> Dhivya
> 
> 
> 
> Begin forwarded message:
> 
>> From: dhivya arasappan <darasappan at gmail.com>
>> Subject: Error when running maker2zff script
>> Date: March 13, 2014 at 8:35:27 PM CDT
>> To: Carson Holt <carsonhh at gmail.com>, maker-devel at yandell-lab.org
>> 
>> Hi Carson,
>> 
>> I used gff3_merge to create my gff file from maker output. I've attached it
>> here. But when I run maker2zff on it, I get the following error:
>> 
>> Can't use an undefined value as an ARRAY reference at
>> /opt/apps/maker/2.30/bin/maker2zff line 177, <GFF> line 7294251.
>> 
>> It produces an incomplete output file and it looks like it may be running
>> into problems when it encounters scaffold3%2F0.  I'm wondering if its having
>> problems with my scaffold names. There seem to be some inconsistencies
>> because it's referred to as  scaffold3%F0 and scaffold3/0 in the gff file.
>> It goes through other scaffolds like SCAFFOLD3_873, SCAFFOLD3_95 etc just
>> fine.   I did try replacing the scaffold names in the gff file, but still get
>> the same error.   Any ideas?
>> 
>> Substitution command I used, for your reference:  sed 's/3\%2F/3_/g' gfffile|
>> sed 's/\//\_/'  > mod.gfffile
>> 
>> Thanks
>> Dhivya
>> 
> <head.gff.gz>
> 



From carsonhh at gmail.com  Fri Mar 14 13:25:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 13:25:58 -0600
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
Message-ID: <CF48B2BC.AC3E%carsonhh@gmail.com>

We can look into it.

--Carson

From:  "Fields, Christopher J" <cjfields at illinois.edu>
Date:  Thursday, March 13, 2014 at 3:04 PM
To:  Sajeet Haridas <sajeet at gmail.com>
Cc:  Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see
if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints
such as BLAST HSPs (as well as other information), in the form of a GFF
"homology" file.  I assume it could take protein2genome/est2genome as well
through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

> One of the problems I have found with genemark is that it does not understand
> a soft-masked genome. Hence, the self training is incorrect. I have found
> marked improvement to genemark's prediction by running the training on a hard
> masked genome.
> 
> 
> On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Adding a new predictor can take some time.  It obviously requires some
>> coding.  It's usually not too hard just to convert results to GFF3 and
>> then pass it in.  Integrated support is really only beneficial for
>> predictors that can take "hints" from evidence alignments (for example we
>> are working on EVM integration right now -
>> http://evidencemodeler.sourceforge.net
>> <http://evidencemodeler.sourceforge.net/> ).  If SNAP and GeneMark give
>> problems just drop them.  GeneMark really doesn't work very well on
>> genomes with complex intron/exon structure (and I really wouldn't use it
>> for anything but fungi).
>> 
>> Make sure you are also giving sufficient protein evidence.  Perhaps all
>> proteins from chicken and pigeon for example.  Then you shouldn't find
>> loss of any true genes if just using Augustus.  Also try not to use gene
>> count as an indicator of performance.  The value is very deceptive,
>> especially if the genome assembly is fragmented.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
>> 
>>> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
>>> >genome.  Augustus gives pretty decent gene model predictions based on a
>>> >custom model we have and the hints MAKER provides.  However, SNAP seems
>>> >to throw out a ton of false positives; in many cases this appears to
>>> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>>> >to a marked decrease in # models overall, which is worse.  GeneMark had a
>>> >very similar problem (high # false positives) and thus no marked
>>> >improvement, either when using with both Augustus and SNAP or with
>>> >Augustus alone.
>>> >
>>> >I have been exploring using geneid
>>> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
>>> >feedback on another project I worked with in the past.  This would be
>>> >fed into MAKER using external GFF, but I wanted to see if anyone has
>>> >tried geneid with MAKER first.
>>> >
>>> >Finally, how hard would it be to incorporate alternative callers into
>>> >MAKER?  For instance, would it be possible to add these like a "plugin"?
>>> >
>>> >chris
> 



From cjfields at illinois.edu  Fri Mar 14 20:22:55 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sat, 15 Mar 2014 02:22:55 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF48B2BC.AC3E%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
	<CF48B2BC.AC3E%carsonhh@gmail.com>
Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu>

Not an issue at the moment; I'll likely supply these via GFF for now.  If needed I can work off an svn checkout and send along a patch should I ever manage to eke out time to work on it.

chris

On Mar 14, 2014, at 2:25 PM, Carson Holt <carsonhh at gmail.com> wrote:

We can look into it.

--Carson

From: "Fields, Christopher J" <cjfields at illinois.edu>
Date: Thursday, March 13, 2014 at 3:04 PM
To: Sajeet Haridas <sajeet at gmail.com>
Cc: Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome.


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu<mailto:cjfields at illinois.edu>> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris

From carson.holt at genetics.utah.edu  Mon Mar 17 13:45:15 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Mar 2014 19:45:15 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>

I have attached 4 files for you to place in the .../maker/Widgets/
directory.

The *blast.pm files will suppress the BLAST+ failures you are getting
(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same
effect).  BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore.
In the latest release NCBI redid all their warnings and error codes, so it
spits out a lot of garbage and fails with different messages than it did
before.  For example, BLAST now warns you every time it encounters a FASTA
header with a comment (virtually every FASTA entry in existence falls in
this category), so your screen will be awash with meaningless warning
messages.

The fgenesh.pm file will fix the other failure, which only occurs if you
use fgenesh simultaneously with the correct_est_fusion=1 option.  No other
predictors are affected.
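
To check whether a transcripts file is still affected after swapping in the
patched module, something like the following Python sketch could be used.
It is only an illustration: it assumes the corruption always shows up as a
stringified Perl reference such as SCALAR(0x...), as in the report quoted
below, and the function name and toy records are hypothetical.  Sequence
lines are joined per record before matching because the inserted text can be
split across wrapped lines.

```python
import re

# Stringified Perl references look like SCALAR(0x23418b90). Only SCALAR
# appears in the report; ARRAY and HASH are included as an assumption.
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH)\(0x[0-9a-fA-F]+\)")


def find_corrupted_records(fasta_lines):
    """Return the IDs of FASTA records whose sequence contains a Perl reference."""
    corrupted = []
    record_id, chunks = None, []

    def flush():
        # Join wrapped sequence lines so a match split across lines is found.
        if record_id is not None and PERL_REF.search("".join(chunks)):
            corrupted.append(record_id)

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        elif line:
            chunks.append(line)
    flush()  # the last record has no following header
    return corrupted


# Toy records: the first mimics the wrapped corruption from the report.
example = [
    ">transcript-1 offset:0",
    "ATGCGTTACTCTGSCALAR(0x23",
    "418b90)SCALAR(0x244c8ca0)GCTTTGG",
    ">transcript-2 offset:0",
    "ATGCGTTACTCCCAGATC",
]
print(find_corrupted_records(example))  # prints ['transcript-1']
```

For a real file, the same function can be given an open file handle in place
of the toy list.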

Thanks,
Carson


On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear  Carson
>
>Sorry for the late reply. I was away for a couple of days. I have uploaded
>the output files plus the control and error output to the FTP site that you
>provided.
>The user ID is borhanh
>
>I used blast+ for this run.
>
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>
>>Just resending this to the correct maker-devel address.  Please when
>>replying, do not CC the incorrect maker-devel-bounce address.
>>
>>Thanks,
>>Carson
>>
>>
>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>>
>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>(either too old or too new), output might be slightly different which
>>>could cause incorrect parsing. Could you tar up your maker.output
>>>folder,
>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>(send me either your user/guest ID after you upload).
>>>
>>>For the BLAST error, use BLAST+ instead.  You are using blastall which
>>>is
>>>the old legacy version of NCBI BLAST.  You can do this by setting the
>>>blast type in maker_bopts.ctl and the location of executables in
>>>maker_exe.ctl.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>
>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>wrote:
>>>
>>>>Dear Maker users
>>>>
>>>>
>>>>I ran maker (2.31) on a fungal genome and found out that it inserted
>>>>the word SCALAR followed by a bracketed value, like this: (0x22de7020),
>>>>into the nucleotide sequence of some of the genes. This seems to
>>>>be related to transcripts predicted by fgenesh_masked.
>>>>
>>>>
>>>>Here is an example for one of the genes
>>>>
>>>>
>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>TTTCGACAAGC
>>>>
>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>without such problem. I checked the sequence for the scaffold related
>>>>to
>>>>one of the affected transcripts and there was no error in the sequence.
>>>>I am not sure what is causing this. The only error that I could spot in
>>>>the output error file is the following
>>>>
>>>>
>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>contexts/frames of query sequences.
>>>>
>>>>
>>>>
>>>>Your help is appreciated
>>>>
>>>>
>>>>
>>>>HB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastn.pm
Type: text/x-perl-script
Size: 8112 bytes
Desc: blastn.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0008.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastx.pm
Type: text/x-perl-script
Size: 8218 bytes
Desc: blastx.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0009.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh.pm
Type: text/x-perl-script
Size: 19744 bytes
Desc: fgenesh.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0010.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tblastx.pm
Type: text/x-perl-script
Size: 9113 bytes
Desc: tblastx.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0011.bin>

From carsonhh at gmail.com  Mon Mar 17 15:14:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Mar 2014 15:14:42 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
Message-ID: <CF4CBEAF.ADA3%carsonhh@gmail.com>

Just an update on this.  I've fixed the maker2zff script to handle the
issues seen.  Looking at this actually brought to light another issue.
There is inconsistent escape character specification for GFF3 in column 1
(the source ID), column 9 (the ID and Target attributes), as well as
the FASTA ID for internal sequence.  We're updating the GFF3 spec to
clarify this so that the same ID gets the same character escaping
everywhere it appears.

To be safe though, only use these characters in your contig IDs for the
assembly when using any tool that reads or outputs GFF3 -->
a-zA-Z0-9.:^*$@!+_?-|

Any character not in that set has a high chance of breaking some
downstream tool.  For now just assume that the strict interpretation from
the GFF3 spec for column 1 must be used on all IDs everywhere (see below).

>>Column 1: "seqid"
>>The ID of the landmark used to establish the coordinate system for the
>>current feature.
>>IDs may contain any characters, but must escape any characters not in
>>the set [a-zA-Z0-9.:^*$@!+_?-|].
>>In particular, IDs may not contain unescaped whitespace and must not
>>begin with an unescaped ">".
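
A quick way to apply this rule before annotation is to scan contig IDs for
characters outside the safe set.  The following Python sketch is one
hypothetical way to do it; the function names and the choice of "_" as the
replacement character are assumptions, not part of MAKER or the GFF3 spec.
Note that in a regular expression the "-" has to be escaped, otherwise
"?-|" would be read as a character range.

```python
import re

# Safe set quoted from the GFF3 spec: a-zA-Z0-9.:^*$@!+_?-|
# The "-" is escaped so that "?-|" is not interpreted as a range.
SAFE_SEQID = re.compile(r"[A-Za-z0-9.:^*$@!+_?\-|]+")
UNSAFE_CHAR = re.compile(r"[^A-Za-z0-9.:^*$@!+_?\-|]")


def is_safe_seqid(seqid):
    """True if every character of seqid is in the unescaped-safe set."""
    return SAFE_SEQID.fullmatch(seqid) is not None


def sanitize_seqid(seqid, replacement="_"):
    """Replace each character outside the safe set (assumed policy: '_')."""
    return UNSAFE_CHAR.sub(replacement, seqid)


for seqid in ["SCAFFOLD3_873", "scaffold3/0"]:
    print(seqid, is_safe_seqid(seqid), sanitize_seqid(seqid))
# prints:
# SCAFFOLD3_873 True SCAFFOLD3_873
# scaffold3/0 False scaffold3_0
```

Any renaming would need to be applied to the assembly FASTA before running
the pipeline, so that the FASTA and every GFF3 column agree on the same ID.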


Thanks,
Carson


On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:

>Hi Carson,
>
>I used gff3_merge to create my gff file from maker output. I've
>attached it here. But when I run maker2zff on it, I get the following
>error:
>
>Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
>2.30/bin/maker2zff line 177, <GFF> line 7294251.
>
>It produces an incomplete output file and it looks like it may be
>running into problems when it encounters scaffold3%2F0.  I'm wondering
>if its having problems with my scaffold names. There seem to be some
>inconsistencies because it's referred to as  scaffold3%F0 and
>scaffold3/0 in the gff file.  It goes through other scaffolds like
>SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
>scaffold names in the gff file, but still get the same error.   Any
>ideas?
>
>Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
>gfffile| sed 's/\//\_/'  > mod.gfffile
>
>Thanks
>Dhivya
>


From darasappan at gmail.com  Mon Mar 17 15:20:18 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 17 Mar 2014 16:20:18 -0500
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <CF4CBEAF.ADA3%carsonhh@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<CF4CBEAF.ADA3%carsonhh@gmail.com>
Message-ID: <CAGWaY_61EFs28=2dThqjgnkeisCXjad7JM72ews-fkTn0v7FCA@mail.gmail.com>

Awesome! Thanks Carson.

Dhivya


On Mon, Mar 17, 2014 at 4:14 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Just an update on this.  I've fixed the maker2zff script to handle the
> issues seen.  Looking at this actually brought to light another issue.
> There is inconsistent escape character specification for GFF3 in column 1
> (the source ID), column 9 (the ID and Target attributes), as well as
> the FASTA ID for internal sequence.  We're updating the GFF3 spec to
> clarify this so that everywhere you see the same ID getting treated the
> same way for character escaping.
>
> To be safe though, only use these characters in your contig IDs for the
> assembly when using any tool that reads or outputs GFF3 -->
> a-zA-Z0-9.:^*$@!+_?-|
>
> Any character not in that set has a high chance of breaking some
> downstream tool.  For now just assume the strict interpretation from the
> GFF3 spec for column 1, must be used on all IDs everywhere (see below).
>
> >>Column 1: "seqid"
> >>The ID of the landmark used to establish the coordinate system for the
> >>current feature.
> >>IDs may contain any characters, but must escape any characters not in
> >>the set [a-zA-Z0-9.:^*$@!+_?-|].
> >>In particular, IDs may not contain unescaped whitespace and must not
> >>begin with an unescaped ">".
>
>
> Thanks,
> Carson
>
>
>
> On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:
>
> >Hi Carson,
> >
> >I used gff3_merge to create my gff file from maker output. I've
> >attached it here. But when I run maker2zff on it, I get the following
> >error:
> >
> >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
> >2.30/bin/maker2zff line 177, <GFF> line 7294251.
> >
> >It produces an incomplete output file and it looks like it may be
> >running into problems when it encounters scaffold3%2F0.  I'm wondering
> >if its having problems with my scaffold names. There seem to be some
> >inconsistencies because it's referred to as  scaffold3%F0 and
> >scaffold3/0 in the gff file.  It goes through other scaffolds like
> >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
> >scaffold names in the gff file, but still get the same error.   Any
> >ideas?
> >
> >Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
> >gfffile| sed 's/\//\_/'  > mod.gfffile
> >
> >Thanks
> >Dhivya
> >
>
>
>

From marc.hoeppner at bils.se  Tue Mar 18 05:43:43 2014
From: marc.hoeppner at bils.se (=?windows-1252?Q?Marc_H=F6ppner?=)
Date: Tue, 18 Mar 2014 12:43:43 +0100
Subject: [maker-devel] Maker changes 2.30-2.31
Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>

Hi,

I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a change log somewhere to get some information on what, if anything, was changed between 2.30 and 2.31?

There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker - but I'd like to make sure, if possible. Both versions use the exact same external binaries etc, and were run on the same data. 2.30 is running along happily, 2.31 however has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run.

I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently.

Cheers,

Marc


Marc P. Hoeppner, PhD
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se


From carsonhh at gmail.com  Tue Mar 18 09:07:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 09:07:07 -0600
Subject: [maker-devel] Maker changes 2.30-2.31
In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
Message-ID: <CF4DBC09.ADE0%carsonhh@gmail.com>

Attached.  Also make sure you are using the tarball from the lab website
and not the prerelease from the Subversion repository.

Thanks,
Carson



-------------- next part --------------
r1060 | cholt | 2013-11-04 11:18:12 -0700 (Mon, 04 Nov 2013) | MAKER stable release version 2.30
r1061 | cholt | 2013-11-10 22:19:51 -0700 (Sun, 10 Nov 2013) | altered build install slightly
r1062 | cholt | 2013-11-25 09:33:16 -0700 (Mon, 25 Nov 2013) | updated fgenesh for hint based annotation error
r1063 | cholt | 2013-12-05 14:10:42 -0700 (Thu, 05 Dec 2013) | fix repeat too short output error
r1064 | cholt | 2013-12-05 14:18:04 -0700 (Thu, 05 Dec 2013) | updated installation scripts
r1065 | cholt | 2013-12-13 08:42:08 -0700 (Fri, 13 Dec 2013) | fix fully masked failure for BLAST 2.2.25
r1066 | cholt | 2014-01-09 10:45:08 -0700 (Thu, 09 Jan 2014) | update MWAS and maker2jbrowse
r1067 | cholt | 2014-01-09 11:34:18 -0700 (Thu, 09 Jan 2014) | fix invalid character in Ecoli example fasta
r1068 | cholt | 2014-01-24 10:42:15 -0700 (Fri, 24 Jan 2014) | added iprscan to maker.css for MWAS
r1070 | cholt | 2014-01-26 20:27:52 -0700 (Sun, 26 Jan 2014) | attempt to fix ipr_update issues with Name ne to ID and fix lock with GFF3DB as well as docs for JBrowse and MAKER install
r1071 | cholt | 2014-01-26 20:41:55 -0700 (Sun, 26 Jan 2014) | alter install to hide MWAS fix skip of small contigs and map forward of genes with est_forward
r1072 | cholt | 2014-01-28 11:20:41 -0700 (Tue, 28 Jan 2014) | added message to get user to use the correct maker executable and updated INSTALL docs
r1073 | cholt | 2014-01-28 11:36:19 -0700 (Tue, 28 Jan 2014) | further update to maker from wrong directory message when name has whitespace
r1074 | cholt | 2014-02-03 14:48:05 -0700 (Mon, 03 Feb 2014) | fixed segfault on exit for OpenMPI
r1075 | cholt | 2014-02-03 15:32:38 -0700 (Mon, 03 Feb 2014) | added support for optional test compiler flags to be used with MVAPICH2
r1076 | cholt | 2014-02-03 15:38:52 -0700 (Mon, 03 Feb 2014) | fixed build commit missing m option
r1077 | cholt | 2014-02-04 14:29:43 -0700 (Tue, 04 Feb 2014) | made MPI communication always serialize
r1078 | cholt | 2014-02-05 11:23:10 -0700 (Wed, 05 Feb 2014) | updated MPI calling to use probe for size rather than another message for faster performance
r1079 | cholt | 2014-02-06 08:29:45 -0700 (Thu, 06 Feb 2014) | fixed labeling bug, fixed hanging MPI calls, fixed trnascan introns, and length
r1080 | cholt | 2014-02-11 10:08:33 -0700 (Tue, 11 Feb 2014) | switch FindBin::Bin for FindBin::RealBin throughout
r1081 | cholt | 2014-02-11 10:49:24 -0700 (Tue, 11 Feb 2014) | MAKER stable release version 2.31
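
To narrow down which changes between 2.30 and 2.31 touch MPI, the log above can be filtered by keyword. This is only a minimal sketch and not part of MAKER: the function name `filter_changelog` is made up for illustration, the sample list is a shortened copy of four entries from the log, and the line layout is assumed from the entries shown above.

```python
import re

# Layout assumed from the log above: "r<rev> | <author> | <date and time> | <message>"
LOG_LINE = re.compile(r"^r(\d+) \| (\S+) \| (\d{4}-\d{2}-\d{2}) [^|]*\| (.*)$")


def filter_changelog(lines, keyword):
    """Return (revision, date, message) tuples whose message mentions keyword."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a log entry
        rev, _author, date, message = match.groups()
        if keyword.lower() in message.lower():
            hits.append((int(rev), date, message))
    return hits


# Four entries copied from the log above (hypothetical subset for the demo).
sample_log = [
    "r1074 | cholt | 2014-02-03 14:48:05 -0700 (Mon, 03 Feb 2014) | fixed segfault on exit for OpenMPI",
    "r1076 | cholt | 2014-02-03 15:38:52 -0700 (Mon, 03 Feb 2014) | fixed build commit missing m option",
    "r1077 | cholt | 2014-02-04 14:29:43 -0700 (Tue, 04 Feb 2014) | made MPI communication always serialize",
    "r1079 | cholt | 2014-02-06 08:29:45 -0700 (Thu, 06 Feb 2014) | fixed labeling bug, fixed hanging MPI calls, fixed trnascan introns, and length",
]

mpi_changes = filter_changelog(sample_log, "mpi")
for rev, date, message in mpi_changes:
    print(f"r{rev} {date} {message}")
```

The match is a plain case-insensitive substring test, so a word such as "compiler" would also count as a hit for "mpi"; that seemed acceptable for a quick scan of a short log.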

From fbarreto at ucsd.edu  Tue Mar 18 10:08:47 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:08:47 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
Message-ID: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would
like to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot (based
on query-subject coverage, bit score, etc).  I then used only these 2000
ESTs in a first Maker run using est2genome=1.  The output returned 1500
models (with the 500 "missing" models probably a result of single-exon
issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another
Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file.  The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).
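
The three rounds described above differ only in a handful of maker_opts.ctl values. The sketch below writes them out side by side. It is an illustration under assumptions, not a tested recipe: the FASTA and HMM file names are hypothetical, and `round_settings` and `render_ctl` are helper names invented here. Only the option names (est, protein, est2genome, snaphmm) are taken from the MAKER control file.

```python
def round_settings(round_number, snap_hmm=None):
    """Return the maker_opts.ctl values that change between training rounds."""
    if round_number == 1:
        # Round 1: gene models are built directly from the curated EST subset.
        return {
            "est": "curated_ests.fasta",  # hypothetical file name
            "protein": "",
            "est2genome": 1,
            "snaphmm": "",
        }
    if snap_hmm is None:
        raise ValueError("rounds 2 and 3 need the HMM trained on the previous round")
    # Rounds 2 and 3: all evidence is supplied, est2genome is switched off,
    # and SNAP runs with the HMM trained on the previous round's output.
    return {
        "est": "all_ests.fasta",  # hypothetical file name
        "protein": "proteins.fasta",  # hypothetical file name
        "est2genome": 0,
        "snaphmm": snap_hmm,
    }


def render_ctl(settings):
    """Format the settings as key=value lines, as they appear in a control file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


for number, hmm in [(1, None), (2, "snap_round1.hmm"), (3, "snap_round2.hmm")]:
    print(f"# round {number}")
    print(render_ctl(round_settings(number, hmm)))
```

Rounds 2 and 3 share the same settings apart from the HMM file, which is why a single branch covers both.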

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From carsonhh at gmail.com  Tue Mar 18 10:14:29 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 10:14:29 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <CF4DCCE1.ADEE%carsonhh@gmail.com>

That sounds good.  1,500 initial models should be more than sufficient for
the first round of training.

-Carson



From dence at genetics.utah.edu  Tue Mar 18 10:16:20 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 18 Mar 2014 16:16:20 +0000
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6E483@mxb2.hg.genetics.utah.edu>

Hi Felipe,

I think 1500 models sounds like a good size set with which to train SNAP. I think that SNAP expects ~1000 models for training.

The only other comment on the approach is perhaps that using only one ab-initio predictor is a little bit risky. Using multiple predictors would allow MAKER to select from among their different models for the one that best fits the evidence.

Good luck and let us know if there's anything we can help with!

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From barry.utah at gmail.com  Tue Mar 18 10:26:45 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 18 Mar 2014 10:26:45 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>

Hi Felipe,

I think that plan sounds quite reasonable.  To address your primary concern, most gene prediction tools recommend a minimum of a few hundred gene models to train on.  Since you're an order of magnitude above that, I think you're in good shape.  Having said that, if you have concerns about biases in your training set, you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high-confidence genes that your set is missing.

Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from, I would suggest that you also consider including at least one other gene predictor.  Augustus works well on a wide variety of genomes, and while it is more difficult to train than SNAP, it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome.  This is one of the advantages of MAKER supervision: while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence, and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictors within MAKER.
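
As a quick check of training-set size against the "few hundred" guideline above, the gene features in a MAKER GFF3 file can be counted. This is a rough sketch under assumptions: it treats a row as a gene model when column 2 (source) is "maker" and column 3 (type) is "gene", the threshold of 300 is an arbitrary stand-in for "a few hundred", and the sample rows are made up.

```python
def count_gene_models(gff_lines, source="maker"):
    """Count GFF3 rows whose source column matches and whose type is 'gene'."""
    count = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[1] == source and cols[2] == "gene":
            count += 1
    return count


def has_enough_models(count, minimum=300):
    """minimum=300 is an assumed stand-in for 'a few hundred' models."""
    return count >= minimum


# Made-up rows for illustration only.
sample_gff = [
    "##gff-version 3",
    "\t".join(["scaf1", "maker", "gene", "100", "900", ".", "+", ".", "ID=g1"]),
    "\t".join(["scaf1", "maker", "mRNA", "100", "900", ".", "+", ".", "ID=g1-mRNA-1;Parent=g1"]),
    "\t".join(["scaf1", "snap_masked", "match", "100", "900", ".", "+", ".", "ID=s1"]),
    "\t".join(["scaf2", "maker", "gene", "50", "700", ".", "-", ".", "ID=g2"]),
    "##FASTA",
    ">scaf1",
    "ACGT",
]

n_models = count_gene_models(sample_gff)
print(n_models, has_enough_models(n_models))
```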

Thanks,

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From fbarreto at ucsd.edu  Tue Mar 18 10:59:39 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:59:39 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
	<02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
Message-ID: <CAOi0ENYUcJFJsg0nDj3-9if0E96N+UY=vPyJkfH0T4xvFYOQ3w@mail.gmail.com>

Thanks, guys, for the swift and informative response!  I will try to train
Augustus again, but at the very least, will include it with an arthropod
HMM in my final run (in addition to my custom SNAP HMM).

Cheers,

Felipe



-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From darasappan at gmail.com  Tue Mar 18 13:27:11 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 14:27:11 -0500
Subject: [maker-devel] maker snap output files
Message-ID: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>

Hello,

I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information on the tutorial about interpreting these different files. I'm hoping one of you can help.

*maker.proteins.fasta
*maker.snap_masked.proteins.fasta
*maker.non_overlapping_ab_initio.proteins.fasta

What is the difference among these? They all have different numbers of sequences.

Similarly, with transcripts:

maker.non_overlapping_ab_initio.transcripts.fasta
maker.snap_masked.transcripts.fasta
maker.transcripts.fasta

Thanks
Dhivya



From carsonhh at gmail.com  Tue Mar 18 13:34:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 13:34:05 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
Message-ID: <CF4DFA69.AE2E%carsonhh@gmail.com>

maker.proteins.fasta - these are the final filtered and modified protein
models (this is what you want).

maker.snap_masked.proteins.fasta - these are the raw, unfiltered SNAP ab
initio predictions (for reference purposes).

maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
rejected models that do not overlap the maker.proteins.fasta entries. If you
think you are missing a gene, look for it here.  Sometimes people use
InterProScan (very slow) to analyze this file for false negatives.


These files are also described in the README distributed with MAKER in the
"MAKER OUTPUT" section.
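
To compare how many sequences each of these files holds, counting the FASTA header lines is enough. The sketch below is a minimal illustration: `count_fasta_records` is a helper name made up here, and the in-memory sample stands in for real fasta_merge output so the snippet runs on its own.

```python
import io


def count_fasta_records(handle):
    """Count sequences in a FASTA stream by counting lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Made-up records standing in for a merged protein FASTA file.
sample = io.StringIO(
    ">protein-1\n"
    "MKTAYIAKQR\n"
    "QISFVKSHFS\n"
    ">protein-2\n"
    "MSDNELLKQA\n"
)

n_records = count_fasta_records(sample)
print(n_records)
```

For files on disk, the same function can be called on an open file handle, for example `with open(path) as handle: count_fasta_records(handle)`.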

Thanks,
Carson



From darasappan at gmail.com  Tue Mar 18 14:05:39 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 15:05:39 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4DFA69.AE2E%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>

Thanks Carson.

Is it normal that in my MAKER results after running SNAP, the number of proteins (in *maker.proteins.fasta) is actually lower than the number of proteins in my pre-SNAP MAKER results?  I assumed that annotation through alignment plus annotation through prediction would yield more annotations.

The unfiltered proteins file has more proteins though.

Thanks
Dhivya



From carsonhh at gmail.com  Tue Mar 18 14:09:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 14:09:01 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID: <CF4E0363.AE3D%carsonhh@gmail.com>

There can also be hint-based predictions.  They may be similar in size, but
there is no rule.  Generally maker.snap_masked.proteins.fasta will be
larger, as gene predictors tend to overpredict (by as much as 10-fold).  You
should always review your annotations in something like Apollo to see how
the models compare to the evidence.  Counts alone don't really mean anything.

Thanks,
Carson


From chrisbioinfo at gmail.com  Wed Mar 19 05:09:57 2014
From: chrisbioinfo at gmail.com (Chris Bioinfo)
Date: Wed, 19 Mar 2014 12:09:57 +0100
Subject: [maker-devel] Annotation with maker2
Message-ID: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>

Hello,

I'm installing and using maker2 for the first time, and I get an error when
running it.

I am certainly missing something, but I don't know what.

MAKER compiled with no error messages, and I have all these directories
after compilation:
bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src

Nevertheless, when I run maker2 on the test data (dpp_contig.fasta) I get
this error:

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
doing repeat masking
DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm

Can't call method "do" on an undefined value at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500
...

Any ideas?

Best,

Christelle

From carsonhh at gmail.com  Wed Mar 19 07:01:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Mar 2014 07:01:35 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
Message-ID: <CF4EF035.AE6F%carsonhh@gmail.com>

Your problem is one of the following:

1) You need to reinstall the DBD::SQLite module.
2) You are running in a directory you don't have permissions for.
3) You set your TMPDIR environment variable or the TMP value in
   maker_opts.ctl to an NFS-mounted or memory-mounted directory.
4) You are using a self-compiled version of Perl (i.e. not /usr/bin/perl)
   that has issues (probably with the DB or SQLite modules).

You can also completely delete the output directory and start again to see
if it was just a random error.  You should look at each of those first.  If
none of them seems to be the issue, you can run MAKER with the --debug
command line flag and send me the output.
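
The directory-related causes above (permissions, or a temporary directory on a file system where SQLite cannot open its database) can be checked independently of MAKER. The sketch below tries to create a small SQLite database in a given directory. Note the assumptions: it uses Python's built-in sqlite3 module, so it says nothing about whether Perl's DBD::SQLite is installed correctly, and the probe file name and the function name `can_create_sqlite_db` are made up for illustration.

```python
import os
import sqlite3
import tempfile


def can_create_sqlite_db(directory):
    """Try to create, write to, and remove a small SQLite database in directory."""
    path = os.path.join(directory, "maker_sqlite_check.db")  # hypothetical probe file
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE probe (id INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
        return True
    except (sqlite3.Error, OSError):
        return False
    finally:
        # Clean up the probe file whether or not the check succeeded.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    # Check the current directory and the temporary directory in use.
    candidates = [os.getcwd(), os.environ.get("TMPDIR", tempfile.gettempdir())]
    for directory in candidates:
        status = "ok" if can_create_sqlite_db(directory) else "FAILED"
        print(f"{directory}: {status}")
```

If the check fails for the directory MAKER writes to, pointing the TMP value in maker_opts.ctl at a local disk is the first thing to try; if it passes everywhere, the Perl module or Perl build is the more likely culprit.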

Thanks,
Carson



From rbharris at uw.edu  Wed Mar 19 19:19:27 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Wed, 19 Mar 2014 18:19:27 -0700
Subject: [maker-devel] tradeoff between run time & file number
Message-ID: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once - and used the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time - probably because it has to go through BLAST
again. Is there any way of getting around this? I expect I may have to train
SNAP and rerun maker multiple times and it is taking about 3 weeks to get
through my dataset. Is there a way to prune down my original dataset based
on maker's output?

Thanks,
Rebecca

From dence at genetics.utah.edu  Wed Mar 19 23:43:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Mar 2014 05:43:11 +0000
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>

Hi Rebecca,

As far as pruning down the dataset goes, I think the biggest gains will come from trimming the number of scaffolds that you annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter than 5 kbp or 10 kbp won't contribute much to the gene counts in the end.
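
One way to drop the short scaffolds before rerunning is sketched below. This is
only an illustration, not a MAKER tool: filter_fasta_by_length is a hypothetical
helper built on awk, the 10000 cutoff and the file names are placeholders, and
each kept sequence is written back on a single line.

```sh
#!/bin/sh
# Hypothetical helper (not part of MAKER): read FASTA on stdin and write only
# the records whose sequence is at least MIN_LEN bases long.
filter_fasta_by_length() {
    min_len=$1
    awk -v min="$min_len" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { emit() }
    '
}

# Example invocation (file names are placeholders):
#   filter_fasta_by_length 10000 < scaffolds.fasta > scaffolds.min10k.fasta
```

It is worth checking how many scaffolds and how much total sequence survive the
cutoff before committing to it.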

Also, if you can, try to avoid using the alt_est option. It works completely fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach it to your reply (to the maker-devel list), and I'll take a look. I don't know how to force maker to create fewer files. You definitely want to be able to make use of the results from prior runs to save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca Harris [rbharris at uw.edu]
Sent: Wednesday, March 19, 2014 7:19 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] tradeoff between run time & file number

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've gone through it once - and used the clean_up option because otherwise maker exceeds the cluster's file quota. However, now I'm retraining SNAP and it is taking a very long time - probably because it has to go through BLAST again. Is there any way of getting around this? I expect I may have to train SNAP and rerun maker multiple times and it is taking about 3 weeks to get through my dataset. Is there a way to prune down my original dataset based on maker's output?

Thanks,
Rebecca

From darasappan at gmail.com  Thu Mar 20 11:22:47 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 20 Mar 2014 12:22:47 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4E0363.AE3D%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and transcripts that don't overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mrna) all say something like "*snap-gene", so does this mean these are the genes from the snap prediction tool?


Thanks
dhivya


On Mar 18, 2014, at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:

> There can also be hint based predictions.  They may be similar in size, but there is no rule.  Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to over predict (as much as 10 fold).  You should always review your annotations in something like Apollo, to see how the models compare to the evidence.  Just counts don't really mean anything.
> 
> Thanks,
> Carson
> 
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Tuesday, March 18, 2014 at 2:05 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: <maker-devel at yandell-lab.org>
> Subject: Re: maker snap output files
> 
> Thanks Carson.
> 
> Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually less than the number of proteins in my pre-snap maker results?  I assumed that annotation through alignment plus annotation through prediction would yield more annotations?
> 
> The unfiltered proteins file has more proteins though.
> 
> Thanks
> Dhivya
> 
> 
> 
> On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
>> maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here.  Sometimes people use interproscan (very slow) to analyze this file for false negatives.
>> 
>> 
>> These files are also described in the README distributed with MAKER in the "MAKER OUTPUT" section.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> 
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Tuesday, March 18, 2014 at 1:27 PM
>> To: Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject: maker snap output files
>> 
>> Hello,
>> 
>> I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information in the tutorial about interpreting these different files. I'm hoping one of you can help.
>> 
>> *maker.proteins.fasta
>> *maker.snap_masked.proteins.fasta
>> *maker.non_overlapping_ab_initio.proteins.fasta
>> 
>> What is the difference among these? They all have different number of sequences.
>> 
>> Similarly,with transcripts:
>> 
>> maker.non_overlapping_ab_initio.transcripts.fasta
>> maker.snap_masked.transcripts.fasta
>> maker.transcripts.fasta
>> 
>> Thanks
>> Dhivya
>> 
>> 
> 


From carsonhh at gmail.com  Thu Mar 20 11:24:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:24:41 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
	<48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
Message-ID: <CF508021.AF35%carsonhh@gmail.com>

The maker transcripts are the gene/mRNA/exon/CDS features.

All other transcripts, from SNAP etc., are match/match_part features in
the GFF3.

When you look at these in something like Apollo, they will be placed in
different viewing panels based on their type.
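
To see that split without opening a browser, one option is to tally features by
the source and type columns of the GFF3. The sketch below is an illustration
only: count_gff3_features is a hypothetical helper, not something shipped with
MAKER, and it assumes a standard tab-separated nine-column GFF3.

```sh
#!/bin/sh
# Hypothetical helper (not part of MAKER): count GFF3 features per
# (source, type) pair, i.e. columns 2 and 3, and print them sorted.
count_gff3_features() {
    awk -F '\t' '
        /^##FASTA/ { exit }      # stop at an embedded FASTA section, if any
        /^#/ { next }            # skip directives and comments
        NF >= 9 { n[$2 "\t" $3]++ }
        END { for (k in n) print k "\t" n[k] }
    ' "$1" | LC_ALL=C sort
}

# Example invocation (file name is a placeholder):
#   count_gff3_features my_genome.all.gff
```

Rows whose type is gene, mRNA, exon or CDS are the annotations; match and
match_part rows are the evidence and raw predictions described above.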

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 20, 2014 at 11:22 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  <maker-devel at yandell-lab.org>
Subject:  Re: maker snap output files

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and
transcripts that don't overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mrna) all say something like
"*snap-gene", so does this mean these are the genes from the snap prediction
tool?


Thanks
dhivya


On Mar 18, 2014, at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:

> There can also be hint based predictions.  They may be similar in size, but
> there is no rule.  Generally maker.snap_masked.proteins.fasta will be larger,
> as gene predictors tend to over predict (as much as 10 fold).  You should
> always review your annotations in something like Apollo, to see how the models
> compare to the evidence.  Just counts don't really mean anything.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 18, 2014 at 2:05 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  <maker-devel at yandell-lab.org>
> Subject:  Re: maker snap output files
> 
> Thanks Carson.
> 
> Is it normal that in my maker results after running snap, the number of
> proteins (in *maker.proteins.fasta) is actually less than the number of
> proteins in my pre-snap maker results?  I assumed that annotations through
> alignment+annotation through prediction would equal more annotations?
> 
> The unfiltered proteins file has more proteins though.
> 
> Thanks
> Dhivya
> 
> 
> 
> On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
>> maker.proteins.fasta - these are the final filtered and modified protein
>> models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab
>> initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
>> rejected models that do not overlap the maker.proteins.fasta entries. If you
>> think you are missing a gene, look for it here.  Sometimes people use
>> interproscan (very slow) to analyze this file for false negatives.
>> 
>> 
>> These files are also described in the README distributed with MAKER in the
>> "MAKER OUTPUT" section.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Tuesday, March 18, 2014 at 1:27 PM
>> To:  Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject:  maker snap output files
>> 
>> Hello,
>> 
>> I ran maker after running SNAP ab initio prediction (following instructions
>> from the maker tutorial).  It ran successfully and when I ran fasta_merge, I
>> got several output fasta files. I'm unable to find information in the
>> tutorial about interpreting these different files. I'm hoping one of you can
>> help.
>> 
>> *maker.proteins.fasta
>> *maker.snap_masked.proteins.fasta
>> *maker.non_overlapping_ab_initio.proteins.fasta
>> 
>> What is the difference among these? They all have different number of
>> sequences.
>> 
>> Similarly,with transcripts:
>> 
>> maker.non_overlapping_ab_initio.transcripts.fasta
>> maker.snap_masked.transcripts.fasta
>> maker.transcripts.fasta
>> 
>> Thanks
>> Dhivya
>> 
>> 
> 



From carsonhh at gmail.com  Thu Mar 20 11:53:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:53:24 -0600
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
Message-ID: <CF50861A.AF65%carsonhh@gmail.com>

You may also want to try the GFF3 pass-through options.  Basically, you set
maker_gff to the GFF3 file from your past run and tell MAKER which kinds of
evidence to carry over from it by setting the matching 'pass' options to 1.
Then you can run without your FASTA file inputs for ESTs, proteins, and
repeats (blank out the RepeatMasker species as well).  The values will be
passed forward from the GFF3 file into the current run.
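
A minimal sketch of making those control-file changes from the shell follows.
Only maker_gff and the idea of 'pass' options come from the description above;
set_ctl_option is a hypothetical helper, and the specific option names in the
commented example (est_pass, protein_pass, rm_pass, est, protein, model_org) are
my assumption about the stock maker_opts.ctl, so check them against your own
file before relying on them.

```sh
#!/bin/sh
# Hypothetical helper (not part of MAKER): set KEY=VALUE on the matching line
# of a control file, keeping any trailing "#comment" on that line.
# Assumes KEY and VALUE contain no "|" or "&" characters.
set_ctl_option() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    if sed "s|^${key}=[^#]*|${key}=${value} |" "$file" > "$tmp"; then
        cat "$tmp" > "$file"     # overwrite in place, keeping file permissions
    fi
    rm -f "$tmp"
}

# Example (option names and the GFF3 path are assumptions; verify them first):
#   set_ctl_option maker_gff /path/to/previous_run.gff maker_opts.ctl
#   set_ctl_option est_pass 1 maker_opts.ctl
#   set_ctl_option protein_pass 1 maker_opts.ctl
#   set_ctl_option rm_pass 1 maker_opts.ctl
#   set_ctl_option est "" maker_opts.ctl
#   set_ctl_option protein "" maker_opts.ctl
#   set_ctl_option model_org "" maker_opts.ctl
```

Editing maker_opts.ctl by hand does the same job; the helper is only useful when
the retraining loop is scripted.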

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Wednesday, March 19, 2014 at 11:43 PM
To:  Rebecca Harris <rbharris at uw.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] tradeoff between run time & file number

Hi Rebecca,

As far as pruning down the dataset goes, I think the biggest gains will come
from trimming the number of scaffolds that you annotate. What is the N50 of
your 400,000-scaffold set? Usually, scaffolds shorter than 5 kbp or 10 kbp
won't contribute much to the gene counts in the end.

Also, if you can, try to avoid using the alt_est option. It works completely
fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got
things set up. You can attach it to your reply (to the maker-devel list),
and I'll take a look. I don't know how to force maker to create fewer files.
You definitely want to be able to make use of the results from prior runs to
save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca
Harris [rbharris at uw.edu]
Sent: Wednesday, March 19, 2014 7:19 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] tradeoff between run time & file number

Hi - 

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once - and used the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time - probably because it has to go through BLAST again.
Is there any way of getting around this? I expect I may have to train SNAP
and rerun maker multiple times and it is taking about 3 weeks to get through
my dataset. Is there a way to prune down my original dataset based on
maker's output?

Thanks,
Rebecca

From carsonhh at gmail.com  Fri Mar 21 08:23:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 08:23:18 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
	<CF4EF035.AE6F%carsonhh@gmail.com>
	<CAF+kvSasxjb7p_Wjtntmy2nht6kfL=JqaP5DfMGeC0GHkLy8Hw@mail.gmail.com>
	<CF5065FF.AEE7%carsonhh@gmail.com>
	<CAF+kvSbKs-sdFfvncqEgsAk4_XKbsB7KdB85fCdxpcWNe1rjWQ@mail.gmail.com>
	<CF506B1E.AEED%carsonhh@gmail.com>
	<CAF+kvSbmpRgneyfz6_tWsx_NS8ZWhuwnQAV0hA83qJrOVh-0hA@mail.gmail.com>
	<CAF+kvSZD7rpUeeoNGMKBGbH0zZN3bHksJFuUPP+hGoUKki34jw@mail.gmail.com>
	<CF506F04.AEF8%carsonhh@gmail.com>
	<CAF+kvSY_nWAFBH1YpKJqWV7qQ=XehHzhX9e+65miAG4f_+=ptA@mail.gmail.com>
	<CAF+kvSYYTA8pYFc0WY12+g6T_bk7P9MRUxNpzqtGkJARsA0wpg@mail.gmail.com>
	<CF50741C.AF02%carsonhh@gmail.com>
	<CAF+kvSZAhJnJdq+UcRfpWSya+6W26ecZHkRvHzGLsqk6K=fmQg@mail.gmail.com>
	<CF507AB2.AF1E%carsonhh@gmail.com>
	<CF507F90.AF30%carsonhh@gmail.com>
	<CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
Message-ID: <CF51A74A.AFA8%carsonhh@gmail.com>

Glad it's working.  Let us know if anything else comes up.

--Carson


From:  Chris Bioinfo <chrisbioinfo at gmail.com>
Date:  Friday, March 21, 2014 at 4:57 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: [maker-devel] Annotation with maker2

Dear Carson

It works, after many difficulties:

I installed sqlite 3.8.4.1 yesterday: it was "better" (no error
message when launching sqlite3). Yet my test.db was still not created.

Today I found the trick!
The problem was simply that the path where the db was being created was too long.

Thanks for your time and your help, Carson!

All the best,

Christelle


2014-03-20 18:21 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
> Also you can use this command line to test both before and after installing
> 
> perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh =
> DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");'
> 
> Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in
> the error.
> 
> --Carson
> 
> 
> From:  Carson Holt <carsonhh at gmail.com>
> Date:  Thursday, March 20, 2014 at 11:03 AM
> To:  Chris Bioinfo <chrisbioinfo at gmail.com>
> 
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> The failure is in SQLite.  So you have to reinstall.  I.e. 'force install
> DBD::SQLite' in CPAN.  Otherwise you are just keeping whatever module is
> installed which may have broken C bindings.
> 
> You may also have to install SQLite 3.8.4.1, and then reinstall the perl
> modules using the force option to force recompile.
> 
> --Carson
> 
> 
> 
> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
> Date:  Thursday, March 20, 2014 at 10:57 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> cpan[2]> install DBI
> DBI is up to date (1.631).
> 
> cpan[3]> install DBD::SQLite
> DBD::SQLite is up to date (1.42).
> 
> my test.db is not actually created:
> 
> sqlite3 dpp_contig.maker.output/test.db
> SQLite version 3.8.3.1 2014-02-11 14:52:19
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> 
> 
> 
> 
> 
> 2014-03-20 17:36 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>> I'm actually checking the mount points for the disk.  SQLite won't work on
>> filesystems that don't implement locks, and 'df' is a good way to infer some
>> of that info.
>> 
>> Basically I still think this is SQLite failing on your system.  You might
>> need to reinstall SQLite and then reinstall the perl DBI and DBD::SQLite
>> modules.
>> 
>> You can also do a test command --> 'sqlite3 dpp_contig.maker.output/test.db'
>> 
>> This will work if you have sqlite3 installed.  And any error it gives may be
>> informative.
>> 
>> --Carson
>> 
>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>> Date:  Thursday, March 20, 2014 at 10:29 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: [maker-devel] Annotation with maker2
>> 
>> oh sorry
>> 
>> my disks are quite full, but still space I guess for maker
>> 
>>  /dev/sdc1           19T     18T  934G  95% /home
>> 
>> 
>> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>> this :
>>> 
>>>  du -h dpp_contig.maker.output/
>>> 0    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0
>>> 88K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore
>>> 4.0K    dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>>> 4.0K    dpp_contig.maker.output/maker_bopts.log
>>> 4.0K    dpp_contig.maker.output/maker_exe.log
>>> 8.0K    dpp_contig.maker.output/maker_opts.log
>>> 16K    dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1
>>> 44K    dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10
>>> 32K    dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb
>>> 0    dpp_contig.maker.output/seen.dbm
>>> 
>>> 
>>> 
>>> 2014-03-20 17:10 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>> 
>>>> What does 'df -h dpp_contig.maker.output' show?
>>>> 
>>>> --Carson
>>>> 
>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>> Date:  Thursday, March 20, 2014 at 10:00 AM
>>>> 
>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>> 
>>>> sorry, mistake on the dir!
>>>> 
>>>> I have these files:
>>>> dpp_contig_datastore  dpp_contig_master_datastore_index.log
>>>> maker_bopts.log  maker_exe.log  maker_opts.log  mpi_blastdb  seen.dbm
>>>> 
>>>> 
>>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>>>> no,
>>>>> 
>>>>> I have these files in the directory:
>>>>> dpp_contig.fasta         dpp_est.fasta      hsap_contig.fasta
>>>>> hsap_protein.fasta  maker_exe.ctl
>>>>> dpp_contig.maker.output  dpp_protein.fasta  hsap_est.fasta
>>>>> maker_bopts.ctl     maker_opts.ctl  te_proteins.fasta
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>>> Did /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db exist?
>>>>>> 
>>>>>> --Carson
>>>>>> 
>>>>>> 
>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>> Date:  Thursday, March 20, 2014 at 9:50 AM
>>>>>> 
>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>> 
>>>>>> cdantec at belem:~$ /usr/bin/perl -v
>>>>>> 
>>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for
>>>>>> x86_64-linux-gnu-thread-multi
>>>>>> (with 46 registered patches, see perl -V for more detail)
>>>>>> 
>>>>>> Copyright 1987-2013, Larry Wall
>>>>>> 
>>>>>> Perl may be copied only under the terms of either the Artistic License or
>>>>>> the
>>>>>> GNU General Public License, which may be found in the Perl 5 source kit.
>>>>>> 
>>>>>> Complete documentation for Perl, including FAQ lists, should be found on
>>>>>> this system using "man perl" or "perldoc perl".  If you have access to
>>>>>> the
>>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> What do you get when you type --> /usr/bin/perl -v
>>>>>>> 
>>>>>>> The key to the error is this line -->
>>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file
>>>>>>> 
>>>>>>> Either the database doesn't exist, or is corrupt.  Does it exist?
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Thursday, March 20, 2014 at 9:25 AM
>>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Dear Carson,
>>>>>>> 
>>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions on my
>>>>>>> directory, and configured the TMP value in maker_opts.ctl. perl is in
>>>>>>> /usr/bin/perl.
>>>>>>> I have deleted the output directory many times, but I get the same problem.
>>>>>>> 
>>>>>>> So here is the debug output:
>>>>>>> ****MODULE VERSION INFO
>>>>>>>     0.05    Acme::Damn    /usr/local/lib/perl/5.18.1/Acme/Damn.pm
>>>>>>>     1.01    AnyDBM_File    /usr/share/perl/5.18/AnyDBM_File.pm
>>>>>>>     5.73    AutoLoader    /usr/share/perl/5.18/AutoLoader.pm
>>>>>>>     UNKNOWN    Bio::AnalysisParserI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::AnnotatableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm
>>>>>>>     UNKNOWN    Bio::Annotation::Collection
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm
>>>>>>>     UNKNOWN    Bio::Annotation::SimpleValue
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm
>>>>>>>     UNKNOWN    Bio::Annotation::TypeManager
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm
>>>>>>>     UNKNOWN    Bio::AnnotationCollectionI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm
>>>>>>>     UNKNOWN    Bio::AnnotationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm
>>>>>>>     1.006923    Bio::DB::Fasta
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm
>>>>>>>     UNKNOWN    Bio::DB::InMemoryCache
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm
>>>>>>>     UNKNOWN    Bio::DB::IndexedBase
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm
>>>>>>>     UNKNOWN    Bio::DB::RandomAccessI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm
>>>>>>>     UNKNOWN    Bio::DB::SeqI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm
>>>>>>>     UNKNOWN    Bio::DescribableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/DescribableI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventGeneratorI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventHandlerI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactoryI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm
>>>>>>>     UNKNOWN    Bio::Factory::SequenceFactoryI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm
>>>>>>>     UNKNOWN    Bio::FeatureHolderI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm
>>>>>>>     UNKNOWN    Bio::IdentifiableI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm
>>>>>>>     UNKNOWN    Bio::LocatableSeq
>>>>>>> /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm
>>>>>>>     UNKNOWN    Bio::Location::Atomic
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm
>>>>>>>     UNKNOWN    Bio::Location::CoordinatePolicyI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm
>>>>>>>     UNKNOWN    Bio::Location::Fuzzy
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm
>>>>>>>     UNKNOWN    Bio::Location::FuzzyLocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::Simple
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm
>>>>>>>     UNKNOWN    Bio::Location::Split
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/Split.pm
>>>>>>>     UNKNOWN    Bio::Location::SplitLocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::WidestCoordPolicy
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm
>>>>>>>     UNKNOWN    Bio::LocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/LocationI.pm
>>>>>>>     UNKNOWN    Bio::PrimarySeq
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm
>>>>>>>     1.006923    Bio::PrimarySeqI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm
>>>>>>>     UNKNOWN    Bio::Range    /usr/local/share/perl/5.18.1/Bio/Range.pm
>>>>>>>     UNKNOWN    Bio::RangeI    /usr/local/share/perl/5.18.1/Bio/RangeI.pm
>>>>>>>     1.006923    Bio::Root::Exception
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm
>>>>>>>     UNKNOWN    Bio::Root::HTTPget
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm
>>>>>>>     UNKNOWN    Bio::Root::IO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/IO.pm
>>>>>>>     1.006923    Bio::Root::Root
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Root.pm
>>>>>>>     1.006923    Bio::Root::RootI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm
>>>>>>>     1.006923    Bio::Root::Version
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Version.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::GenericHSP
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/augustus.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protein2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snoscan.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnascan.pm
>>>>>>>     1.006923    Bio::Search::Hit::GenericHit
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/augustus.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protein2genome.pm
>>>>>>>     1.006923    Bio::Search::Hit::PhatHit::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snoscan.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnascan.pm
>>>>>>>     1.006923    Bio::Search::SearchUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm
>>>>>>>     UNKNOWN    Bio::SearchIO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::EventHandlerI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::SearchResultEventBuilder
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm
>>>>>>>     UNKNOWN    Bio::Seq    /usr/local/share/perl/5.18.1/Bio/Seq.pm
>>>>>>>     UNKNOWN    Bio::Seq::SeqFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm
>>>>>>>     UNKNOWN    Bio::SeqAnalysisParserI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::FeaturePair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Generic
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Similarity
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::SimilarityPair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeatureI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm
>>>>>>>     UNKNOWN    Bio::SeqI    /usr/local/share/perl/5.18.1/Bio/SeqI.pm
>>>>>>>     UNKNOWN    Bio::SeqUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm
>>>>>>>     1.006923    Bio::Tools::CodonTable
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm
>>>>>>>     UNKNOWN    Bio::Tools::GFF
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm
>>>>>>>     1.006923    Bio::Tools::IUPAC
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm
>>>>>>>     7.3    Bit::Vector    /usr/local/lib/perl/5.18.1/Bit/Vector.pm
>>>>>>>     0.01    CGL::Annotation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm
>>>>>>>     0.01    CGL::Annotation::Feature
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Contig
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Exon
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Gene
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Intron
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protein.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Sequence_variant
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequence_variant.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Transcript
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transcript.pm
>>>>>>>     0.01    CGL::Annotation::FeatureLocation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocation.pm
>>>>>>>     0.01    CGL::Annotation::FeatureRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelationship.pm
>>>>>>>     0.01    CGL::Annotation::Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm
>>>>>>>     0.01    CGL::Annotation::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm
>>>>>>>     0.01    CGL::Clone
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm
>>>>>>>     0.01    CGL::Ontology::Node
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm
>>>>>>>     0.01    CGL::Ontology::NodeRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship.pm
>>>>>>>     0.01    CGL::Ontology::Ontology
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm
>>>>>>>     0.01    CGL::Ontology::Parser::OBO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm
>>>>>>>     0.01    CGL::Ontology::SO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm
>>>>>>>     0.01    CGL::Ontology::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm
>>>>>>>     0.01    CGL::Revcomp
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm
>>>>>>>     0.01    CGL::TranslationMachine
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm
>>>>>>>     1.32    Carp    /usr/local/share/perl/5.18.1/Carp.pm
>>>>>>>     1.32    Carp::Heavy    /usr/local/share/perl/5.18.1/Carp/Heavy.pm
>>>>>>>     0.64    Class::Struct    /usr/share/perl/5.18/Class/Struct.pm
>>>>>>>     0.36    Clone    /usr/local/lib/perl/5.18.1/Clone.pm
>>>>>>>     5.018001    Config    /usr/lib/perl/5.18/Config.pm
>>>>>>>     3.40    Cwd    /usr/lib/perl/5.18/Cwd.pm
>>>>>>>     1.42    DBD::SQLite    /usr/local/lib/perl/5.18.1/DBD/SQLite.pm
>>>>>>>     1.631    DBI    /usr/local/lib/perl/5.18.1/DBI.pm
>>>>>>>     1.827    DB_File    /usr/lib/perl/5.18/DB_File.pm
>>>>>>>     2.145    Data::Dumper    /usr/lib/perl/5.18/Data/Dumper.pm
>>>>>>>     0.11    Datastore::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm
>>>>>>>     0.01    Datastore::MD5
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm
>>>>>>>     2.53    Digest::MD5    /usr/local/lib/perl/5.18.1/Digest/MD5.pm
>>>>>>>     1.16    Digest::base    /usr/share/perl/5.18/Digest/base.pm
>>>>>>>     UNKNOWN    Dumper::GFF::GFFV3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game_Xml
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm
>>>>>>>     1.18    DynaLoader    /usr/lib/perl/5.18/DynaLoader.pm
>>>>>>>     1.18    Errno    /usr/lib/perl/5.18/Errno.pm
>>>>>>>     0.17015    Error
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm
>>>>>>>     UNKNOWN    Error::Simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm
>>>>>>>     5.68    Exporter    /usr/share/perl/5.18/Exporter.pm
>>>>>>>     5.68    Exporter::Heavy    /usr/share/perl/5.18/Exporter/Heavy.pm
>>>>>>>     UNKNOWN    Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Fasta.pm
>>>>>>>     UNKNOWN    FastaChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm
>>>>>>>     UNKNOWN    FastaChunker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm
>>>>>>>     UNKNOWN    FastaDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm
>>>>>>>     UNKNOWN    FastaFile
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm
>>>>>>>     UNKNOWN    FastaSeq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm
>>>>>>>     1.11    Fcntl    /usr/lib/perl/5.18/Fcntl.pm
>>>>>>>     2.84    File::Basename    /usr/share/perl/5.18/File/Basename.pm
>>>>>>>     2.26    File::Copy    /usr/share/perl/5.18/File/Copy.pm
>>>>>>>     1.20    File::Glob    /usr/lib/perl/5.18/File/Glob.pm
>>>>>>>     1.20    File::NFSLock
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm
>>>>>>>     2.09    File::Path    /usr/share/perl/5.18/File/Path.pm
>>>>>>>     3.40    File::Spec    /usr/lib/perl/5.18/File/Spec.pm
>>>>>>>     3.40    File::Spec::Unix    /usr/lib/perl/5.18/File/Spec/Unix.pm
>>>>>>>     0.2304    File::Temp    /usr/local/share/perl/5.18.1/File/Temp.pm
>>>>>>>     1.09    File::Which
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm
>>>>>>>     2.02    FileHandle    /usr/share/perl/5.18/FileHandle.pm
>>>>>>>     1.51    FindBin    /usr/share/perl/5.18/FindBin.pm
>>>>>>>     UNKNOWN    GFFDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>>     UNKNOWN    GI    /usr/local/annotation/maker2.31/bin/../lib/GI.pm
>>>>>>>     2.42    Getopt::Long    /usr/local/share/perl/5.18.1/Getopt/Long.pm
>>>>>>>     6.02    HTTP::Date    /usr/share/perl5/HTTP/Date.pm
>>>>>>>     6.05    HTTP::Headers    /usr/share/perl5/HTTP/Headers.pm
>>>>>>>     6.06    HTTP::Message    /usr/share/perl5/HTTP/Message.pm
>>>>>>>     6.00    HTTP::Request    /usr/share/perl5/HTTP/Request.pm
>>>>>>>     6.04    HTTP::Response    /usr/share/perl5/HTTP/Response.pm
>>>>>>>     6.03    HTTP::Status    /usr/share/perl5/HTTP/Status.pm
>>>>>>>     1.28    IO    /usr/lib/perl/5.18/IO.pm
>>>>>>>     1.16    IO::File    /usr/lib/perl/5.18/IO/File.pm
>>>>>>>     1.34    IO::Handle    /usr/lib/perl/5.18/IO/Handle.pm
>>>>>>>     1.1    IO::Seekable    /usr/lib/perl/5.18/IO/Seekable.pm
>>>>>>>     1.21    IO::Select    /usr/lib/perl/5.18/IO/Select.pm
>>>>>>>     1.36    IO::Socket    /usr/lib/perl/5.18/IO/Socket.pm
>>>>>>>     1.33    IO::Socket::INET    /usr/lib/perl/5.18/IO/Socket/INET.pm
>>>>>>>     1.24    IO::Socket::UNIX    /usr/lib/perl/5.18/IO/Socket/UNIX.pm
>>>>>>>     1.13    IPC::Open3    /usr/share/perl/5.18/IPC/Open3.pm
>>>>>>>     0.53    Inline    /usr/local/share/perl/5.18.1/Inline.pm
>>>>>>>     UNKNOWN    Inline::denter
>>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm
>>>>>>>     UNKNOWN    Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm
>>>>>>>     UNKNOWN    Iterator::Any
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm
>>>>>>>     UNKNOWN    Iterator::Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm
>>>>>>>     UNKNOWN    Iterator::GFF3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm
>>>>>>>     6.05    LWP    /usr/share/perl5/LWP.pm
>>>>>>>     UNKNOWN    LWP::MemberMixin    /usr/share/perl5/LWP/MemberMixin.pm
>>>>>>>     6.00    LWP::Protocol    /usr/share/perl5/LWP/Protocol.pm
>>>>>>>     6.05    LWP::UserAgent    /usr/share/perl5/LWP/UserAgent.pm
>>>>>>>     0.33    List::MoreUtils
>>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm
>>>>>>>     1.38    List::Util    /usr/local/lib/perl/5.18.1/List/Util.pm
>>>>>>>     UNKNOWN    MAKER::ConfigData
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm
>>>>>>>     1.32    POSIX    /usr/lib/perl/5.18/POSIX.pm
>>>>>>>     0.01    Parallel::Application::MPI
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI.pm
>>>>>>>     0.02    Perl::Unsafe::Signals
>>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm
>>>>>>>     UNKNOWN    PhatHit_utils
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm
>>>>>>>     UNKNOWN    PostData
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm
>>>>>>>     1.0    Proc::ProcessTable_simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm
>>>>>>>     1.0    Proc::Signal
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm
>>>>>>>     UNKNOWN    Process::MpiChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm
>>>>>>>     UNKNOWN    Process::MpiTiers
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm
>>>>>>>     1.38    Scalar::Util    /usr/local/lib/perl/5.18.1/Scalar/Util.pm
>>>>>>>     1.02    SelectSaver    /usr/share/perl/5.18/SelectSaver.pm
>>>>>>>     UNKNOWN    Shadower
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm
>>>>>>>     UNKNOWN    SimpleCluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm
>>>>>>>     2.009    Socket    /usr/lib/perl/5.18/Socket.pm
>>>>>>>     UNKNOWN    SpaceBase
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm
>>>>>>>     2.45    Storable    /usr/local/lib/perl/5.18.1/Storable.pm
>>>>>>>     1.07    Symbol    /usr/share/perl/5.18/Symbol.pm
>>>>>>>     1.17    Sys::Hostname    /usr/lib/perl/5.18/Sys/Hostname.pm
>>>>>>>     0.21    Sys::SigAction
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm
>>>>>>>     UNKNOWN    Sys::SigAction::Alarm
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm
>>>>>>>     4.02    Term::ANSIColor    /usr/share/perl/5.18/Term/ANSIColor.pm
>>>>>>>     4.2    Tie::Handle    /usr/share/perl/5.18/Tie/Handle.pm
>>>>>>>     1.04    Tie::Hash    /usr/share/perl/5.18/Tie/Hash.pm
>>>>>>>     4.3    Tie::StdHandle    /usr/share/perl/5.18/Tie/StdHandle.pm
>>>>>>>     1.9726    Time::HiRes    /usr/local/lib/perl/5.18.1/Time/HiRes.pm
>>>>>>>     1.2300    Time::Local    /usr/share/perl/5.18/Time/Local.pm
>>>>>>>     1.60    URI    /usr/share/perl5/URI.pm
>>>>>>>     3.31    URI::Escape    /usr/share/perl5/URI/Escape.pm
>>>>>>>     UNKNOWN    Widget
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm
>>>>>>>     UNKNOWN    Widget::RepeatMasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm
>>>>>>>     UNKNOWN    Widget::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm
>>>>>>>     UNKNOWN    Widget::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm
>>>>>>>     UNKNOWN    Widget::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm
>>>>>>>     UNKNOWN    Widget::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm
>>>>>>>     UNKNOWN    Widget::exonerate::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome.pm
>>>>>>>     UNKNOWN    Widget::exonerate::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.pm
>>>>>>>     UNKNOWN    Widget::exonerate::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2genome.pm
>>>>>>>     UNKNOWN    Widget::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm
>>>>>>>     UNKNOWN    Widget::formater
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm
>>>>>>>     UNKNOWN    Widget::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm
>>>>>>>     UNKNOWN    Widget::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm
>>>>>>>     UNKNOWN    Widget::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm
>>>>>>>     UNKNOWN    Widget::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm
>>>>>>>     UNKNOWN    Widget::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm
>>>>>>>     0.16    XSLoader    /usr/share/perl/5.18/XSLoader.pm
>>>>>>>     0.21    attributes    /usr/lib/perl/5.18/attributes.pm
>>>>>>>     2.18    base    /usr/share/perl/5.18/base.pm
>>>>>>>     1.04    bytes    /usr/share/perl/5.18/bytes.pm
>>>>>>>     UNKNOWN    clean
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm
>>>>>>>     UNKNOWN    cluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm
>>>>>>>     UNKNOWN    compare
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm
>>>>>>>     1.27    constant    /usr/share/perl/5.18/constant.pm
>>>>>>>     UNKNOWN    ds_utility
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm
>>>>>>>     UNKNOWN    exonerate::splice_info
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm
>>>>>>>     0.34    forks    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     2.08001    forks::Devel::Symdump
>>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm
>>>>>>>     0.34    forks::shared    /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     0.34    forks::signals
>>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm
>>>>>>>     1.00    integer    /usr/share/perl/5.18/integer.pm
>>>>>>>     0.63    lib    /usr/lib/perl/5.18/lib.pm
>>>>>>>     1.02    locale    /usr/share/perl/5.18/locale.pm
>>>>>>>     UNKNOWN    maker::auto_annotator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm
>>>>>>>     UNKNOWN    maker::join
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm
>>>>>>>     UNKNOWN    maker::quality_index
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm
>>>>>>>     UNKNOWN    maker::sens_spec
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm
>>>>>>>     1.22    overload    /usr/share/perl/5.18/overload.pm
>>>>>>>     0.02    overloading    /usr/share/perl/5.18/overloading.pm
>>>>>>>     0.225    parent    /usr/share/perl/5.18/parent.pm
>>>>>>>     UNKNOWN    polisher
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm
>>>>>>>     UNKNOWN    polisher::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm
>>>>>>>     UNKNOWN    polisher::exonerate::altest
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm
>>>>>>>     UNKNOWN    polisher::exonerate::est
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm
>>>>>>>     UNKNOWN    polisher::exonerate::protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm
>>>>>>>     UNKNOWN    repeat_mask_seq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm
>>>>>>>     0.1    runlog
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm
>>>>>>>     UNKNOWN    shadow_AED
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm
>>>>>>>     1.07    sigtrap    /usr/share/perl/5.18/sigtrap.pm
>>>>>>>     1.07    strict    /usr/share/perl/5.18/strict.pm
>>>>>>>     1.77    threads    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     1.33    threads::shared
>>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     1.03    vars    /usr/share/perl/5.18/vars.pm
>>>>>>>     1.18    warnings    /usr/share/perl/5.18/warnings.pm
>>>>>>>     1.02    warnings::register
>>>>>>> /usr/share/perl/5.18/warnings/register.pm
>>>>>>> STATUS: Parsing control files...
>>>>>>> Calling GI::load_control_files at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452.
>>>>>>> Calling GI::new_instance_temp at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463.
>>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 465.
>>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 483.
>>>>>>> STATUS: Processing and indexing input FASTA files...
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 529.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 574.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 623.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> STATUS: Setting up database for any GFF3 input...
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 629.
>>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 631.
>>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 635.
>>>>>>> A data structure will be created for you at:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker
>>>>>>> .output/dpp_contig_datastore
>>>>>>> 
>>>>>>> To access files for individual sequences use the datastore index:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker
>>>>>>> .output/dpp_contig_master_datastore_index.log
>>>>>>> 
>>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 636.
>>>>>>> Calling Iterator::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639.
>>>>>>> Calling Iterator::Fasta::skip_file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641.
>>>>>>> Calling Iterator::Fasta::step at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643.
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/
>>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open 
>>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> line 107.
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now retrying the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> Tries: 2!!
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/
>>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open 
>>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> line 107.
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> 
>>>>>>> 
>>>>>>> Maker is now finished!!!
>>>>>>> 
>>>>>>> Many thanks for your help
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> Your problem is one of the following.  You need to reinstall the 
>>>>>>> DBD::SQLite module, you are running in a directory you don't have 
>>>>>>> permissions for, you set your TMPDIR environment variable or TMP value 
>>>>>>> in maker_opts.ctl to an NFS mounted or memory mounted directory, or you 
>>>>>>> are using a self compiled version of Perl (i.e. not /usr/bin/perl) that 
>>>>>>> has issues (probably with DB or SQLite modules).  You can also 
>>>>>>> completely delete the output directory and start again to see if it was 
>>>>>>> just a random error.  You should look at each of those first.  If none 
>>>>>>> of those seems to be the issue, you can also run MAKER with the --debug 
>>>>>>> command line flag and send the output to me.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Wednesday, March 19, 2014 at 5:09 AM
>>>>>>> To:  <maker-devel at yandell-lab.org>
>>>>>>> Subject:  [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I'm installing/using maker2 for the first time and I get an error when 
>>>>>>> using it.
>>>>>>> 
>>>>>>> I am certainly missing something, but I don't know what.
>>>>>>> 
>>>>>>> I compiled maker with no error message and I have all these directories 
>>>>>>> after compilation: 
>>>>>>> bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src
>>>>>>> 
>>>>>>> Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I 
>>>>>>> have this error:
>>>>>>> 
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) 
>>>>>>> failed: unable to open database file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>> 
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> ...
>>>>>>> 
>>>>>>> ideas?
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
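
Two items on Carson's checklist above (directory permissions, and a TMPDIR or TMP location that cannot host an SQLite file) can be checked without running MAKER. The following is a minimal Python sketch under stated assumptions, not part of MAKER: it tries to create and write a throwaway SQLite database in the current directory and in the system temporary directory. The function name can_open_sqlite and the file name sqlite_probe.db are made up for illustration. Because it uses Python's sqlite3 module, it says nothing about whether Perl's DBD::SQLite is installed correctly.

```python
import os
import sqlite3
import tempfile


def can_open_sqlite(directory):
    """Return (True, "") if an SQLite database can be created and written
    in `directory`, otherwise (False, error message)."""
    # Hypothetical probe file name; it is removed again before returning.
    db_path = os.path.join(directory, "sqlite_probe.db")
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS probe (id INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        if os.path.exists(db_path):
            os.remove(db_path)
    return True, ""


if __name__ == "__main__":
    candidates = (
        ("current directory", os.getcwd()),
        ("temporary directory", tempfile.gettempdir()),
    )
    for label, path in candidates:
        ok, message = can_open_sqlite(path)
        status = "ok" if ok else "FAILED: " + message
        print(f"{label}: {path}: {status}")
    print("TMPDIR =", os.environ.get("TMPDIR", "(not set)"))
```

Run it from the directory where MAKER is started. A failure here points at the file system or permissions rather than at the MAKER installation itself.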



From jfierst at uoregon.edu  Fri Mar 21 09:43:59 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 21 Mar 2014 08:43:59 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CF489F0B.AC19%carsonhh@gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
Message-ID: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>

Hi,

I just wanted to say thanks for all your help- I did the reciprocal best
blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to
associate names between strain assemblies/annotations. Worked perfectly!
-Janna


On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt <carsonhh at gmail.com> wrote:

> maker_map_ids does a translation (i.e. change gene-A to smug1), so you
> need to know which genes you want to translate names to (two column input
> file, column 1 -> original ID, column 2 -> new ID).  I'm not sure EST
> forward is the best way to do this, although I do think maker_map_ids is
> the tool to use in the end.  The question is how to make a list of IDs to
> translate as the input to maker_map_ids?
>
> I would actually just use BLASTP against the reference strain, and then
> do reciprocal best BLAST hits.  To do this you BLAST your reference
> proteins against your maker proteins.  Then do the opposite, BLAST your
> maker proteins against your reference proteins.  If they are each
> other's best hit, then they are orthologous, and you can safely make a two
> column entry for the maker_map_ids input (i.e. maker-gene-1 translates into
> smug1).
>
> --Carson
>
>
> From: Daniel Ence <dence at genetics.utah.edu>
> Date: Friday, March 14, 2014 at 11:32 AM
> To: Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org" <
> maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] associating gene names between related strains
>
> Hi Janna, So do you have one strain that you want to use as the reference
> for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
> Let me know if that's not what you're looking for,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
> Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
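
The reciprocal best BLAST hit step that Carson describes above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a script shipped with MAKER: it assumes both BLASTP searches were saved in the default 12-column tabular format (-outfmt 6, bit score in the last column), and the function names are made up here. It prints the two-column list (original ID, new ID) described above as the input for maker_map_ids. Ties in bit score are resolved in favour of the first hit seen, which may not be what you want for recent duplicates.

```python
import csv
import sys


def best_hits(blast_tab):
    """Map each query ID to the subject ID of its highest bit score hit.

    `blast_tab` is any iterable of lines in BLAST tabular format
    (-outfmt 6): query ID in column 1, subject ID in column 2,
    bit score in column 12.
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, reference_id) pairs that are each
    other's best hit in the two searches."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted(
        (maker_id, ref_id)
        for maker_id, ref_id in forward.items()
        if reverse.get(ref_id) == maker_id
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: maker-vs-reference table, reference-vs-maker table.
    with open(sys.argv[1]) as fwd, open(sys.argv[2]) as rev:
        for maker_id, ref_id in reciprocal_best_hits(fwd, rev):
            print(f"{maker_id}\t{ref_id}")
```

Genes without a reciprocal best hit are simply left out of the list, so they keep their MAKER-assigned names.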

From carsonhh at gmail.com  Fri Mar 21 09:54:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 09:54:15 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
	<CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
Message-ID: <CF51BCA1.AFB9%carsonhh@gmail.com>

I'm glad we could help.

--Carson

From:  Janna Fierst <jfierst at uoregon.edu>
Date:  Friday, March 21, 2014 at 9:43 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] associating gene names between related strains

Hi,

I just wanted to say thanks for all your help- I did the reciprocal best
blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to
associate names between strain assemblies/annotations. Worked perfectly!
-Janna


On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt <carsonhh at gmail.com> wrote:
> maker_map_ids does a translation (i.e. change gene-A to smug1), so you need to
> know which genes you want to translate names to (two column input file, column
> 1 -> original ID, column 2 -> new ID).  I'm not sure EST forward is the best
> way to do this, although I do think maker_map_ids is the tool to use in the
> end.  The question is how to make a list of IDs to translate as the input to
> maker_map_ids?
> 
> I would actually just use BLASTP against the reference strain, and then do
> reciprocal best BLAST hits.  To do this you BLAST your reference proteins
> against your maker proteins.  Then do the opposite, BLAST your maker proteins
> against your reference proteins.  If they are each other's best hit, then
> they are orthologous, and you can safely make a two column entry for the
> maker_map_ids input (i.e. maker-gene-1 translates into smug1).
> 
> --Carson
> 
> 
> From:  Daniel Ence <dence at genetics.utah.edu>
> Date:  Friday, March 14, 2014 at 11:32 AM
> To:  Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] associating gene names between related strains
> 
> Hi Janna, So do you have one strain that you want to use as the reference for
> all the others? There's a script that comes with MAKER called maker_map_ids
> that lets you use a common prefix or suffix for entries in a fasta file from
> one strain and then use est_forward to use that ID in the gene models for the
> other species. 
> 
> Let me know if that's not what you're looking for,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna
> Fierst [jfierst at uoregon.edu]
> Sent: Friday, March 14, 2014 10:06 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] associating gene names between related strains
> 
> Hi,
> 
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate the
> gene naming so that orthologs between species can be associated by name. I
> have been playing around a little with the est_forward option but can't figure
> out a good system/workflow that preserves names but still uses the
> strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From Hossein.Borhan at AGR.GC.CA  Fri Mar 21 10:41:38 2014
From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein)
Date: Fri, 21 Mar 2014 16:41:38 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
	<CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
Message-ID: <E8EDFB90D92694478065C37017B3A3A6A890F2F6@SKREGIXES2.AGR.GC.CA>

Dear Carson

I ran maker with the modified .pm files and it resolved the problem with
the fasta output. Thanks a lot for your help.


HB


On 14-03-17 1:45 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>I have attached 4 files for you to place in the .../maker/Widgets/
>directory.
>
>The *blast.pm files will suppress the BLAST+ failures you are getting
>(alternatively you can just downgrade to BLAST 2.27 to get the same
>effect).  BLAST 2.29 gives a lot of warnings etc., which you can ignore.
>In the latest release NCBI redid all their warnings and error codes so it
>spits out a lot of garbage and fails with different messages than it did
>before.  For example BLAST now warns you every time it encounters a fasta
>header with a comment (virtually every fasta entry in existence falls in
>this category), so your screen will be awash with meaningless warning
>messages.
>
>The fgenesh.pm file will fix the other failure, which only occurs if you
>use fgenesh simultaneously with the est_fustion=1 option.  No other
>predictors are affected.
>
>Thanks,
>Carson
>
>
>On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear  Carson
>>
>>Sorry for the late reply. I was away for a couple of days. I have
>>uploaded the output files, plus control and error output, to the FTP site
>>that you provided.
>>The user ID is borhanh
>>
>>I used blast+ for this run.
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>wrote:
>>
>>>Just resending this to the correct maker-devel address.  Please when
>>>replying, do not CC the incorrect maker-devel-bounce address.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>wrote:
>>>
>>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>>(either too old or too new), output might be slightly different, which
>>>>could cause incorrect parsing. Could you tar up your maker.output folder
>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>(send me your user or guest ID after you upload)?
>>>>
>>>>For the BLAST error, use BLAST+ instead.  You are using blastall, which
>>>>is the old legacy version of NCBI BLAST.  You can do this by setting the
>>>>blast type in maker_bopts.ctl and the location of executables in
>>>>maker_exe.ctl.
>>>>Thanks,
>>>>Carson
>>>>
>>>>
>>>>
>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Dear Maker users
>>>>>
>>>>>
>>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>>word SCALAR followed by a bracketed value like this (0x22de7020) into
>>>>>the nucleotide sequence of some of the genes. This seems to be related
>>>>>to transcripts predicted by fgenesh_masked.
>>>>>
>>>>>
>>>>>Here is an example for one of the genes
>>>>>
>>>>>
>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>>TTTCGACAAGC
>>>>>
>>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>>without such a problem. I checked the sequence of the scaffold related
>>>>>to one of the affected transcripts and there was no error in the
>>>>>sequence. I am not sure what is causing this. The only error that I
>>>>>could spot in the output error file is the following:
>>>>>
>>>>>
>>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>>contexts/frames of query sequences.
>>>>>
>>>>>
>>>>>
>>>>>Your help is appreciated
>>>>>
>>>>>
>>>>>
>>>>>HB
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
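
A quick way to find every affected transcript in a MAKER FASTA output is to scan for strings of the form SCALAR(0x...), which is how Perl prints a reference that was never dereferenced. The sketch below is an illustration only, with made-up function names, and is not a MAKER tool. It joins wrapped sequence lines first, because in the example above the artifact is split across a line break, and it also reports any other character outside the IUPAC nucleotide alphabet.

```python
import re

# Perl prints an undereferenced reference as e.g. SCALAR(0x23418b90).
PERL_REF = re.compile(r"(?:SCALAR|ARRAY|HASH|CODE)\(0x[0-9a-fA-F]+\)")
# Anything that is not an IUPAC nucleotide code is suspicious in a transcript.
NON_IUPAC = re.compile(r"[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv]")


def read_fasta(lines):
    """Yield (record_id, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            fields = line[1:].split()
            header, chunks = (fields[0] if fields else ""), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def scan_fasta(lines):
    """Return a list of (record_id, perl_refs, other_bad_chars) for every
    record that contains something other than nucleotide codes."""
    problems = []
    for record_id, seq in read_fasta(lines):
        refs = PERL_REF.findall(seq)
        # Remove the Perl artifacts before checking the remaining characters.
        leftover = PERL_REF.sub("", seq)
        bad = sorted(set(NON_IUPAC.findall(leftover)))
        if refs or bad:
            problems.append((record_id, refs, bad))
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            for record_id, refs, bad in scan_fasta(handle):
                print(record_id, refs, bad, sep="\t")
```

The list of affected record IDs is also a convenient thing to attach to a bug report, since it shows which predictors the problem is limited to.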


From carsonhh at gmail.com  Fri Mar 21 10:43:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 10:43:10 -0600
Subject: [maker-devel] non-nucleotide characters in the maker generated
 transcripts
Message-ID: <CF51C832.AFC0%carsonhh@gmail.com>

Thanks for letting me know.

--Carson


On 3/21/14, 10:41 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear Carson
>
>I ran maker with the modified .pm files and it resolved the problem with
>the fasta output. Thanks a lot for your help.
>
>
>
>
>HB
>
>
>
>
>
>
>
>
>On 14-03-17 1:45 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>
>>I have attached 4 files for you to place in the .../maker/Widgets/
>>directory.
>>
>>The *blast.pm files will suppress the BLAST+ failures you are getting
>>(alternatively you can just downgrade to BLAST 2.27 to get the same
>>effect).  BLAST 2.29 gives a lot of warnings etc., which you can ignore.
>>In the latest release NCBI redid all their warnings and error codes so it
>>spits out a lot of garbage and fails with different messages than it did
>>before.  For example BLAST now warns you every time it encounters a fasta
>>header with a comment (virtually every fasta entry in existence falls in
>>this category), so your screen will be awash with meaningless warning
>>messages.
>>
>>The fgenesh.pm file will fix the other failure, which only occurs if you
>>use fgenesh simultaneously with the est_fustion=1 option.  No other
>>predictors are affected.
>>
>>Thanks,
>>Carson
>>
>>
>>On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>>
>>>Dear  Carson
>>>
>>>Sorry for the late reply. I was away for a couple of days. I have
>>>uploaded the output files, plus control and error output, to the FTP site
>>>that you provided.
>>>The user ID is borhanh
>>>
>>>I used blast+ for this run.
>>>
>>>
>>>
>>>
>>>Regards
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>wrote:
>>>
>>>>Just resending this to the correct maker-devel address.  Please when
>>>>replying, do not CC the incorrect maker-devel-bounce address.
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>
>>>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>>wrote:
>>>>
>>>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>>>(either too old or too new), output might be slightly different, which
>>>>>could cause incorrect parsing. Could you tar up your maker.output folder
>>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>>(send me your user or guest ID after you upload)?
>>>>>
>>>>>For the BLAST error, use BLAST+ instead.  You are using blastall, which
>>>>>is the old legacy version of NCBI BLAST.  You can do this by setting the
>>>>>blast type in maker_bopts.ctl and the location of executables in
>>>>>maker_exe.ctl.
>>>>>
>>>>>Thanks,
>>>>>Carson
>>>>>
>>>>>
>>>>>
>>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>>wrote:
>>>>>
>>>>>>Dear Maker users
>>>>>>
>>>>>>
>>>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>>>word SCALAR followed by a bracketed value like this (0x22de7020) into
>>>>>>the nucleotide sequence of some of the genes. This seems to be related
>>>>>>to transcripts predicted by fgenesh_masked.
>>>>>>
>>>>>>
>>>>>>Here is an example for one of the genes
>>>>>>
>>>>>>
>>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>>>TTTCGACAAGC
>>>>>>
>>>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>>>without such a problem. I checked the sequence of the scaffold related
>>>>>>to one of the affected transcripts and there was no error in the
>>>>>>sequence. I am not sure what is causing this. The only error that I
>>>>>>could spot in the output error file is the following:
>>>>>>
>>>>>>
>>>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>>>contexts/frames of query sequences.
>>>>>>
>>>>>>
>>>>>>
>>>>>>Your help is appreciated
>>>>>>
>>>>>>
>>>>>>
>>>>>>HB
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From marc.hoeppner at imbim.uu.se  Mon Mar 24 04:08:25 2014
From: marc.hoeppner at imbim.uu.se (=?iso-8859-1?Q?Marc_H=F6ppner?=)
Date: Mon, 24 Mar 2014 10:08:25 +0000
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se>

Hi,

I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (protein and EST alignments, mostly). I then separate these out into GFF files that I can store away and reuse in parallel runs with various combinations of settings and data.

However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional and is there a work-around? Or is the only option to run this with fasta files?

Cheers,

Marc


Marc P. Hoeppner, PhD

Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From sujaikumar at gmail.com  Mon Mar 24 08:15:16 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 14:15:16 +0000
Subject: [maker-devel] Dashes in transcript predictions
Message-ID: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nGt.0.3.035610.maker.output.tgz
Type: application/x-gzip
Size: 45641 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140324/c626ff64/attachment-0002.tgz>

From carsonhh at gmail.com  Mon Mar 24 10:49:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 10:49:46 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
Message-ID: <CF55BD0D.B01C%carsonhh@gmail.com>

I've actually never seen that before, but looking through your output it
appears to be specifically caused by setting correct_est_fusion=1, and how
it interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace
.../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
download.

Thanks,
Carson


-------------- next part --------------
A non-text attachment was scrubbed...
Name: join.pm
Type: text/x-perl-script
Size: 18645 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140324/ebc5d81c/attachment-0002.bin>

From carsonhh at gmail.com  Mon Mar 24 11:05:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 11:05:15 -0600
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <CF55BE79.B028%carsonhh@gmail.com>

It's not so much intentional as it is a limitation of the information in
GFF3 format alignments. Right now protein2genome for eukaryotes will only
try to make exonerate-derived alignments work, because they have been
polished around splice sites and MAKER still has access to the original
protein sequence and alignment CIGAR string for additional filtering, etc.
With GFF3 pass-through the algorithm doesn't know nearly as much about
what is passed in. For example, the protein sequence is gone, CIGAR
alignment strings are rarely included (Gap= attribute in GFF3), and it's
not always clear whether the alignment was polished for splice sites. Also,
since protein2genome=1 is expected to be used only to generate an initial
training set, and not for final annotations, this is considered a
reasonable restriction.

If you still really want to force protein alignments from a GFF3 to be
considered as potential models, you could put them in as pred_gff, in
which case they will always be considered as potential models. Of course
it will be relatively ugly, because you lack the things I mentioned before,
such as the alignment CIGAR string and original protein sequence that are
normally used to filter protein2genome results for inclusion as models.

--Carson




From carsonhh at gmail.com  Mon Mar 24 12:15:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 12:15:39 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55BD0D.B01C%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com>
Message-ID: <CF55C7D4.B05A%carsonhh@gmail.com>

One more note on this.  The sequence is actually fully correct if you just
remove the '-' characters.  So if you don't want to rerun MAKER with the
patch, then you can use the attached script to just repair the transcript
file by removing the '-' characters.  Your GFF3 files and protein files
should already be correct as is.

Usage --> perl fix_dash transcript_file.fasta > new_file.fasta

You may need to place the script in the .../maker/bin/ directory so it can
detect BioPerl if you don't have BioPerl installed system wide.
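
For readers without the attached script, the repair described above can be sketched in a few lines of Python. This is not the fix_dash script itself (which is Perl and uses BioPerl); the function name strip_dashes is hypothetical, and the sketch assumes each FASTA header sits on a single line starting with '>'. Header lines are left alone because MAKER IDs legitimately contain '-'.

```python
def strip_dashes(fasta_text):
    """Remove '-' from sequence lines of FASTA text; header lines are kept as-is.

    Hypothetical helper, not part of MAKER. Assumes one-line '>' headers.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # MAKER transcript IDs contain '-' (e.g. snap_masked-...-mRNA-1),
            # so headers must not be altered.
            cleaned.append(line)
        else:
            cleaned.append(line.replace("-", ""))
    return "".join(line + "\n" for line in cleaned)


# Example use on a file (file names are placeholders):
# with open("transcript_file.fasta") as fh:
#     fixed = strip_dashes(fh.read())
# with open("new_file.fasta", "w") as out:
#     out.write(fixed)
```

Only the transcript FASTA needs this treatment; as noted above, the GFF3 and protein files are already correct.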

Thanks,
Carson


From sujaikumar at gmail.com  Mon Mar 24 12:17:02 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 18:17:02 +0000
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55C7D4.B05A%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com> <CF55C7D4.B05A%carsonhh@gmail.com>
Message-ID: <CAFADFFs6KYiZ8rmfEwYVCYbGymJOUXHVcKVShscBBjjCR3q2fA@mail.gmail.com>

Wow. That was a super quick response. Thanks very much for confirming the
problem and the fixes!



From diana.garnica at anu.edu.au  Mon Mar 24 17:11:01 2014
From: diana.garnica at anu.edu.au (Diana Garnica Moreno)
Date: Mon, 24 Mar 2014 23:11:01 +0000
Subject: [maker-devel] Problem extracting fasta from a GFF file generated
	with MAKER
Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi there,


We recently annotated a fungal genome using MAKER and obtained the gene models along with the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not include the stop codon, so I cannot tell which proteins are complete and which are incomplete at the 3' end. To solve that, I have used different programs to extract the FASTA sequences of the CDSs from the GFF file and the genome sequence. The problem is that the tools I have tested give the right sequence for some proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why this happens, and since it happens with different tools (several Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?


Thanks!


Diana

From carsonhh at gmail.com  Mon Mar 24 17:25:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 17:25:09 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
 generated with MAKER
Message-ID: <CF56185B.B0E1%carsonhh@gmail.com>

You are probably getting the wrong proteins from your scripts because you
are not taking into account the 5' and 3' UTR in the transcript.

For example
>snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240

The 5' UTR is 261bp and the 3' UTR is 22bp long.  Both would have to be
trimmed before translating the transcript into a protein. Once they are
trimmed you can use frame 0 for the translation.

The fasta_tool that comes with MAKER can be used to quickly trim the UTR.

Example:
fasta_tool maker_transcripts.fasta --trim_maker_utr

Then you can try your other scripts again.
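
If you would rather script the trimming yourself, the idea can be sketched in Python as below. This is not how fasta_tool is implemented; the function names are hypothetical, and the sketch assumes that the first QI field is the 5' UTR length and the eighth is the 3' UTR length, as in the example header above (261 and 22). The final check assumes the trimmed CDS keeps its stop codon, which is worth confirming on your own data.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def utr_lengths(header):
    """Return (five_prime_utr, three_prime_utr) lengths parsed from the QI tag."""
    match = re.search(r"QI:(\S+)", header)
    if match is None:
        raise ValueError("no QI tag found in header")
    fields = match.group(1).split("|")
    if len(fields) != 9:
        raise ValueError("expected 9 QI fields, got %d" % len(fields))
    # Assumption: field 1 = 5' UTR length, field 8 = 3' UTR length.
    return int(fields[0]), int(fields[7])


def trim_utr(header, sequence):
    """Drop the 5' and 3' UTR so the remainder can be translated in frame 0."""
    five, three = utr_lengths(header)
    if five + three > len(sequence):
        raise ValueError("UTR lengths exceed transcript length")
    return sequence[five:len(sequence) - three]


def ends_with_stop(cds):
    """True if the last codon of the trimmed CDS is a stop codon."""
    return cds[-3:].upper() in STOP_CODONS
```

A transcript whose trimmed CDS does not end in a stop codon would then be a candidate for a model that is incomplete at the 3' end.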

Thanks,
Carson



From daniel.standage at gmail.com  Tue Mar 25 07:24:14 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 25 Mar 2014 09:24:14 -0400
Subject: [maker-devel] Maker iPlant image
Message-ID: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>

Greetings,

I launched an instance from the Maker-P 2.28 image
(c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location
of the installed software. All I could find was an example data set on the
Desktop, but the "maker" program was not in the path and
"/usr/local/src" is empty. Could you please advise on how to run Maker in
iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From ernesto at ebi.ac.uk  Tue Mar 25 04:10:59 2014
From: ernesto at ebi.ac.uk (ernesto lowy gallego)
Date: Tue, 25 Mar 2014 10:10:59 +0000
Subject: [maker-devel] Incorrect translation start codon
Message-ID: <53315633.2070702@ebi.ac.uk>

Hi,

I have been inspecting the MAKER predictions and found a situation
that occurs fairly frequently (see the attached Apollo screenshot).

When est2genome evidence supports the prediction of a 5' UTR, MAKER
sometimes fails to identify the correct downstream ATG start codon and
instead incorrectly treats a TTG codon (coding for L) as the protein
start. The proper ATG start codon lies further downstream, as the
ab initio predictors (SNAP + AUGUSTUS) correctly predict in this case
(see the attached screenshot).

Any comments on this?

Thanks!

ernesto

-- 
Developer

VectorBase | Ensembl Genomes

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2014-03-25 at 09.34.16.png
Type: image/png
Size: 32220 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140325/f9ae69ec/attachment-0002.png>

From carsonhh at gmail.com  Tue Mar 25 08:19:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:19:22 -0600
Subject: [maker-devel] Incorrect translation start codon
In-Reply-To: <53315633.2070702@ebi.ac.uk>
References: <53315633.2070702@ebi.ac.uk>
Message-ID: <CF56EBF0.B109%carsonhh@gmail.com>

This is caused by BioPerl's is_start_codon method and default codon table
returning true for non-canonical start codons.  It was resolved some time
ago (See previous discussion -->
https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY ).  Make
sure you are using the most recent version of MAKER (currently 2.31).
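
To illustrate the effect without BioPerl, here is a small Python sketch. It is not MAKER's or BioPerl's code; the function and constant names are hypothetical. It relies only on the fact that NCBI translation table 1 (the standard code) lists TTG and CTG as alternative initiation codons next to ATG, so a permissive start-codon test accepts an upstream in-frame TTG before it ever reaches the real ATG.

```python
# Initiation codons listed for NCBI translation table 1 (standard code).
PERMISSIVE_STARTS = {"ATG", "TTG", "CTG"}
# Restricting the search to the canonical start avoids the TTG call.
CANONICAL_STARTS = {"ATG"}


def first_start(seq, starts, frame=0):
    """Return the 0-based index of the first in-frame codon in `starts`, or -1."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in starts:
            return i
    return -1


# Toy transcript: CCC TTG AAA ATG GGG
toy = "CCCTTGAAAATGGGG"
permissive_hit = first_start(toy, PERMISSIVE_STARTS)  # lands on TTG
canonical_hit = first_start(toy, CANONICAL_STARTS)    # lands on ATG
```

With the permissive set the search stops at the TTG, which mirrors the misplaced start described in the report; with the canonical set it continues to the downstream ATG.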

Thanks,
Carson




From carsonhh at gmail.com  Tue Mar 25 08:24:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:24:36 -0600
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
Message-ID: <CF56ED91.B119%carsonhh@gmail.com>

--> /opt/maker/bin/maker

It looks like most preinstalled software is under /opt on the image.

Thanks,
Carson



From darasappan at gmail.com  Tue Mar 25 10:33:59 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 25 Mar 2014 11:33:59 -0500
Subject: [maker-devel] maker to EvidenceModeler
Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>

Hi Carson and others,

Is there an easy tool/pipeline available as part of the maker utilities to convert maker and SNAP output into files accepted by EvidenceModeler?

It looks like it also just needs GFF files, but with a few tweaks. EvidenceModeler seems better equipped to handle PASA annotation results than maker results.

Thanks
Dhivya


From barry.utah at gmail.com  Tue Mar 25 11:51:38 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 25 Mar 2014 11:51:38 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
	generated	with MAKER
In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
Message-ID: <B283D045-3B8D-4A0C-82F8-7C2DB291B065@genetics.utah.edu>

Hi Diana,

There is a Perl library - The Genome Annotation Library - that is designed to make writing code like this easy.  I just added a script to this library called gal_CDS_sequence which you would run like this:

gal_CDS_sequence --translate genes.gff3 genome.fasta

The focus of GAL is to try to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files.

You can access the library here:

http://www.sequenceontology.org/software/GAL.html

Support for GAL is available via the SO mailing list:

https://lists.sourceforge.net/lists/listinfo/song-devel

Hope that helps,

Barry


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From kchilds at plantbiology.msu.edu  Wed Mar 26 08:21:36 2014
From: kchilds at plantbiology.msu.edu (Childs, Kevin)
Date: Wed, 26 Mar 2014 14:21:36 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CF56ED91.B119%carsonhh@gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
Message-ID: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>

Daniel,

There are a few small issues with the MAKER-P_2.28 image at iPlant.  I have been using the image successfully for more than a month.  I typically set several environment variables immediately after starting an ssh session.

export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp
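
A quick way to confirm that a session has picked up these settings is sketched below in Python. The helper name missing_settings is hypothetical and not part of MAKER or the image; the expected values are simply copied from the export lines above, and the check only compares strings, it does not verify that the directories exist.

```python
import os

# Values copied from the export lines above.
EXPECTED_PATH_DIRS = [
    "/opt/maker/bin",
    "/opt/maker/exe/snap",
    "/opt/maker/exe/augustus/bin",
    "/opt/maker/exe/augustus/scripts/",
]
EXPECTED_VARS = {
    "ZOE": "/opt/maker/exe/snap",
    "AUGUSTUS_CONFIG_PATH": "/opt/maker/exe/augustus/config",
    "TMP": "/tmp",
}


def missing_settings(env):
    """Return a list of settings from the exports above that `env` lacks."""
    problems = []
    # Compare PATH entries ignoring any trailing slash.
    entries = {p.rstrip("/") for p in env.get("PATH", "").split(":")}
    for directory in EXPECTED_PATH_DIRS:
        if directory.rstrip("/") not in entries:
            problems.append("PATH:" + directory)
    for name, value in EXPECTED_VARS.items():
        if env.get(name) != value:
            problems.append(name)
    return problems


# Example: report anything missing in the current session.
# print(missing_settings(dict(os.environ)))
```

An empty list means the session matches the exports; anything else names the variable or PATH entry that still needs to be set.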

The image will allow you to train SNAP, but training Augustus is not possible with the current image.  Augustus training requires blat, which was not installed in this image.  Training Augustus also requires writing to the /opt/maker/exe/augustus/config/species/ directory, which requires some inconvenient directory hacking.  I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.

Augustus should work with a stock hmm on this image.

I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.

Kevin Childs


---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Plant Biology Department
Michigan State University

kchilds at plantbiology.msu.edu
517-775-2844 (m)
517-353-5969 (l)

On Mar 25, 2014, at 10:24 AM, Carson Holt wrote:

> --> /opt/maker/bin/maker
> 
> It looks like most preinstalled software is under /opt on the image.
> 
> Thanks,
> Carson
> 
> 
> From: Daniel Standage <daniel.standage at gmail.com>
> Date: Tuesday, March 25, 2014 at 7:24 AM
> To: Maker Mailing List <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Maker iPlant image
> 
> Greetings,
> 
> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.
> 
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From steinj at cshl.edu  Wed Mar 26 12:41:37 2014
From: steinj at cshl.edu (Stein, Joshua)
Date: Wed, 26 Mar 2014 18:41:37 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
	<BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
Message-ID: <A6505FF9-06C4-4EB2-949B-EDA9113F64E3@cshl.edu>

Also, please note that there is a tutorial available here, which is particularly important if you want to run MAKER in MPI mode:
https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial

Josh

Joshua Stein, PhD
Manager, Sci. Informatics III
Cold Spring Harbor Laboratory
steinj at cshl.edu
http://ware.cshl.org/


On Mar 26, 2014, at 10:20 AM, "Childs, Kevin" <kchilds at plantbiology.msu.edu>
 wrote:

> Daniel,
> 
> There are a few small issues with the MAKER-P_2.28 image at iPlant.  I have been using the image successfully for more than a month.  I typically set several environmental variables immediately after starting an ssh session.
> 
> export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
> export ZOE=/opt/maker/exe/snap
> export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
> export TMP=/tmp
> 
> The image will allow you to train SNAP, but training Augustus is not possible with the current image.  Augustus training requires blat which was not installed in this image.  There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking.  I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.
> 
> Augustus should work with a stock hmm on this image.
> 
> I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.
> 
> Kevin Childs
> 
> 
> ---
> Kevin Childs, PhD
> 
> Assistant Professor - Fixed Term
> Plant Biology Department
> Michigan State University
> 
> kchilds at plantbiology.msu.edu
> 517-775-2844 (m)
> 517-353-5969 (l)
> 
> On Mar 25, 2014, at 10:24 AM, Carson Holt wrote:
> 
>> --> /opt/maker/bin/maker
>> 
>> It looks like most preinstalled software is under /opt on the image.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> From: Daniel Standage <daniel.standage at gmail.com>
>> Date: Tuesday, March 25, 2014 at 7:24 AM
>> To: Maker Mailing List <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Maker iPlant image
>> 
>> Greetings,
>> 
>> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.
>> 
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org


From brubin at fieldmuseum.org  Sat Mar 29 10:24:05 2014
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sat, 29 Mar 2014 11:24:05 -0500
Subject: [maker-devel] Missing UTRs in GFF
Message-ID: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?
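
A minimal sketch of how the affected transcripts could be listed. It assumes that the mRNA lines carry MAKER's _QI attribute with nine |-separated fields, and that the first and eighth fields are the 5' and 3' UTR lengths; that field layout is an assumption to check against the MAKER documentation. The function names are illustrative.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a key -> value dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def find_utr_mismatches(gff_lines):
    """Return (mRNA ID, UTR lengths from _QI, UTR lengths from features) for disagreements."""
    qi_utr = {}                             # mRNA ID -> (5' UTR, 3' UTR) claimed by _QI
    seen_utr = defaultdict(lambda: [0, 0])  # mRNA ID -> summed UTR feature lengths
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                           # sequence section follows, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "mRNA" and "_QI" in attrs and "ID" in attrs:
            fields = attrs["_QI"].split("|")
            if len(fields) == 9:            # assumed layout, see note above
                qi_utr[attrs["ID"]] = (int(fields[0]), int(fields[7]))
        elif ftype in ("five_prime_UTR", "three_prime_UTR"):
            idx = 0 if ftype == "five_prime_UTR" else 1
            for parent in attrs.get("Parent", "").split(","):
                seen_utr[parent][idx] += end - start + 1
    return [(mrna, expected, tuple(seen_utr[mrna]))
            for mrna, expected in qi_utr.items()
            if expected != tuple(seen_utr[mrna])]
```

Calling find_utr_mismatches(open("merged.gff")) (file name hypothetical) would list transcripts whose header claims a UTR that has no matching feature lines; it only locates them and does not repair the file.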

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
A non-text attachment was scrubbed...
Name: missing_utr.gff
Type: application/octet-stream
Size: 2934 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140329/0f93b3b2/attachment-0002.obj>

From mhinsley at ebi.ac.uk  Mon Mar 31 04:20:10 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Mon, 31 Mar 2014 11:20:10 +0100
Subject: [maker-devel] putative preponderance of short exons??
Message-ID: <5339415A.1020509@ebi.ac.uk>

Hi

I've run Maker on a de novo assembly of a species of fly and then ran
some simple statistics (intron/exon/CDS length, exons per gene) over
the GFF output and compared with a couple of other species.
It all looks good except that there is a surprising number of very short
exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total; see attached
pdf, where black is Drosophila, red is A. gambiae, and green is with 5' and
3' exons removed).
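
A minimal sketch of how counts like these could be produced from a GFF3 file. The function name and the optional source filter are illustrative assumptions; exons shared between isoforms are counted once, which is a design choice rather than anything MAKER prescribes.

```python
def short_exon_counts(gff_lines, thresholds=(50, 30, 10), source=None):
    """Return (number of distinct exons, {threshold: exons shorter than threshold})."""
    exons = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                      # embedded sequence follows, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        if source is not None and cols[1] != source:
            continue
        # (seqid, start, end, strand) identifies an exon; a set removes duplicates
        exons.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    lengths = [end - start + 1 for _, start, end, _ in exons]
    counts = {t: sum(1 for n in lengths if n < t) for t in thresholds}
    return len(lengths), counts
```

For a MAKER-produced file, passing source="maker" would restrict the count to the final gene models, assuming the second GFF column is set that way in the file at hand.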

I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.  
I'm using maker 2.31 (unpatched).

Anecdotally, these short exons appear without EST or protein evidence,
and they all line up with canonical splice sequences (GT----AG),
but I've only looked at a few using Apollo.
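
A minimal sketch of checking the GT----AG pattern for every intron rather than a few by eye. It assumes the contig sequence is available as a plain string and that exon coordinates are 1-based and inclusive, as in GFF; all function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def introns_from_exons(exons):
    """Given (start, end) exon pairs of one transcript, return the gaps between them."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
            if b_start - a_end > 1]


def is_canonical_intron(seq, start, end, strand="+"):
    """True if the intron at 1-based inclusive [start, end] reads GT...AG on its strand."""
    intron = seq[start - 1:end].upper()
    if len(intron) < 4:
        return False
    if strand == "-":
        intron = reverse_complement(intron)
    return intron.startswith("GT") and intron.endswith("AG")
```

On the minus strand the genomic sequence reads CT...AC, which is why the intron is reverse-complemented before the test.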

While there's no requirement that exons should be longer, I'm suspicious
of this, as there must be some evolutionary relationship between these
species.
I've compared with another species annotated with Maker (using SNAP
and Augustus), which is more distant (not yet publicly available), and
the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop
codons, but this does not appear to be the case (mostly they are mid-gene).


Is there some way to adjust the predictors, e.g. to require external
evidence, or anything else you could suggest? I can see the
following in the tutorial, but I'm not sure how they could help:

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no


thanks

-- 
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: exon_53.pdf
Type: application/pdf
Size: 10619 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140331/edd22fe9/attachment-0002.pdf>

From carsonhh at gmail.com  Mon Mar 31 07:52:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 07:52:15 -0600
Subject: [maker-devel] putative preponderance of short exons??
In-Reply-To: <5339415A.1020509@ebi.ac.uk>
References: <5339415A.1020509@ebi.ac.uk>
Message-ID: <CF5ECE08.B30C%carsonhh@gmail.com>

The intron/exon structure is determined by the gene predictors (SNAP,
Augustus, etc.).  It is not affected by any of the maker parameters; only
the evidence alignments are affected by the maker settings.  You can try
retraining or manually editing the HMMs, but these might also be regions
where your assembly is incorrect and the predictors insert short exons so
that the gene structure works without hitting stop codons mid-gene.

Thanks,
Carson


On 3/31/14, 4:20 AM, "Malcolm Hinsley" <mhinsley at ebi.ac.uk> wrote:

>Hi
>
>I've run Maker on a de novo assembly of a species of fly and then ran
>some simple statistics (intron/ exon/ CDS length, exons per gene)  over
>the GFF output and compared with a couple of other species.
>It all looks good except that there is a surprising number of very short
>exons (6000 < 50 bp, 3500 < 30 bp, 878< 10 bp, 87k total - see attached
>pdf), black is drosophilia, red is A.gambiae, green is with 5' and 3'
>exons removed).
>
>I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.
>I'm using maker 2.31 (unpatched).
>
>Anecdotally, these short exons appear without EST or protein evidence
>and they all line up with canonical splice sequences (GT----AG).
>(but i've only looked at a few using Apollo).
>
>While there's no requirement that exons should be longer I'm suspicious
>of this as there must be some evolutionary relationship between these
>species.
>I've compared with a another species annotated with Maker (using SNAP
>and Augustus)  which is more distant (not yet publicly available), and
>the same pattern of short exons is present.
>I wondered if they were created to fulfil the need for start/stop
>codons, but this does not appear to be the case (mostly they are
>mid-gene).
>
>
>Is there some way to adjust the predictors eg to require external
>evidence? or anything else you could suggest? ... I can see the
>following in the tutorial but I'm not sure how they could help:
>
>pred_flank=200 #flank for extending evidence clusters sent to gene predictors
>pred_stats=0 #report AED and QI statistics for all predictions as well as models
>AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
>min_protein=0 #require at least this many amino acids in predicted proteins
>alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
>always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
>
>
>thanks
>
>-- 
>malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
>European Bioinformatics Institute (EMBL-EBI)
>European Molecular Biology Laboratory
>Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
>United Kingdom
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Mon Mar 31 08:37:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 08:37:15 -0600
Subject: [maker-devel] Missing UTRs in GFF
In-Reply-To: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
References: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
Message-ID: <CF5ED8D3.B31A%carsonhh@gmail.com>

Not something I've seen before, but there was a patch for another issue that
was caused by the use of avoid_est_fusion=1, which may be related.  Try the
current stable release, 2.31, and let me know if it still happens.

You can also upload the contig folder from one of the regions in question
here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Then I could verify the bug, and see if it is something that happens in the
current release.

--Carson


From:  Benjamin Rubin <brubin at fieldmuseum.org>
Date:  Saturday, March 29, 2014 at 10:24 AM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Missing UTRs in GFF

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org <http://benrubin.org>

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Mon Mar  3 07:11:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 3 Mar 2014 14:11:34 +0000
Subject: [maker-devel] Query on Hardware requirement
In-Reply-To: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
References: <OF837195A3.CDBC7472-ON65257C90.001D994D-65257C90.001E2DB9@teri.res.in>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D68BF9@mxb2.hg.genetics.utah.edu>

Hi Pradeep, 

I think Allpaths is developed by the Broad Institute, so you'd have to check their documentation for its system requirements. MAKER is installable on Linux and Mac OS X computers. The throughput you'll be able to achieve with MAKER depends on how many processors and how much RAM the machine has. To take advantage of MAKER's ability to parallelize the annotation process, you need some version of MPI installed on your machine. MAKER can try to install MPI for you, but a manual installation is usually required.

I hope that helps. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Pushplata Singh [pushplata.singh at teri.res.in]
Sent: Sunday, March 02, 2014 10:29 PM
To: maker-devel at yandell-lab.org
Cc: Pradeep Dahiya
Subject: [maker-devel] Query on Hardware requirement

Hi,

I am trying to assemble and analyse(bio-informatics) genome sequence of a
35 GB fungal genome. The raw data that has been generated from Illumina
sequencing is of  ~15 GB. Could you please suggest me the system (hardware)
requirement for installing and running Maker and ALLPATHS-LG sofrware for
the job?

Thank you
Pushplata Singh, PhD
Nanobiotechnology Centre
Biotechnology and Management of Bioresources Division
The Energy and Resources Institute
Darbari Seth Block , India Habitat Centre,Lodhi Road
New Delhi 110003 India
Phone +91 11 24682100 ext 2611
Fax +91 11 24682145


------------------------------------------------------------------------------------------------------------

Disclaimer:

The information contained in this e-mail is intended for the person or entity
to which it is addressed, and it may contain confidential and/or privileged
material. Any review or other use of this mail or taking any action based on it
by persons or entities other than the intended recipient is strictly prohibited.
If you receive this e-mail by mistake, please contact the sender, and delete all
copies of this mail.This e-mail has been scanned and verified by McAfee SaaS
Email Security, formerly MX Logic.

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carson.holt at genetics.utah.edu  Mon Mar  3 12:08:49 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 3 Mar 2014 19:08:49 +0000
Subject: [maker-devel] FW: error runinig agustus
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B159@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3A2120.A782%carson.holt@genetics.utah.edu>

Forwarding this to the maker-devel list.


On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>I encountered the following error while running maker (2nd annotation
>using gff file of the first maker run and trinity assembled RNA seq as
>EST)
>
>ERROR: Augustus failed
>--> rank=NA, hostname=rapa.agr.gc.ca
>
>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>am using 2.31
>
>Your help is appreciated
>
>
>HB
>
>
>
>
>


From carsonhh at gmail.com  Mon Mar  3 12:11:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 03 Mar 2014 12:11:08 -0700
Subject: [maker-devel] FW: error runinig agustus
Message-ID: <CF3A21A5.A788%carsonhh@gmail.com>

You will need to provide more detail, probably the entire error log and
the maker control files.

Thanks,
Carson


On 3/3/14, 12:08 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>Forwarding this to the maker-devel list.
>
>
>On 3/3/14, 12:04 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>I encountered the following error while running maker (2nd annotation
>>using gff file of the first maker run and trinity assembled RNA seq as
>>EST)
>>
>>ERROR: Augustus failed
>>--> rank=NA, hostname=rapa.agr.gc.ca
>>
>>Note : 1st run of the maker was done by Maker 2.10 and for the 2nd one I
>>am using 2.31
>>
>>Your help is appreciated
>>
>>
>>HB
>>
>>
>>
>>
>>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From sjackman at gmail.com  Tue Mar  4 19:10:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 4 Mar 2014 18:10:42 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
Message-ID: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
the tip.

The rRNA genes that are found with est2genome have the feature type set to
*mRNA* and have corresponding *five_prime_UTR*, *CDS* and
*three_prime_UTR* features. Ideally the feature type would be set to
*rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features.
Is that a feature that you would be interested in adding to MAKER? The rRNA
gene names all start with "rrn" and the tRNA gene names with "trn", as is
standard, so determining the appropriate type should be straightforward.
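
Until something like this exists in MAKER, the retyping could be done as a post-processing step on the GFF3 output. The following is a minimal sketch, assuming the forwarded gene name ends up in the Name attribute of the mRNA line; that placement, the prefix table and the function names are illustrative assumptions, not MAKER behavior.

```python
PREFIX_TO_TYPE = {"rrn": "rRNA", "trn": "tRNA"}
CODING_ONLY = {"five_prime_UTR", "CDS", "three_prime_UTR"}


def parse_attributes(column9):
    """Split a GFF3 attribute column into a key -> value dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def retype_noncoding(gff_lines):
    """Rename mRNA features named rrn*/trn* and drop their UTR and CDS children."""
    records = [line.rstrip("\n") for line in gff_lines]
    retyped = {}  # transcript ID -> new feature type
    # First pass: decide which transcripts to retype, based on their Name.
    for line in records:
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            for prefix, new_type in PREFIX_TO_TYPE.items():
                if attrs.get("Name", "").startswith(prefix) and "ID" in attrs:
                    retyped[attrs["ID"]] = new_type
    # Second pass: rewrite the transcript lines and skip coding-only children.
    out = []
    for line in records:
        cols = line.split("\t")
        if len(cols) == 9:
            attrs = parse_attributes(cols[8])
            parents = attrs.get("Parent", "").split(",")
            if cols[2] == "mRNA" and attrs.get("ID") in retyped:
                cols[2] = retyped[attrs["ID"]]
                line = "\t".join(cols)
            elif cols[2] in CODING_ONLY and any(p in retyped for p in parents):
                continue
        out.append(line)
    return out
```

Gene and exon lines are left untouched, and anything that is not a nine-column feature line (comments, directives, embedded FASTA) passes through unchanged.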

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:

> Set single_exon=1, and the minimum size to a smaller value.  I think it's
> set to 250 right now.  Also est2genome is looking for ORF, so if there is
> none (as with tRNAs) they probably won't get picked up.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>
> Sorry, ignore my previous question. est_forward also carries forward the
> names of protein evidence and works like a charm. Thank you!
>
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
> these hits?
>
> organism_type=prokaryotic
> est2genome=1
> protein2genome=1
> est_forward=1
>
> Cheers,
> Shaun
>
>
> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>
>> Is there a corresponding protein_forward=1 option to map forward protein
>> names from protein2genome?
>>
>> Cheers,
>> Shaun
>>
>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>> wrote:
>>
>> Sorry I meant to say prefilter on the score in the mRNA column before
>> passing the gff3 to model_gff.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>  What you can do is run it once with just est_forward=1 and
>> est2genome/protein2genome set to 1.  Then take those results, pass them in
>> as model_gff and use the map_forward option to then filter the results
>> based on mRNA score and that would copy names onto new gene under the
>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>> separate tool that will map genes onto new assemblies (but under the hood
>> the tool will just be calling MAKER with certain parameters restricted).  I
>> do this because if people commonly use it mixed with things like SNAP I can
>> start to get some very weird behaviors.
>>
>> Thanks,
>> Carson
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 3:04 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  It seems that this could be a very useful option in those cases where
>> you have firm a priori knowledge of the placement of ESTs. However, while
>> trying it I note that est_forward implies that the est2genome predictor is
>> turned on, implicitly. Is this necessary for this to work? I'm after the
>> behavior you describe below where exonerate is made to try really hard
>> within a limited region to align an est, but I would not like maker to
>> produce est2genome predictions.
>>
>> In general, I think this maker_coor and est_forward is a feature set that
>> is worthy to be promoted into a documented feature.
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  It will still work without est_forward.  It just works a little
>> differently.  Keep in mind this was a hidden feature I used to find
>> stubborn or hard to find missing genes after reassembly of a genome.
>>
>> If est_forward is provided, MAKER will parse the database to look for the
>> maker_coor tags early in the pipeline.  Then it will create a list of
>> locations to search, and it will search them even if there are no BLAST
>> results to seed the search (normally MAKER gets a BLAST result first and
>> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
>> look for a match using all of chr1 as the input to exonerate even when
>> BLAST finds nothing (this is a very very slow search, but can help pick up
>> one or two stubborn genes that don't remap well).  To allow this, MAKER
>> gives exonerate looser matching parameters (i.e. allows for single base
>> pair introns perhaps caused by assembly errors).  The logic here is that
>> given the fact that I already told MAKER that with some degree of
>> confidence I expect sequence A to map to location X, it will try its
>> hardest to make it match.
>>
>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>> line 1563, but only after a BLAST alignment has already seeded it to the
>> region (that BLAST result has the information in its description
>> parameter).  MAKER will then ignore seeds completely outside of maker_coor.
>> In addition any BLAST seeds that overlap maker_coor will get the search
>> space for alignment polishing adjusted to match maker_coor exactly.  Also
>> match parameters for exonerate will not be relaxed as they were with
>> est_forward.
>>
>> As you can see, the behavior is slightly different (because it's an
>> accidental feature).
>>
>> Thanks,
>> Carson
>>
>>
>>
>>  From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 6:37 AM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>>
>>  That might be a useful and time saving accidental feature. But, reading
>> the code, it seems that I need to supply maker_coor but not gene_id, as
>> well as the configuration option est_forward for this to work. Any
>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>> right?
>>
>> Mikael
>>
>>  26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  Yes.  That should work as well as an accidental feature.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>> mikael.durling at slu.se> wrote:
>>
>> Can this use of maker_coor be used only to hint about the placement of
>> the ests, without affecting the naming of the final genes? Ie if I have a
>> database of EST where I have a priori knowledge of their rough placement,
>> can this placement be given to maker without providing est_forward=1?
>>
>> Thanks,
>> Mikael
>>
>>  26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>  There is a way.  It's not a standard option and it's undocumented, but
>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>> that.  The option won't already be there so you'll have to type it in.
>>
>> There is also a feature designed to work with this option.  If you add
>> tags to your fasta headers, those can be used to guide the mapping and
>> naming.  For example, gene_id=<some_gene>  will ensure different isoforms
>> that share a common gene_id get clustered into the same gene,
>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>
>> This is an undocumented way to remap genes onto new assemblies using
>> blast alignments of earlier transcript or protein annotations as a guide.
>>
>> --Carson
>>
>>
>>
>>
>>  From: Shaun Jackman <sjackman at gmail.com>
>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>> Date: Tuesday, February 25, 2014 at 5:06 PM
>> To: <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Mapping gene names
>>
>>  Hi,
>>
>> I'm annotating a genome using a closely related genome from Genbank,
>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>> annotate my genome. I've run Maker, and the annotation seems to have worked
>> well. Is it possible to map the names of the genes from the related species
>> to my annotation? I see the *map_forward* option, which applies to the
>> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>
>> *maker_opts.ctl*
>>
>> est=NC_123456.frn
>> protein=NC_123456.faa
>> est2genome=1
>> protein2genome=1
>>
>> Thanks,
>> Shaun
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>>  http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>
>>
>>   _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>

From carsonhh at gmail.com  Tue Mar  4 19:33:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Mar 2014 19:33:12 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
Message-ID: <CF3BD88C.A7D5%carsonhh@gmail.com>

Trying to call non-coding RNA from ESTs or even sequence homology is
extremely messy (a non-trivial problem in most organisms, with a high false
positive rate), so MAKER for the most part doesn't even try to do that.  It
focuses only on the coding genes.  You can now use tRNAscan and snoscan in
the newest version for some non-coding RNA support (those features were only
added a couple of months ago).  So just like other prediction tools (snap,
augustus etc.), the primary focus has always been the coding genes.  We've
only started adding non-coding RNA support recently for iPlant, so it's
still relatively immature.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Reply-To:  Shaun Jackman <sjackman at gmail.com>
Date:  Tuesday, March 4, 2014 at 7:10 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
the tip.

The rRNA genes that are found with est2genome have the feature type set to
mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR
features. Ideally the feature type would be set to rRNA or tRNA as
appropriate, and would omit the UTR and CDS features. Is that a feature that
you would be interested in adding to MAKER? The rRNA gene names all start
with "rrn" and the tRNA gene names with "trn", as is standard, so
determining the appropriate type should be straightforward.
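
As a rough illustration of the retyping described above, here is a minimal
Python sketch that post-processes GFF3 lines outside of MAKER. It is not a
MAKER feature. It assumes (hypothetically) that each mRNA row carries the
carried-forward gene name in a Name attribute, and that each mRNA row appears
before its child features; the function name is made up for this example.

```python
def retype_rna_features(gff3_lines):
    """Retype mRNA rows named rrn*/trn* to rRNA/tRNA and drop their UTR/CDS rows.

    Assumption: the gene name is in the Name attribute of the mRNA row,
    and parents precede children in the file.
    """
    retyped = {}  # transcript ID -> new feature type
    out = []
    for raw in gff3_lines:
        line = raw.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)  # pass comments and malformed rows through
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if cols[2] == "mRNA":
            name = attrs.get("Name", "")
            if name.startswith("rrn"):
                cols[2] = "rRNA"
            elif name.startswith("trn"):
                cols[2] = "tRNA"
            if cols[2] != "mRNA":
                retyped[attrs.get("ID")] = cols[2]
        elif cols[2] in ("five_prime_UTR", "CDS", "three_prime_UTR"):
            parents = attrs.get("Parent", "").split(",")
            if any(p in retyped for p in parents):
                continue  # non-coding RNA: omit UTR and CDS rows
        out.append("\t".join(cols))
    return out
```

Exon rows are left untouched, so the transcript structure is preserved.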

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
> Set single_exon=1, and the minimum size to a smaller value.  I think it's set
> to 250 right now.  Also est2genome is looking for ORF, so if there is none (as
> with tRNAs) they probably won't get picked up.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
> 
>> Sorry, ignore my previous question. est_forward also carries forward the
>> names of protein evidence and works like a charm. Thank you!
>> 
>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5
>> and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the
>> blastn output, and in the evidence_0.gff. rrn5 has perfect identity,
>> sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 <
>> eval_blastn=1e-10). How should I debug which filter is removing these hits?
>> organism_type=prokaryotic
>> est2genome=1
>> protein2genome=1
>> est_forward=1
>> Cheers,
>> Shaun
>> 
>> 
>> 
>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>> Is there a corresponding protein_forward=1 option to map forward protein
>>> names from protein2genome?
>>>  
>>> 
>>> Cheers,
>>> Shaun
>>> 
>>> 
>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com
>>> <mailto://carsonhh at gmail.com> ) wrote:
>>>  
>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>> passing the gff3 to model_gff.
>>>> 
>>>> --Carson 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>> 
>>>>> What you can do is run it once with just est_forward=1 and
>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them in
>>>>> as model_gff and use the map_forward option to then filter the results
>>>>> based on mRNA score, and that would copy names onto the new genes under the
>>>>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>>> the tool will just be calling MAKER with certain parameters restricted).
>>>>> I do this because if people commonly use it mixed with things like SNAP I
>>>>> can start to get some very weird behaviors.
>>>>> 
>>>>> Thanks,
>>>>> Carson
>>>>> 
>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>> 
>>>>> It seems that this could be a very useful option in those cases where you
>>>>> have firm a priori knowledge of the placement of ESTs. However, while
>>>>> trying it I note that est_forward implies that the est2genome predictor is
>>>>> turned on, implicitly. Is this necessary for this to work? I'm after the
>>>>> behavior you describe below where exonerate is made to try really hard
>>>>> within a limited region to align an est, but I would not like maker to
>>>>> produce est2genome predictions.
>>>>> 
>>>>> In general, I think maker_coor and est_forward form a feature set that
>>>>> is worth promoting to a documented feature.
>>>>> 
>>>>> Thanks,
>>>>> Mikael
>>>>> 
>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>>> It will still work without est_forward.  It just works a little
>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>> 
>>>>>> If est_forward is provided, MAKER will parse the database to look for the
>>>>>> maker_coor tags early in the pipeline.  Then it will create a list of
>>>>>> locations to search, and it will search them even if there are no BLAST
>>>>>> results to seed the search (normally MAKER gets a BLAST result first and
>>>>>> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
>>>>>> look for a match using all of chr1 as the input to exonerate even when
>>>>>> BLAST finds nothing (this is a very very slow search, but can help pick
>>>>>> up one or two stubborn genes that don't remap well).  To allow this,
>>>>>> MAKER gives exonerate looser matching parameters (i.e. allows for single
>>>>>> base pair introns perhaps caused by assembly errors).  The logic here is
>>>>>> that, since I have already told MAKER that I expect sequence A to map to
>>>>>> location X with some degree of confidence, it will try its hardest to
>>>>>> make it match.
>>>>>> 
>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
>>>>>> line 1563, but only after a BLAST alignment has already seeded it to the
>>>>>> region (that BLAST result has the information in its description
>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as they
>>>>>> were with est_forward.
>>>>>> 
>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>> accidental feature).
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>> 
>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>> well as the configuration option est_forward for this to work. All
>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>> est_forward=1, right?
>>>>>> 
>>>>>> Mikael
>>>>>> 
>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ests, without affecting the naming of the final genes? Ie if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> There is a way.  It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> 
>>>>>> 
>>>>> 
>> 
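
The gene_id= and maker_coor= FASTA header tags described in the quoted
thread above can be illustrated with a small Python sketch. This is not
MAKER's own parser (that lives in GI.pm and is written in Perl); it only
shows the two header forms mentioned in the thread, maker_coor=chr1 and
maker_coor=chr1:1-10000. The function name and the returned dictionary
layout are assumptions made for this example.

```python
import re

# Tags of the form key=value anywhere in the FASTA header line.
TAG_RE = re.compile(r"\b(maker_coor|gene_id)=(\S+)")
# Either "seqid" alone or "seqid:start-end".
COOR_RE = re.compile(r"^(?P<seqid>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_header_tags(header):
    """Extract gene_id and maker_coor (seqid, optional range) from a header."""
    tags = dict(TAG_RE.findall(header))
    result = {"gene_id": tags.get("gene_id"),
              "seqid": None, "start": None, "end": None}
    match = COOR_RE.match(tags.get("maker_coor", ""))
    if match:
        result["seqid"] = match.group("seqid")
        if match.group("start") is not None:
            result["start"] = int(match.group("start"))
            result["end"] = int(match.group("end"))
    return result
```

A header with only maker_coor=chr1 yields a seqid with no range, which
corresponds to searching the whole sequence as described in the thread.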


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140304/6f5e8e33/attachment-0003.html>

From felix.bemm at uni-wuerzburg.de  Wed Mar  5 09:35:33 2014
From: felix.bemm at uni-wuerzburg.de (Felix Bemm)
Date: Wed, 05 Mar 2014 17:35:33 +0100
Subject: [maker-devel] Build Issues - v2.31
Message-ID: <53175255.4050102@uni-wuerzburg.de>

Hi,

I am trying to build maker version 2.31. Got the following error:

Configuring MAKER with MPI support
'CCFLAGSEX' is not a valid config option for Inline::C
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 236
  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm 
line 256
	Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc', 
'/software/mpich2-1.5rc3/include', 'blib', '') called at 
/storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277
	MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build') 
called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469
	MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at 
/usr/share/perl/5.14/Module/Build/Base.pm line 2024
	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)', 
'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012
	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at 
./Build line 70

Same procedure worked with 2.29-beta!

Any ideas?

Felix

-- 
Felix Bemm
Department of Bioinformatics
University of W?rzburg, Germany
Tel: +49 931 - 31 83696
Fax: +49 931 - 31 84552
felix.bemm at uni-wuerzburg.de


From carsonhh at gmail.com  Wed Mar  5 09:40:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 09:40:05 -0700
Subject: [maker-devel] Build Issues - v2.31
In-Reply-To: <53175255.4050102@uni-wuerzburg.de>
References: <53175255.4050102@uni-wuerzburg.de>
Message-ID: <CF3CA125.A7FA%carsonhh@gmail.com>

You need to update your Inline::C module.  The CCFLAGSEX option was added
to Inline::C a couple of years ago to allow users to pass in flags to the
compiler.

Thanks,
Carson


On 3/5/14, 9:35 AM, "Felix Bemm" <felix.bemm at uni-wuerzburg.de> wrote:

>Hi,
>
>I am trying to build maker version 2.31. Got the following error:
>
>Configuring MAKER with MPI support
>'CCFLAGSEX' is not a valid config option for Inline::C
>  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm
>line 236
>  at /storage/software/src/maker/src/lib/Parallel/Application/MPI.pm
>line 256
>	Parallel::Application::MPI::_bind('/software/mpich2-1.5rc3/bin/mpicc',
>'/software/mpich2-1.5rc3/include', 'blib', '') called at
>/storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 277
>	MAKER::Build::ACTION_build('MAKER::Build=HASH(0x2199060)') called at
>/usr/share/perl/5.14/Module/Build/Base.pm line 2024
>	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)',
>'build') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2007
>	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)', 'build')
>called at /storage/software/src/maker/src/inc/lib/MAKER/Build.pm line 469
>	MAKER::Build::ACTION_install('MAKER::Build=HASH(0x2199060)') called at
>/usr/share/perl/5.14/Module/Build/Base.pm line 2024
>	Module::Build::Base::_call_action('MAKER::Build=HASH(0x2199060)',
>'install') called at /usr/share/perl/5.14/Module/Build/Base.pm line 2012
>	Module::Build::Base::dispatch('MAKER::Build=HASH(0x2199060)') called at
>./Build line 70
>
>Same procedure worked with 2.29-beta!
>
>Any ideas?
>
>Felix
>
>-- 
>Felix Bemm
>Department of Bioinformatics
>University of Würzburg, Germany
>Tel: +49 931 - 31 83696
>Fax: +49 931 - 31 84552
>felix.bemm at uni-wuerzburg.de
>


From carson.holt at genetics.utah.edu  Wed Mar  5 12:02:26 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 5 Mar 2014 19:02:26 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890B8A7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3CC2C6.A802%carson.holt@genetics.utah.edu>


On 3/5/14, 11:59 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear Maker users
>
>I want to run maker on a fungal genome of about 45 Mb with about 1/3 of
>the genome being repeat rich. But most of the virulence genes are located
>within the repeat regions, flanked by stretches of repeats. I am not sure
>whether, if I use the repeat masker option, I am going to miss out on the
>prediction of these virulence genes located within the repeats.
>
>Other concerns with the settings in the maker_opts file for fungal genomes are:
>
>single_exon = 0     should this get changed to 1, since single exon genes
>are quite common in fungi, and what is the consequence of this on using EST
>and assembled RNA as evidence for gene prediction?
>
>correct_est_fusion=0                  #limits use of ESTs in annotation
>to avoid fusion genes         as I understand it, this option will remove the
>overlapping UTRs, but what is the consequence of setting this option on
>the use of ESTs for predicting ORFs?
>
>
>Thanks
>
>
>
>HB
>
>
>
>


From carsonhh at gmail.com  Wed Mar  5 12:17:57 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Mar 2014 12:17:57 -0700
Subject: [maker-devel] FW: maker-control file
Message-ID: <CF3CC300.A805%carsonhh@gmail.com>

Not using repeat masking will cause many problems.  Besides, a gene being
flanked by repeats does not mean it will be lost: any evidence/alignments
that can seed in non-repetitive regions (gene/exon) are still allowed to
extend into repetitive regions during the polishing stage (aligners have
two stages - seed and extend).  So transposons should never seed, but
genes will because their sequence will contain non-repetitive regions
(even if they are near repeats).

single_exon should be set to 1 for fungi, just make sure to set the
minimum length of single exon evidence to something reasonable like 250bp.

correct_est_fusion should not be used together with est2genome.  It won't
fail, you just get odd results.  Actually est2genome should not ever be
used to generate the final annotation set.  It is a convenience method
that allows you to generate rough models for training gene predictors like
SNAP and Augustus.  But once they are trained it should be turned off,
because the models it produces will be partial (ESTs rarely cover the
whole transcript) and the results will have many false positives from
background transcription events in your EST data.  These models are good
enough to train with, but make very poor final annotations. So in the end
you should be using correct_est_fusion=1 with the SNAP or Augustus set and
not est2genome (which should already have been turned off by then).
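
The two-pass setup described above (a training pass with est2genome, then a
final pass with trained predictors) could be sketched as two sets of
maker_opts.ctl overrides. The Python helper below is only an illustration
for editing the control file text; set_options, TRAINING_PASS and FINAL_PASS
are made-up names, and my_species.hmm is a hypothetical placeholder for
whatever SNAP HMM your own training produces.

```python
def set_options(ctl_text, updates):
    """Rewrite key=value lines named in `updates`, keeping trailing #comments."""
    out = []
    for line in ctl_text.splitlines():
        key, sep, rest = line.partition("=")
        name = key.strip()
        if sep and name in updates:
            _, hash_mark, comment = rest.partition("#")
            line = f"{name}={updates[name]}"
            if hash_mark:
                line += f" #{comment}"
        out.append(line)
    return "\n".join(out)


# Pass 1: rough models for training; no fusion correction with est2genome.
TRAINING_PASS = {"est2genome": 1, "correct_est_fusion": 0,
                 "single_exon": 1, "single_length": 250}

# Pass 2: trained predictor on, est2genome off, fusion correction on.
FINAL_PASS = {"est2genome": 0, "correct_est_fusion": 1,
              "single_exon": 1, "single_length": 250,
              "snaphmm": "my_species.hmm"}  # hypothetical file name
```

Calling set_options(open("maker_opts.ctl").read(), TRAINING_PASS) would give
the text for the first run; the same call with FINAL_PASS gives the second.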


Thanks,
Carson


>
>
>On 3/5/14, 11:59 AM, "Borhan, Hossein" <> wrote:
>
>>Dear Maker users
>>
>>I want to run maker on a fungal genome of about 45 Mb with about 1/3 of
>>the genome being repeat rich. But most of the virulence genes are located
>>within the repeat regions, flanked by stretches of repeats. I am not sure
>>whether, if I use the repeat masker option, I am going to miss out on the
>>prediction of these virulence genes located within the repeats.
>>
>>Other concerns with the settings in the maker_opts file for fungal genomes
>>are:
>>
>>single_exon = 0     should this get changed to 1, since single exon genes
>>are quite common in fungi, and what is the consequence of this on using EST
>>and assembled RNA as evidence for gene prediction?
>>
>>correct_est_fusion=0                  #limits use of ESTs in annotation
>>to avoid fusion genes         as I understand it, this option will remove the
>>overlapping UTRs, but what is the consequence of setting this option on
>>the use of ESTs for predicting ORFs?
>>
>>
>>Thanks
>>
>>
>>
>>HB
>>
>>
>>
>>
>


From marc.hoeppner at imbim.uu.se  Thu Mar  6 00:26:29 2014
From: marc.hoeppner at imbim.uu.se (=?Windows-1252?Q?Marc_H=F6ppner?=)
Date: Thu, 6 Mar 2014 07:26:29 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3CC300.A805%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
Message-ID: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>

Hi,

I think this is an interesting comment that I would like a little more information on:


correct_est_fusion should not be used together with est2genome.  It won't
fail, you just get odd results.  Actually est2genome should not ever be
used to generate the final annotation set.  It is a convenience method
that allows you to generate rough models for training gene predictors like
SNAP and Augustus.  But once they are trained it should be turned off,
because the models it produces will be partial (ESTs rarely cover the
whole transcript) and the results will have many false positives from
background transcription events in your EST data.  These models are good
enough to train with, but make very poor final annotations. So in the end
you should be using correct_est_fusion=1 with the SNAP or Augustus set and
not est2genome (which should already have been turned off by then).


My experience has been that the process of training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~ 10% false exon calls.

As an example, using SNAP and Augustus on a bird genome - with Augustus achieving nucleotide and exon sensitivities in the 70-90% range - gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how Maker uses ab-initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab-initio predictions, which to me seems problematic.


/Marc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140306/f7acdc87/attachment-0003.html>

From carsonhh at gmail.com  Thu Mar  6 07:29:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 07:29:35 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
Message-ID: <CF3DCCB0.A85C%carsonhh@gmail.com>

> Wouldn't it be more sensible to rely on the evidence over probabilistic
> models?

Yes.  In fact that is the backbone of MAKER.  The evidence is used to derive
hints that are passed back into the predictors and reviewed in light of the
evidence to decide on final models (no longer strictly probabilistic).  Take
a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even
when you use the wrong species parameters in the predictor (e.g. A. thaliana
to annotate C. elegans) you get as much as a 3 fold increase in exon level
accuracy by using the hint feedback from MAKER.  With the est2genome option
you don't get that hint feedback (normally probabilistic models, EST evidence,
and protein evidence would all work together), and the models are overall
poorer and contain more false positives (we have looked at this a lot).


> The annotation would be partial, but on the other hand the chance of
> incorporating false signals is smaller (assuming I can generate a clean set
> of transcripts from RNA-seq data)?

False signals are abundant.  It's just the nature of how ESTs and especially
mRNAseq reads are generated and anchored back to the assembly.  By letting
there be feedback between the probabilistic model and the evidence (both
protein and EST/mRNAseq) a lot of this is eliminated.


> As an example, using SNAP and Augustus on a bird genome - with augustus
> achieving nucleotide and exon sensitivities in the 70-90% range gave a host of
> false exons that were simply not supported by the RNAseq data, yet made it
> into the final gene build.

You will get false positives from an est2genome-only approach as well.  Models
will be more partial, and the false negative rate will be very high (often
30-70% false negative rate).  Also look at the MAKER2 paper Figure 1.  The
false positive rate from ab initio alone can be quite high, but with the
evidence feedback it is substantially reduced (especially for poorly trained
predictors).


> Is it possible to get some more details on how Maker uses ab-initio predictions
> and reconciles them with evidence alignments? At the moment it seems to me
> that maker gives higher weight to the ab-initio predictions, which to me seems
> problematic. 

Take a look at the MAKER, MAKER2, and MAKER-P papers.  Final genes are
chosen based off of evidence overlap using AED (completely evidence based).
It is the model generation that leverages the hint based feedback.  The
names of MAKER genes can let you know what the source of the model is.  Any
time hint based models match the evidence better, the name will look
like this -->
maker-<contig>-<predictor>-gene-<ID> (e.g. maker-chr1-snap-gene-0.4)

When the ab initio model matches better than the hint based model, the name
is like this -->
<predictor>-<contig>-abinit-gene-<ID> (e.g. snap-chr1-abinit-gene-0.2)
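
The two naming patterns above can be told apart mechanically. The Python
sketch below is an illustration only: the regular expressions are inferred
from the two example names in this message, not taken from MAKER's source,
and model_source is a made-up helper name.

```python
import re

# maker-<contig>-<predictor>-gene-<ID>: hint based model won.
HINT_RE = re.compile(
    r"^maker-(?P<contig>.+)-(?P<predictor>[^-]+)-gene-(?P<id>\d+(?:\.\d+)*)$")
# <predictor>-<contig>-abinit-gene-<ID>: plain ab initio model won.
ABINIT_RE = re.compile(
    r"^(?P<predictor>[^-]+)-(?P<contig>.+)-abinit-gene-(?P<id>\d+(?:\.\d+)*)$")


def model_source(gene_name):
    """Return (source, predictor, contig) for a MAKER-style gene name."""
    match = HINT_RE.match(gene_name)
    if match:
        return ("hint-based", match.group("predictor"), match.group("contig"))
    match = ABINIT_RE.match(gene_name)
    if match:
        return ("ab initio", match.group("predictor"), match.group("contig"))
    return ("unknown", None, None)
```

Counting the two categories over the gene rows of a GFF3 file gives a quick
view of how often the evidence hints improved on the raw predictions.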


In summary, using est2genome alone (while good for generating training sets)
undercuts the power of the evidence feedback together with the probabilistic
models.
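
For readers unfamiliar with the AED measure mentioned above, here is a small
worked sketch in Python. It uses the published nucleotide-level definition as
I understand it, AED = 1 - (SN + SP) / 2, where SN is the fraction of
evidence bases overlapped by the annotation and SP is the fraction of
annotation bases overlapped by the evidence. MAKER's own implementation may
differ in detail; the function names and interval format are assumptions
made for this example.

```python
def covered_bases(intervals):
    """Expand inclusive 1-based (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_intervals):
    """Annotation edit distance: 0 = perfect agreement, 1 = no support."""
    annotation = covered_bases(annotation_exons)
    evidence = covered_bases(evidence_intervals)
    if not annotation or not evidence:
        return 1.0
    overlap = len(annotation & evidence)
    sensitivity = overlap / len(evidence)    # evidence bases recovered
    specificity = overlap / len(annotation)  # annotation bases supported
    return 1.0 - (sensitivity + specificity) / 2.0
```

For example, a 100 bp exon at 1-100 compared against evidence at 51-150
shares 50 bases, so SN = SP = 0.5 and the AED is 0.5.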


Thanks,
Carson

From:  Marc Höppner <marc.hoeppner at imbim.uu.se>
Date:  Thursday, March 6, 2014 at 12:26 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] FW: maker-control file

Hi,

I think this is an interesting comment that I would like a little more
information on:

> 
> correct_est_fusion should not be used together with est2genome.  It won't
> fail, you just get odd results.  Actually est2genome should not ever be
> used to generate the final annotation set.  It is a convenience method
> that allows you to generate rough models for training gene predictors like
> SNAP and Augustus.  But once they are trained it should be turned off,
> because the models it produces will be partial (ESTs rarely cover the
> whole transcript) and the results will have many false positives from
> background transcription events in your EST data.  These models are good
> enough to train with, but make very poor final annotations. So in the end
> you should be using correct_est_fusion=1 with the SNAP or Augustus set and
> not est2genome (which should already have been turned off by then).
> 

My experience has been that the process of training gene finders, especially
for complex genomes like vertebrates, is a very slow and painful process.
And ultimately, the results are far from accurate, even with a sizeable,
manually curated training set. Wouldn't it be more sensible to rely on the
evidence over probabilistic models? The annotation would be partial, but on
the other hand the chance of incorporating false signals is smaller
(assuming I can generate a clean set of transcripts from RNA-seq data)? And
I'd rather underestimate the exon inventory slightly than put out an
annotation with ~ 10% false exon calls.

As an example, using SNAP and Augustus on a bird genome - with augustus
achieving nucleotide and exon sensitivities in the 70-90% range gave a host
of false exons that were simply not supported by the RNAseq data, yet made
it into the final gene build. Not sure what to think about that to be
honest. Is it possible to get some more details on how Maker uses ab-initio
predictions and reconciles them with evidence alignments? At the moment it
seems to me that maker gives higher weight to the ab-initio predictions,
which to me seems problematic.


/Marc


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140306/465e3b3f/attachment-0003.html>

From marc.hoeppner at imbim.uu.se  Thu Mar  6 07:40:48 2014
From: marc.hoeppner at imbim.uu.se (=?Windows-1252?Q?Marc_H=F6ppner?=)
Date: Thu, 6 Mar 2014 14:40:48 +0000
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <CF3DCCB0.A85C%carsonhh@gmail.com>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
Message-ID: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>

Hi Carson,

Thanks for the detailed feedback, this has cleared up a few things. I don't necessarily share your view on the problematic nature of RNA-seq data - especially with newer protocols that give near-perfect strandedness. We work a lot on transcriptome assembly, and with a stringent approach to transcript assembly I think I got better results with est2genome than by trying to let Maker work with a semi-refined ab-initio model. But it can be a bit tricky to hit that sweet spot (we did validate > 4000 models manually in order to make that sort of assessment, though).

But I will have another look at this and see if I can get Maker to do what I need with the approach you describe. That reminds me, I think it would be fantastic if you guys could put together a Wiki for Maker. This is such a useful and powerful tool, but clearly there are many things that people should get a proper explanation on that has only ever been discussed on this list here - best practices, experimental features etc.

Regards,

Marc


On 06 Mar 2014, at 15:29, Carson Holt <carsonhh at gmail.com<mailto:carsonhh at gmail.com>> wrote:

Wouldn't it be more sensible to rely on the evidence over probabilistic models?

Yes.  In fact that is the backbone of MAKER.  The evidence is used to derive hints that are passed back into the predictors and reviewed in light of the evidence to decide on final models (no longer strictly probabilistic).  Take a look at the MAKER2 paper (Table 2 and Figure 1) and you will see that even when you use the wrong species parameters in the predictor (e.g. A. thaliana to annotate C. elegans) you get as much as a 3 fold increase in exon level accuracy by using the hint feedback from MAKER.  With the est2genome option you don't get that hint feedback (normally probabilistic models, EST evidence, and protein evidence would all work together), and the models are overall poorer and contain more false positives (we have looked at this a lot).


The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)?

False signals are abundant.  It's just the nature of how ESTs and especially mRNAseq reads are generated and anchored back to the assembly.  By letting there be feedback between the probabilistic model and the evidence (both protein and EST/mRNAseq) a lot of this is eliminated.


As an example, using SNAP and Augustus on a bird genome - with augustus achieving nucleotide and exon sensitivities in the 70-90% range gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build.

You will get false positives from an est2genome-only approach as well.  Models will be more partial, and the false negative rate will be very high (often 30-70% false negative rate).  Also look at the MAKER2 paper Figure 1.  The false positive rate from ab initio alone can be quite high, but with the evidence feedback it is substantially reduced (especially for poorly trained predictors).


Is it possible to get some more details on how Maker uses ab-inito predictions and reconciles them with evidence alignments? At the moment it seems to me that maker gives higher weight to the ab-initio predictions, which to me seems problematic.

Take a look at the MAKER, MAKER2, and MAKER-P papers.  Final genes are chosen based on evidence overlap using AED (completely evidence based).  It is the model generation that leverages the hint-based feedback.  The names of MAKER genes tell you the source of the model.  Whenever the hint-based model matches the evidence better, the gene will have a name like this ->
maker-<contig>-<predictor>-gene-<ID> (e.g. maker-chr1-snap-gene-0.4)

When the ab initio model matches better than the hint-based model, the name is like this ->
<predictor>-<contig>-abinit-gene-<ID> (e.g. snap-chr1-abinit-gene-0.2)
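
The two name layouts above can be told apart mechanically. The following is a minimal Python sketch, assuming only the two patterns quoted in this message; the function name `model_source` is hypothetical and is not part of MAKER, and any other layout (for example the tRNAscan names discussed elsewhere on this list) is reported as unknown.

```python
import re

# Patterns follow the two name layouts quoted above; nothing else is assumed.
HINT_RE = re.compile(
    r"^maker-(?P<contig>.+)-(?P<predictor>[^-]+)-gene-(?P<id>[\d.]+)$")
ABINIT_RE = re.compile(
    r"^(?P<predictor>[^-]+)-(?P<contig>.+)-abinit-gene-(?P<id>[\d.]+)$")


def model_source(gene_name):
    """Return (source, predictor, contig) for a MAKER-style gene name."""
    match = HINT_RE.match(gene_name)
    if match:
        return ("hint-based", match.group("predictor"), match.group("contig"))
    match = ABINIT_RE.match(gene_name)
    if match:
        return ("ab initio", match.group("predictor"), match.group("contig"))
    return ("unknown", None, None)


if __name__ == "__main__":
    for name in ("maker-chr1-snap-gene-0.4", "snap-chr1-abinit-gene-0.2"):
        print(name, model_source(name))
```

Counting the two classes over a whole annotation gives a rough idea of how often the hint feedback produced the chosen model.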


In summary, using est2genome alone (while good for generating training sets) undercuts the power of the evidence feedback together with the probabilistic models.
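
For readers unfamiliar with AED (annotation edit distance), here is a simplified Python sketch of the idea as described in the MAKER2 paper: AED = 1 - (SN + SP)/2, where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases covered by the evidence. The function names and the 1-based inclusive interval representation are assumptions for illustration; this is not MAKER's own implementation.

```python
def covered(intervals):
    """Set of base positions covered by 1-based inclusive (start, end) pairs."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_blocks):
    """Nucleotide-level AED: 0.0 is perfect agreement, 1.0 is no overlap."""
    model = covered(model_exons)
    evidence = covered(evidence_blocks)
    if not model or not evidence:
        return 1.0  # nothing to compare against counts as no support
    shared = len(model & evidence)
    sensitivity = shared / len(evidence)  # evidence bases the model recovers
    specificity = shared / len(model)     # model bases the evidence supports
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    print(aed([(1, 100)], [(51, 150)]))
```

A model exon with no evidence support lowers SP and so raises AED, which is how unsupported ab initio exons are penalized when final genes are chosen.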


Thanks,
Carson

From: Marc Höppner <marc.hoeppner at imbim.uu.se>
Date: Thursday, March 6, 2014 at 12:26 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] FW: maker-control file

Hi,

I think this is an interesting comment that I would like a bit more information on:


correct_est_fusion should not be used together with est2genome.  It won't
fail, you just get odd results.  Actually est2genome should not ever be
used to generate the final annotation set.  It is a convenience method
that allows you to generate rough models for training gene predictors like
SNAP and Augustus.  But once they are trained it should be turned off,
because the models it produces will be partial (ESTs rarely cover the
whole transcript) and the results will have many false positives from
background transcription events from your EST data.  These models are good
enough to train with, but make very poor final annotations. So in the end
you should be using correct_est_fusion=1 with the SNAP or Augustus set and
not est2genome (which should already have been turned off by then).
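
The two-stage workflow described in the quoted advice can be sketched as two sets of maker_opts.ctl settings. The Python below is only an illustration: the helper names are hypothetical, the file name `my_species.hmm` is a placeholder, and only option names that appear in this thread or in the standard control file (est2genome, correct_est_fusion, snaphmm, augustus_species) are used.

```python
def maker_pass_options(trained, snaphmm=None, augustus_species=None):
    """Options for the rough training pass (trained=False) or the final pass."""
    if not trained:
        # Training pass: est2genome builds rough models to train predictors on.
        return {"est2genome": 1, "correct_est_fusion": 0}
    # Final pass: est2genome is switched off and trained predictors take over.
    options = {"est2genome": 0, "correct_est_fusion": 1}
    if snaphmm:
        options["snaphmm"] = snaphmm
    if augustus_species:
        options["augustus_species"] = augustus_species
    return options


def render(options):
    """Format options as key=value lines, the layout used in maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


if __name__ == "__main__":
    print(render(maker_pass_options(trained=False)))
    print()
    print(render(maker_pass_options(trained=True, snaphmm="my_species.hmm")))
```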


My experience has been that the process of training gene finders, especially for complex genomes like vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn't it be more sensible to rely on the evidence over probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals is smaller (assuming I can generate a clean set of transcripts from RNA-seq data)? And I'd rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls.

As an example, using SNAP and Augustus on a bird genome - with Augustus achieving nucleotide and exon sensitivities in the 70-90% range - gave a host of false exons that were simply not supported by the RNAseq data, yet made it into the final gene build. Not sure what to think about that to be honest. Is it possible to get some more details on how Maker uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that Maker gives higher weight to the ab initio predictions, which to me seems problematic.


/Marc


From carsonhh at gmail.com  Thu Mar  6 08:03:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 08:03:10 -0700
Subject: [maker-devel] FW: maker-control file
In-Reply-To: <1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
References: <CF3CC300.A805%carsonhh@gmail.com>
	<1560C956-4159-403D-8167-8727D6A4A587@imbim.uu.se>
	<CF3DCCB0.A85C%carsonhh@gmail.com>
	<1E6F33D6-44FE-44C5-81C5-8FE58DA07D27@imbim.uu.se>
Message-ID: <CF3DDC22.A8AF%carsonhh@gmail.com>

MAKER wiki ->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page

Thanks,
Carson


From:  Marc Höppner <marc.hoeppner at imbim.uu.se>
Date:  Thursday, March 6, 2014 at 7:40 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] FW: maker-control file

Hi Carson, 

Thanks for the detailed feedback, this has cleared up a few things. I don't
necessarily share your view on the problematic nature of RNA-seq data -
especially with newer protocols offering near-perfect strandedness. We work
a lot on transcriptome assembly, and with a stringent approach to transcript
assembly I think I got better results with est2genome than trying to let
Maker work with a semi-refined ab initio model. But it can be a bit tricky to
hit that sweet spot (we did validate > 4000 models manually in order to make
that sort of assessment though).

But I will have another look at this and see if I can get Maker to do what I
need with the approach you describe. That reminds me, I think it would be
fantastic if you guys could put together a Wiki for Maker. This is such a
useful and powerful tool, but clearly there are many things that people
should get a proper explanation of that have only ever been discussed on this
list - best practices, experimental features etc.

Regards,

Marc





From sjackman at gmail.com  Thu Mar  6 13:56:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 12:56:34 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3BD88C.A7D5%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
Message-ID: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well-annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straightforward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`. tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?
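
To make the requested convention concrete, here is a small Python sketch that builds a name such as `trnW-CCA` from a three-letter amino acid code and an anticodon (tRNAscan-SE reports both in its output). The function `trna_gene_name` is hypothetical and only covers the 20 standard amino acids; it is not how MAKER names genes.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
AA_ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def trna_gene_name(amino_acid, anticodon):
    """Build a conventional tRNA gene name, e.g. ('Trp', 'CCA') -> 'trnW-CCA'."""
    letter = AA_ONE_LETTER[amino_acid.capitalize()]  # KeyError if not standard
    return f"trn{letter}-{anticodon.upper()}"


if __name__ == "__main__":
    print(trna_gene_name("Trp", "CCA"))
```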

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that.  It focuses only on the coding genes.  You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago).  So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes.  We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.
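
The prefix rule proposed here is simple enough to sketch in a few lines of Python. This is an illustration of the suggestion only, with a hypothetical function name; it assumes the "rrn"/"trn" naming described above and leaves every other gene as mRNA.

```python
def rna_feature_type(gene_name):
    """Guess the transcript feature type from the gene name prefix."""
    if gene_name.startswith("rrn"):
        return "rRNA"
    if gene_name.startswith("trn"):
        return "tRNA"
    return "mRNA"  # default: treat everything else as protein coding


if __name__ == "__main__":
    for name in ("rrn16", "rrn4.5", "trnW-CCA", "psbA"):
        print(name, rna_feature_type(name))
```

A post-processing step could use such a function to retype transcript features and skip UTR and CDS features for the non-coding types.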

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and the minimum size (single_length) to a smaller value.  I think it's set to 250 right now.  Also est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson
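
As a way to reason about which threshold rejects a hit, the Python sketch below checks one alignment against the options named in this thread (bit_blastn, eval_blastn, single_exon, single_length). It is a simplified model written for illustration, not MAKER's filtering code, and the hit length of 120 bp in the usage example is an assumed value, not one reported in the thread.

```python
def failed_filters(length, bits, evalue, n_exons, *,
                   bit_blastn=40, eval_blastn=1e-10,
                   single_exon=0, single_length=250):
    """Return the names of the thresholds that a single EST hit fails."""
    failed = []
    if bits < bit_blastn:
        failed.append("bit_blastn")
    if evalue > eval_blastn:
        failed.append("eval_blastn")
    if n_exons == 1:
        if not single_exon:
            failed.append("single_exon")    # single-exon evidence not allowed
        elif length < single_length:
            failed.append("single_length")  # too short for a single-exon hit
    return failed


if __name__ == "__main__":
    # Bit score and E-value are the rrn5 figures from the thread; length is assumed.
    print(failed_filters(120, 242, 2e-66, 1, single_exon=1))
    print(failed_filters(120, 242, 2e-66, 1, single_exon=1, single_length=50))
```

In this model a short single-exon hit passes both BLAST thresholds and is still rejected until single_length is lowered, which matches the outcome reported above.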

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto the new genes under the standard MAKER pipeline.  Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
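
The prefiltering step (dropping low-scoring mRNA features from the GFF3 before it is passed to model_gff) might look like the Python sketch below. The function names and the threshold are hypothetical, an mRNA with no score is treated as failing by assumption, and gene features are left in place even if all of their mRNAs are removed.

```python
def parse_attrs(column9):
    """Turn 'ID=x;Parent=y' into a dict; entries without '=' are skipped."""
    return dict(kv.split("=", 1) for kv in column9.split(";") if "=" in kv)


def prefilter_gff3(lines, min_score):
    """Drop mRNA features scoring below min_score, plus their child features."""
    rows = [line.rstrip("\n") for line in lines]
    dropped = set()
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Assumption: an mRNA with no score (".") is treated as failing.
        if cols[5] == "." or float(cols[5]) < min_score:
            mrna_id = parse_attrs(cols[8]).get("ID")
            if mrna_id:
                dropped.add(mrna_id)
    kept = []
    for row in rows:
        cols = row.split("\t")
        if not row.startswith("#") and len(cols) == 9:
            attrs = parse_attrs(cols[8])
            parents = attrs.get("Parent", "").split(",")
            if attrs.get("ID") in dropped or all(p in dropped for p in parents):
                continue  # the failing mRNA itself, or a child of one
        kept.append(row)
    return kept


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
        "chr1\tmaker\tmRNA\t1\t100\t900\t+\t.\tID=m1;Parent=g1",
        "chr1\tmaker\texon\t1\t100\t.\t+\t.\tID=e1;Parent=m1",
        "chr1\tmaker\tmRNA\t1\t100\t10\t+\t.\tID=m2;Parent=g1",
        "chr1\tmaker\texon\t1\t100\t.\t+\t.\tID=e2;Parent=m2",
    ]
    print("\n".join(prefilter_gff3(example, min_score=100)))
```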

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like Maker to produce est2genome predictions.

In general, I think maker_coor together with est_forward is a feature set that is worthy of being promoted into a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).
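
The seed handling described for the case without est_forward can be pictured with a short Python sketch. It models only what the paragraph states (seeds outside maker_coor are ignored, overlapping seeds get the maker_coor range as their search space); the function name and the tuple layout are assumptions, and this is not the code in GI.pm.

```python
def adjust_seed(seed, coor):
    """Apply a maker_coor restriction to one BLAST seed.

    Both arguments are (seqid, start, end) with 1-based inclusive coordinates.
    Returns None if the seed is ignored, else the region to polish.
    """
    seed_id, seed_start, seed_end = seed
    coor_id, coor_start, coor_end = coor
    if seed_id != coor_id:
        return None  # seed on a different sequence
    if seed_end < coor_start or seed_start > coor_end:
        return None  # seed completely outside maker_coor
    # Any overlap: the search space is set to match maker_coor exactly.
    return (coor_id, coor_start, coor_end)


if __name__ == "__main__":
    coor = ("chr1", 1, 10000)
    print(adjust_seed(("chr1", 9500, 10400), coor))
    print(adjust_seed(("chr1", 20000, 21000), coor))
```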

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes.  That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to Maker without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson
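
To show what such tagged headers look like when read back, here is a Python sketch that extracts gene_id and maker_coor from a FASTA header. It assumes the tags are whitespace-separated key=value tokens after the sequence ID, which the message does not state explicitly; the function name and the example header are hypothetical, and this is not MAKER's parser.

```python
import re

# maker_coor is either "seqid" or "seqid:start-end", as in the examples above.
COOR_RE = re.compile(r"^(?P<seqid>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_header_tags(header):
    """Extract gene_id and maker_coor tags from one FASTA header line."""
    tags = {}
    # Skip the first token (the sequence ID); the rest is the description.
    for token in header.lstrip(">").split()[1:]:
        key, sep, value = token.partition("=")
        if sep and key in ("gene_id", "maker_coor"):
            tags[key] = value
    coor = tags.get("maker_coor")
    if coor is not None:
        match = COOR_RE.match(coor)
        if match is None:
            raise ValueError(f"unrecognized maker_coor value: {coor}")
        start = int(match["start"]) if match["start"] else None
        end = int(match["end"]) if match["end"] else None
        tags["maker_coor"] = (match["seqid"], start, end)
    return tags


if __name__ == "__main__":
    print(parse_header_tags(">rrn16_t1 gene_id=rrn16 maker_coor=chr1:1-10000"))
    print(parse_header_tags(">rrn5_t1 maker_coor=chr1"))
```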


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





From carsonhh at gmail.com  Thu Mar  6 13:58:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 13:58:41 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E2F7A.A911%carsonhh@gmail.com>

Yes.  I'll fix the naming.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 1:56 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general
is a non-trivial problem. In my particular case, I have a well-annotated
reference species that is very closely related (99.2% sequence identity), so
lifting over the annotations from that reference species to my species
should be pretty straightforward. It would be great if MAKER had an option
for RNA sequence homology similar to est2genome that does not imply the
sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified
genes are named e.g. `trnascan-205522-processed-gene-0.38`.  tRNA genes are
conventionally named according to the amino acid and anticodon, such as
`trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the
names with that convention?

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:
 
> Trying to call non-coding RNA from ESTs or even sequence homology is extremely
> messy (non-trivial problem in most organisms with high false positive rate),
> so MAKER for the most part doesn't even try to do that.  It focuses only on
> the coding genes.  You can now use tRNAscan and snoscan in the newest version
> for some non-coding RNA support (those features were only added a couple of
> months ago).  So just like other prediction tools (snap, augustus etc.), the
> primary focus has always been the coding genes.  We've only started adding
> non-coding RNA support recently for iPlant, so it's still relatively immature.
> 
> Thanks,
> Carson
> 
> 
> From: Shaun Jackman <sjackman at gmail.com>
> Reply-To: Shaun Jackman <sjackman at gmail.com>
> Date: Tuesday, March 4, 2014 at 7:10 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the
> tip.
> 
> The rRNA genes that are found with est2genome have the feature type set to
> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features.
> Ideally the feature type would be set to rRNA or tRNA as appropriate, and
> would omit the UTR and CDS features. Is that a feature that you would be
> interested in adding to MAKER? The rRNA gene names all start with "rrn" and
> the tRNA gene names with "trn", as is standard, so determining the appropriate
> type should be straightforward.
> 
> Thanks again for your help with this. Cheers,
> Shaun
> 
> 
> 
> On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
>> Set single_exon=1, and the minimum size to a smaller value.  I think it's set
>> to 250 right now.  Also est2genome is looking for an ORF, so if there is none
>> (as with tRNAs) they probably won't get picked up.
>> 
>> --Carson 
>> 
>> Sent from my iPhone
>> 
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>> 
>>> Sorry, ignore my previous question. est_forward also carries forward the
>>> names of protein evidence and works like a charm. Thank you!
>>> 
>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5
>>> and rrn5 and tRNA genes didn't make it into the all.gff file. They are in
>>> the blastn output, and in the evidence_0.gff. rrn5 has perfect identity,
>>> sufficient bits (242 > bit_blastn=40) and sufficient E-value (2e-66 <
>>> eval_blastn=1e-10). How should I debug which filter is removing these hits?
>>> 
>>> organism_type=prokaryotic
>>> est2genome=1
>>> protein2genome=1
>>> est_forward=1
>>> 
>>> Cheers,
>>> Shaun
>>> 
>>> 
>>> 
>>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>>> Is there a corresponding protein_forward=1 option to map forward protein
>>>> names from protein2genome?
>>>> 
>>>> Cheers, 
>>>> Shaun
>>>> 
>>>> 
>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:
>>>>> 
>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>> passing the gff3 to model_gff.
>>>>> 
>>>>> --Carson 
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>> 
>>>>>> What you can do is run it once with just est_forward=1 and
>>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them
>>>>>> in as model_gff and use the map_forward option to then filter the results
>>>>>> based on mRNA score and that would copy names onto new gene under the
>>>>>> standard MAKER pipeline.  Eventually it's really supposed to go into a
>>>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>>>> the tool will just be calling MAKER with certain parameters restricted).
>>>>>> I do this because if people commonly use it mixed with things like SNAP I
>>>>>> can start to get some very weird behaviors.
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>> 
>>>>>> It seems that this could be a very useful option in those cases where you
>>>>>> have firm a priori knowledge of the placement of ESTs. However, while
>>>>>> trying it I note that est_forward implies that the est2genome predictor
>>>>>> is turned on, implicitly. Is this necessary for this to work? I'm after
>>>>>> the behavior you describe below where exonerate is made to try really
>>>>>> hard within a limited region to align an est, but I would not like maker
>>>>>> to produce est2genome predictions.
>>>>>> 
>>>>>> In general, I think this maker_coor and est_forward is a feature set that
>>>>>> is worthy to be promoted into a documented feature.
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikael
>>>>>> 
>>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>>> It will still work without est_forward.  It just works a little
>>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>>> 
>>>>>>> If est_forward is provided, MAKER will parse the database to look for
>>>>>>> the maker_coor tags early in the pipeline.  Then it will create a list
>>>>>>> of locations to search, and it will search them even if there are no
>>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result
>>>>>>> first and then polishes it with exonerate).  So maker_coor=chr1 will
>>>>>>> cause MAKER to look for a match using all of chr1 as the input to
>>>>>>> exonerate even when BLAST finds nothing (this is a very very slow
>>>>>>> search, but can help pick up one or two stubborn genes that don't remap
>>>>>>> well).  To allow this, MAKER gives exonerate looser matching parameters
>>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly
>>>>>>> errors).  The logic here is that given the fact that I already told
>>>>>>> MAKER that with some degree of confidence I expect sequence A to map
>>>>>>> to location X, it will try its hardest to make it match.
>>>>>>> 
>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to
>>>>>>> the region (that BLAST result has the information in its description
>>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as
>>>>>>> they were with est_forward.
>>>>>>> 
>>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>>> accidental feature).
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>>> est_forward=1, right?
>>>>>>> 
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ests, without affecting the naming of the final genes? Ie if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> There is a way.  It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> 
>>>>>>> 
>>>>>> 
>>> 
> 



From carson.holt at genetics.utah.edu  Thu Mar  6 16:00:40 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 6 Mar 2014 23:00:40 +0000
Subject: [maker-devel] maker problem with running blast
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890BAE7@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF3E4A6E.A92B%carson.holt@genetics.utah.edu>

Your blast_type parameter in maker_bopts.ctl is set to 'wublast' but the
executables for wublast are blank in maker_exe.ctl.

See, they're blank -->
xdformat=#location of WUBLAST xdformat executable
blasta=#location of WUBLAST blasta executable


You either need to provide executables or set your blast_type parameter to
something else. For example, you could set it to 'NCBI+', but you will need
to fix the location of makeblastdb.

makeblastdb is set incorrectly here (it appears to point at the install
directory rather than at the makeblastdb executable itself) -->
makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
NCBI+ makeblastdb executable


Alternatively, you can set blast_type to 'NCBI', but you will need to
uncomment the executables.

Here -->
formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
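
A quick way to catch this kind of mismatch before launching MAKER is to check the control file yourself. The Python below is only a rough sketch, not part of MAKER: the mapping from blast_type to required keys is an assumption taken from the "#location of ..." comments in maker_exe.ctl, and parse_ctl and blast_problems are made-up helper names.

```python
import os

# Assumed mapping from blast_type to the maker_exe.ctl keys it needs,
# inferred only from the "#location of ..." comments in the control file.
REQUIRED = {
    "wublast": ["xdformat", "blasta"],
    "ncbi+": ["makeblastdb", "blastn", "blastx", "tblastx"],
    "ncbi": ["formatdb", "blastall"],
}


def parse_ctl(text):
    """Parse key=value lines; everything after '#' is treated as a comment."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def blast_problems(exe_opts, blast_type, is_file=os.path.isfile):
    """Return a list of human-readable problems for the chosen blast_type."""
    problems = []
    for key in REQUIRED[blast_type.lower()]:
        path = exe_opts.get(key, "")
        if not path:
            # Covers both 'key=' and 'key=#/commented/out/path'.
            problems.append(f"{key} is blank")
        elif not is_file(path):
            # Catches a directory given where an executable is expected.
            problems.append(f"{key}={path} is not a file")
    return problems
```

Calling blast_problems(parse_ctl(open("maker_exe.ctl").read()), "NCBI+") would list each blank entry and each path that is not a regular file, so a makeblastdb entry that names a directory gets reported.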


--Carson


On 3/6/14, 3:51 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Hi
>
>I have installed the latest version of blast+ and provided the executable
>paths in maker_exe.ctl as follows
>
>#-----Location of Executables Used by MAKER/EVALUATOR
>makeblastdb=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+ #location of
>NCBI+ makeblastdb executable
>blastn=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastn #location
>of NCBI+ blastn executable
>blastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/blastx #location
>of NCBI+ blastx executable
>tblastx=/home/AAFC-AAC/borhanh/bin/ncbi-blast-2.2.29+/bin/tblastx
>#location of NCBI+ tblastx executable
>formatdb=#/usr/local/bin/formatdb #location of NCBI formatdb executable
>blastall=#/usr/local/bin/blastall #location of NCBI blastall executable
>xdformat=#location of WUBLAST xdformat executable
>blasta=#location of WUBLAST blasta executable
>RepeatMasker=/usr/local/RepeatMasker/RepeatMasker #location of
>RepeatMasker executable
>exonerate=/home/AAFC-AAC/borhanh/bin/exonerate-2.2.0-x86_64/bin/exonerate
>#location of exonerate executable
>
>#-----Ab-initio Gene Prediction Algorithms
>snap=/home/AAFC-AAC/borhanh/bin/snap/snap #location of snap executable
>gmhmme3=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/gmhmme3
>#location of eukaryotic genemark executable
>gmhmmp= #location of prokaryotic genemark executable
>augustus=/usr/local/augustus.2.5.5/bin/augustus #location of augustus
>executable
>fgenesh=/usr/local/FGENESH/fgenesh #location of fgenesh executable
>
>#-----Other Algorithms
>fathom=/home/AAFC-AAC/borhanh/bin/snap/fathom #location of fathom
>executable (experimental)
>probuild=/home/AAFC-AAC/borhanh/bin/gm_es_bp_linux64_v2.3e/gmes/probuild
>#location of probuild executable (required for genemark)
>
>
>
>
>
>But when running maker I get this error
>
>
>STATUS: Parsing control files...
>WARNING: blast_type is set to 'wublast' but executables cannot be located
>ERROR: Please provide a valid locaction for a BLAST algorithm in the
>control files.
>
>
>
>
>
>
>


From sjackman at gmail.com  Thu Mar  6 16:33:04 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Mar 2014 15:33:04 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF3E2F7A.A911%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
Message-ID: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to identify tRNA, I was hoping that both forms of evidence would be used to create a single gene model, which doesn't seem to be the case. I get duplicate overlapping gene models (one mRNA from est and one tRNA from tRNAscan). Could MAKER merge these models?

Cheers,
Shaun
On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote:

Yes.  I'll fix the naming.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Date: Thursday, March 6, 2014 at 1:56 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I agree that identifying non-coding RNA by homology in general is a non-trivial problem. In my particular case, I have a well annotated reference species that is very closely related (99.2% sequence identity), so lifting over the annotations from that reference species to my species should be pretty straight forward. It would be great if MAKER had an option for RNA sequence homology similar to est2genome that does not imply the sequence is coding.

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. `trnascan-205522-processed-gene-0.38`.  tRNA genes are conventionally named according to the amino acid and anticodon, such as `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

Cheers,
Shaun


On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:

Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that.  It focuses only on the coding genes.  You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago).  So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes.  We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.

Thanks,
Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, March 4, 2014 at 7:10 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I set  
single_length=50, and it worked like a charm. Thanks for the tip.

The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.

Thanks again for your help with this. Cheers,
Shaun


On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
Set single_exon=1, and the minimum size to a smaller value.  I think it's set to 250 right now.  Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up.
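
To see why a hit can clear the BLAST thresholds and still be dropped, here is a rough Python sketch of the checks discussed in this thread. It is not MAKER's actual code: the order of the checks, the Hit fields and the function name are assumptions, and any hit lengths you feed it are your own values. Only the option names and the thresholds 40, 1e-10 and 250 come from the messages here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    length: int    # aligned length in bp
    exons: int     # number of exons in the polished alignment
    bits: float    # BLAST bit score
    evalue: float  # BLAST E-value


def first_failed_filter(hit, bit_blastn=40, eval_blastn=1e-10,
                        single_exon=0, single_length=250):
    """Return the name of the first threshold the hit fails, or None.

    Assumed order: BLAST score filters first, then the single-exon rules.
    """
    if hit.bits < bit_blastn:
        return "bit_blastn"
    if hit.evalue > eval_blastn:
        return "eval_blastn"
    if hit.exons == 1:
        if not single_exon:
            return "single_exon"      # single-exon evidence not allowed at all
        if hit.length < single_length:
            return "single_length"    # allowed, but alignment is too short
    return None
```

Under these assumptions a short single-exon rRNA with 242 bits and an E-value of 2e-66 still fails on single_exon, and after setting single_exon=1 it fails on single_length until that value is lowered below the alignment length.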

--Carson

Sent from my iPhone

On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?


organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto new genes under the standard MAKER pipeline.  Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
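
The prefiltering step between the two runs could look roughly like the Python below. This is a sketch under assumptions: it takes "mRNA score" to mean GFF3 column 6 on mRNA rows, treats higher as better, and uses a made-up function name and cutoff. Genes left with no surviving mRNA are not removed.

```python
def parse_attrs(column9):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    return dict(kv.split("=", 1) for kv in column9.split(";") if "=" in kv)


def prefilter_gff3(lines, min_score):
    """Drop mRNA features scoring below min_score, plus their children."""
    rows = [ln.rstrip("\n") for ln in lines]

    # Pass 1: collect IDs of mRNAs whose score (column 6) is below the cutoff.
    dropped = set()
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] == "mRNA" and cols[5] != ".":
            mrna_id = parse_attrs(cols[8]).get("ID")
            if mrna_id and float(cols[5]) < min_score:
                dropped.add(mrna_id)

    # Pass 2: keep everything except dropped mRNAs and features parented by them.
    kept = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9:
            attrs = parse_attrs(cols[8])
            parents = attrs.get("Parent", "").split(",")
            if attrs.get("ID") in dropped or any(p in dropped for p in parents):
                continue
        kept.append(row)
    return kept
```

The filtered lines would then be written to a new file and that file given to model_gff for the second run.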

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.

In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also match parameters for exonerate will not be relaxed as they were with est_forward.
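
The seed handling described for the case without est_forward can be sketched in Python as follows. This only illustrates the rule as stated in this message and is not the code in GI.pm: the function names and the tuple layout are made up, and leaving seeds unchanged for a bare maker_coor=chr1 is an assumption, since the sequence length is not known here.

```python
import re


def parse_maker_coor(value):
    """Parse 'chr1:1-10000' or 'chr1' into (seqid, start, end).

    start and end are None when only a sequence name is given.
    """
    m = re.fullmatch(r"([^:]+)(?::(\d+)-(\d+))?", value)
    if m is None:
        raise ValueError(f"unrecognised maker_coor value: {value!r}")
    seqid, start, end = m.groups()
    return seqid, (int(start) if start else None), (int(end) if end else None)


def adjust_seeds(seeds, maker_coor):
    """Apply a maker_coor restriction to BLAST seeds for one query.

    seeds is a list of (seqid, start, end) tuples. Seeds completely outside
    maker_coor are dropped; seeds overlapping it get their search space
    replaced by the maker_coor range.
    """
    seqid, lo, hi = parse_maker_coor(maker_coor)
    kept = []
    for s_id, s_lo, s_hi in seeds:
        if s_id != seqid:
            continue                       # wrong sequence: ignore the seed
        if lo is None:
            kept.append((s_id, s_lo, s_hi))  # whole-sequence restriction
        elif s_lo <= hi and s_hi >= lo:
            kept.append((seqid, lo, hi))     # overlap: snap to maker_coor
    return kept
```

With maker_coor=chr1:1-10000, a seed at chr1:500-900 becomes a search over chr1:1-10000, while seeds at chr1:20000-20500 or on chr2 are ignored.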

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes.  That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
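
Adding the tags to an existing FASTA file can be scripted. The Python below is a minimal sketch, assuming the tags go into the header as whitespace-separated key=value pairs after the sequence ID; that placement is a guess, because the thread only says the tags go "in the fasta header". tag_fasta is a made-up helper name.

```python
def tag_fasta(lines, tags):
    """Append gene_id= / maker_coor= tags to FASTA header lines.

    tags maps a sequence ID to a dict such as
    {"gene_id": "geneA", "maker_coor": "chr1:1-10000"}.
    Sequence lines and untagged headers are passed through unchanged.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            extra = " ".join(f"{k}={v}" for k, v in tags.get(seq_id, {}).items())
            if extra:
                line = f"{line} {extra}"
        out.append(line)
    return out
```

For example, a header ">tx1 some description" tagged with gene_id=geneA and maker_coor=chr1:1-10000 becomes ">tx1 some description gene_id=geneA maker_coor=chr1:1-10000".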

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





From carsonhh at gmail.com  Thu Mar  6 16:38:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:38:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
	<CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>
	<CADX6M3rdHPKSk4VMUpbxKSA=rhVVUk+L=dG1xMibWz1KntF2AA@mail.gmail.com>
	<CF3BD88C.A7D5%carsonhh@gmail.com>
	<etPan.5318e112.238e1f29.9018@pshen01-imac.phage.bcgsc.ca>
	<CF3E2F7A.A911%carsonhh@gmail.com>
	<etPan.531905bf.79e2a9e3.9018@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CF3E5408.A93F%carsonhh@gmail.com>

Well... not really.  I have no plans to add est2genome support for noncoding
genes (non-trivial), so you would either have to remove the ncRNA from your
input, or filter it out downstream.

Thanks,
Carson


From:  Shaun Jackman <sjackman at gmail.com>
Date:  Thursday, March 6, 2014 at 4:33 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Fantastic. Thanks, Carson. When I use both est2genome and tRNAscan to
identify tRNA, I was hoping that both forms of evidence would be used to
create a single gene model, which doesn't seem to be the case. I get
duplicate overlapping gene models (one mRNA from est and one tRNA from
tRNAscan). Could MAKER merge these models?

Cheers,
Shaun
On 2014-March-06 at 12:58:50 , Carson Holt (carsonhh at gmail.com) wrote:
 
> Yes.  I'll fix the naming.
> 
> Thanks,
> Carson
> 
> 
> From: Shaun Jackman <sjackman at gmail.com>
> Date: Thursday, March 6, 2014 at 1:56 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> Hi, Carson. I agree that identifying non-coding RNA by homology in general is
> a non-trivial problem. In my particular case, I have a well annotated
> reference species that is very closely related (99.2% sequence identity), so
> lifting over the annotations from that reference species to my species should
> be pretty straight forward. It would be great if MAKER had an option for RNA
> sequence homology similar to est2genome that does not imply the sequence is
> coding.
> 
> The integration of MAKER-P with tRNAscan is very useful. The identified genes
> are named e.g. `trnascan-205522-processed-gene-0.38`.  tRNA genes are
> conventionally named according to the amino acid and anticodon, such as
> `trnW-CCA`. Would it be possible for MAKER to name or perhaps prefix the names
> with that convention?
> 
> Cheers,
> Shaun
> 
> 
> On 2014-March-04 at 18:33:20 , Carson Holt (carsonhh at gmail.com) wrote:
>> 
>> Trying to call non-coding RNA from ESTs or even sequence homology is
>> extremely messy (non-trivial problem in most organisms with high false
>> positive rate), so MAKER for the most part doesn't even try to do that.  It
>> focuses only on the coding genes.  You can now use tRNAscan and snoscan in
>> the newest version for some non-coding RNA support (those features were only
>> added a couple of months ago).  So just like other prediction tools (snap,
>> augustus etc.), the primary focus has always been the coding genes.  We've
>> only started adding non-coding RNA support recently for iPlant, so it's still
>> relatively immature.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> From: Shaun Jackman <sjackman at gmail.com>
>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>> Date: Tuesday, March 4, 2014 at 7:10 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>> 
>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
>> the tip.
>> 
>> The rRNA genes that are found with est2genome have the feature type set to
>> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features.
>> Ideally the feature type would be set to rRNA or tRNA as appropriate, and
>> would omit the UTR and CDS features. Is that a feature that you would be
>> interested in adding to MAKER? The rRNA gene names all start with "rrn" and
>> the tRNA gene names with "trn", as is standard, so determining the
>> appropriate type should be straightforward.
>> 
>> Thanks again for your help with this. Cheers,
>> Shaun
>> 
>> 
>> 
>> On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
>>> Set single_exon=1, and the minimum size to a smaller value.  I think it's
>>> set to 250 right now.  Also est2genome is looking for ORF, so if there is
>>> none (as with tRNAs) they probably won't get picked up.
>>> 
>>> --Carson 
>>> 
>>> Sent from my iPhone
>>> 
>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>>> 
>>>> Sorry, ignore my previous question. est_forward also carries forward the
>>>> names of protein evidence and works like a charm. Thank you!
>>>> 
>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>>> these hits?
>>>> organism_type=prokaryotic
>>>> est2genome=1
>>>> protein2genome=1
>>>> est_forward=1
>>>> Cheers,
>>>> Shaun
>>>> 
>>>> 
>>>> 
>>>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>>>> Is there a corresponding protein_forward=1 option to map forward protein
>>>>> names from protein2genome?
>>>>> 
>>>>> Cheers, 
>>>>> Shaun
>>>>> 
>>>>> 
>>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:
>>>>>> 
>>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>>> passing the gff3 to model_gff.
>>>>>> 
>>>>>> --Carson 
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>>> 
>>>>>>> What you can do is run it once with just est_forward=1 and
>>>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them
>>>>>>> in as model_gff and use the map_forward option to then filter the
>>>>>>> results based on mRNA score and that would copy names onto new gene
>>>>>>> under the standard MAKER pipeline.  Eventually it's really supposed to
>>>>>>> go into a separate tool that will map genes onto new assemblies (but
>>>>>>> under the hood the tool will just be calling MAKER with certain
>>>>>>> parameters restricted).  I do this because if people commonly use it
>>>>>>> mixed with things like SNAP I can start to get some very weird
>>>>>>> behaviors. 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> It seems that this could be a very useful option in those cases where
>>>>>>> you have firm a priori knowledge of the placement of ESTs. However,
>>>>>>> while trying it I note that est_forward implies that the est2genome
>>>>>>> predictor is turned on, implicitly. Is this necessary for this to work?
>>>>>>> I'm after the behavior you describe below where exonerate is made to try
>>>>>>> really hard within a limited region to align an est, but I would not
>>>>>>> like maker to produce est2genome predictions.
>>>>>>> 
>>>>>>> In general, I think this maker_coor and est_forward is a feature set
>>>>>>> that is worthy to be promoted into a documented feature.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> It will still work without est_forward.  It just works a little
>>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>>> 
>>>>>>> If est_forward is provided, MAKER will parse the database to look for
>>>>>>> the maker_coor tags early in the pipeline.  Then it will create a list
>>>>>>> of locations to search, and it will search them even if there are no
>>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result
>>>>>>> first and then polishes it with exonerate).  So maker_coor=chr1 will
>>>>>>> cause MAKER to look for a match using all of chr1 as the input to
>>>>>>> exonerate even when BLAST finds nothing (this is a very very slow
>>>>>>> search, but can help pick up one or two stubborn genes that don't remap
>>>>>>> well).  To allow this, MAKER gives exonerate looser matching parameters
>>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly
>>>>>>> errors).  The logic here is that given the fact that I already told
>>>>>>> MAKER that with some degree of confidence I expect sequence A to map to
>>>>>>> location X, it will try its hardest to make it match.
>>>>>>> 
>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to
>>>>>>> the region (that BLAST result has the information in its description
>>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as
>>>>>>> they were with est_forward.
>>>>>>> 
>>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>>> accidental feature).
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>>> est_forward=1, right?
>>>>>>> 
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I
>>>>>>> have a database of ESTs where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to MAKER without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:
>>>>>>> 
>>>>>>> There is a way.  It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>> 
>> 
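The gene_id= and maker_coor= header tags described in the quoted message above can be illustrated with a short Python sketch. This is only an illustration of the tag format given there (maker_coor=chr1:1-10000 or maker_coor=chr1); the helper name parse_header_tags and the regular expressions are assumptions made for this example and are not MAKER's own parsing code in GI.pm.

```python
import re

# Hypothetical helper (not part of MAKER): extracts the gene_id= and
# maker_coor= tags from a FASTA header line, following the tag format
# described in the message above.
TAG_RE = re.compile(r"(gene_id|maker_coor)=(\S+)")
COOR_RE = re.compile(r"^([^:]+)(?::(\d+)-(\d+))?$")


def parse_header_tags(header):
    """Return a dict with 'gene_id', 'contig', 'start', 'end' (None if absent)."""
    tags = dict(TAG_RE.findall(header))
    result = {"gene_id": tags.get("gene_id"),
              "contig": None, "start": None, "end": None}
    coor = tags.get("maker_coor")
    if coor:
        match = COOR_RE.match(coor)
        if match:
            result["contig"] = match.group(1)
            # The range part is optional: maker_coor=chr1 names only the contig.
            if match.group(2):
                result["start"] = int(match.group(2))
                result["end"] = int(match.group(3))
    return result


print(parse_header_tags(">tx1 gene_id=geneA maker_coor=chr1:1-10000"))
print(parse_header_tags(">tx2 gene_id=geneA maker_coor=chr1"))
```

Both example headers share gene_id=geneA, which per the description above would cluster the two isoforms into the same gene when est_forward=1 is set.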



From sbrubaker at solazyme.com  Thu Mar  6 16:41:55 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Thu, 6 Mar 2014 23:41:55 +0000
Subject: [maker-devel] Long introns from Augustus
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08236@EXCHANGE-MB01.internal.solazyme.com>

Hi, we have a very compact genome and we are getting a lot of fused gene models from running Augustus.  I am wondering if anyone has any advice about how to prevent introns above a certain cutoff from being created?

I tried a couple of things: some settings in a probabilities file, and also replacing a long list of probabilities with values from another file that someone had suggested on a forum.  So far I don't really see any changes though.

Any advice would be greatly appreciated.  

Thanks,
Shane


From carsonhh at gmail.com  Thu Mar  6 16:46:53 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Mar 2014 16:46:53 -0700
Subject: [maker-devel] Long introns from Augustus
Message-ID: <CF3E5643.A94C%carsonhh@gmail.com>

Are these the ab initio calls that are merged, or the final MAKER models?

--Carson




From sbrubaker at solazyme.com  Thu Mar  6 17:48:15 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 7 Mar 2014 00:48:15 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <CF3E5643.A94C%carsonhh@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>

Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.

I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.




From mikael.durling at slu.se  Mon Mar 10 04:27:25 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 10:27:25 +0000
Subject: [maker-devel] keep_preds values
Message-ID: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>

Hi,

Can someone, please, explain the keep_preds parameter, as it works now with a value between 1 and 0? It used to be binary, but now it seems to test concordance towards something. The maker wiki doesn't explain it any further either.

Thanks,
Mikael


From robert.king at rothamsted.ac.uk  Mon Mar 10 06:17:07 2014
From: robert.king at rothamsted.ac.uk (Robert King (RRes-Roth))
Date: Mon, 10 Mar 2014 12:17:07 +0000
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>

Dear Maker Developers,

I've updated a reference that had errors and was a little incomplete, and I am now trying to produce an annotation for it. Please note the reference has not changed dramatically. I've produced two annotations using as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference transcripts?

I want to compare these two annotations and have produced AED scores. How do I generate summary stats/figures to compare annotations? You mentioned last year in a post that Mike Campbell has a script to produce these; do you know if he will post it? I've got the Eval program and converted to GTF format using the provided script. I'm waiting on some Perl modules to be installed by our administrator so that I can test it, along with the "Evaluator" and "compare" programs. What do those two programs do?

Best Wishes
Rob


From dence at genetics.utah.edu  Mon Mar 10 08:47:42 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 10 Mar 2014 14:47:42 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
References: <6765E2B1-3B6F-4F5D-92E1-80AE8C315FE3@slu.se>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6BA90@mxb2.hg.genetics.utah.edu>

Hi Mikael, 

The keep_preds parameter is often used as if it were a binary parameter, but it doesn't have to be. The concordance that is mentioned in the comment line is the AED for that prediction. AED is a measurement of how well a prediction is supported by the evidence and ranges from 0 to 1. A prediction with an AED of 0 matches the evidence exactly, while a prediction with an AED of 1 isn't overlapped by any evidence.
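
As a rough illustration of that 0 to 1 scale, here is a minimal Python sketch. It assumes the nucleotide-level definition AED = 1 - (SN + SP)/2 given in the papers listed in the PS, and it represents the prediction and the evidence as plain sets of base positions. The function names and the set-based representation are assumptions made for this example; this is not MAKER's actual code.

```python
def aed(prediction, evidence):
    """Annotation edit distance between two sets of base positions."""
    if not prediction or not evidence:
        return 1.0  # nothing to overlap with
    shared = len(prediction & evidence)
    sensitivity = shared / len(evidence)    # fraction of the evidence covered
    specificity = shared / len(prediction)  # fraction of the prediction supported
    return 1.0 - (sensitivity + specificity) / 2.0


def positions(exons):
    """Expand (start, end) exon intervals (1-based, inclusive) into a set of bases."""
    return {p for start, end in exons for p in range(start, end + 1)}


pred = positions([(100, 200), (300, 400)])
print(aed(pred, positions([(100, 200), (300, 400)])))  # identical: 0.0
print(aed(pred, positions([(1000, 1100)])))            # no overlap: 1.0
print(aed(pred, positions([(100, 200)])))              # one exon supported: 0.25
```

In the last case the evidence is fully covered (SN = 1) but only half of the predicted bases are supported (SP = 0.5), which gives the intermediate value.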

The default behavior for MAKER is to make a gene model out of a prediction with any AED <1. When you change the keep_preds option from 0 to 1, then MAKER will make a gene model out of any prediction that matches the other parameters (like single_exon, min_exon, etc). Setting the keep_preds option to somewhere in between 0 and 1 will set a ceiling on the AED required for promoting a prediction to a gene model. 

From a user standpoint, you will almost certainly lose gene models when you set the AED ceiling at an intermediate value, but you might benefit by knowing that all your models will now have an AED of at most a certain value.

I hope that helps; let me know if it didn't. 

~Daniel

PS The original paper that described the AED is Eilbeck et al in BMC Bioinformatics 2009. It's also discussed in more detail in the MAKER2 paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews Genetics paper from 2012. 

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Mon Mar 10 09:51:21 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:51:21 -0700
Subject: [maker-devel] keep_preds values
Message-ID: <CF432CF3.A9C7%carsonhh@gmail.com>

Actually that is false. The keep_preds option is still binary.  Any value
other than 0 sets it to true.  There was discussion about making it a
non-binary value, but that has not been implemented.
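
To make the consequence concrete, here is a tiny Python sketch of the semantics just described: 0 is off and any other number is on, so 0.5 behaves exactly like 1. The function name is hypothetical, and the sketch assumes the setting is written as a number; it does not reproduce how MAKER's Perl code reads the control file.

```python
def keep_preds_enabled(raw_value):
    """Interpret a numeric keep_preds setting: 0 is off, anything else is on."""
    return float(raw_value) != 0


for raw in ["0", "1", "0.5"]:
    print(raw, keep_preds_enabled(raw))
```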

--Carson




From mikael.durling at slu.se  Mon Mar 10 08:57:23 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 14:57:23 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432CF3.A9C7%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
Message-ID: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>

Hi Carson and Daniel,

That sounds more logical to me.  Then it would be appropriate to change the comment of keep_preds in the generated config files.

Would it make sense to make keep_preds a non-binary value to evaluate the concordance between ab initio models obtained from different predictors? That would assume that it is less likely to be a false positive when two or more predictors suggest the same unsupported model?

Mikael




From carsonhh at gmail.com  Mon Mar 10 09:59:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 08:59:43 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
Message-ID: <CF432F23.A9D4%carsonhh@gmail.com>

Yes.  It will eventually perform an AED-like calculation between multiple
predictors (i.e. if you use 3 predictors, then you require support by
at least 2 predictors across all exons to get a value of 0.33).  A value
of 0 would be perfect concordance across all 3 predictors.

--Carson




From mikael.durling at slu.se  Mon Mar 10 09:08:16 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Mon, 10 Mar 2014 15:08:16 +0000
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF432F23.A9D4%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
Message-ID: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>

Ok. But that is not implemented now, as far as I can tell from the source, right? Or is it reflected in the AED for the unsupported models?

Mikael



From carsonhh at gmail.com  Mon Mar 10 10:16:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:16:59 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
Message-ID: <CF4331A9.A9E0%carsonhh@gmail.com>

There is a value called abAED being calculated, which somewhat captures
the concordance among the predictors.  It is not currently printed in the
GFF3, but it is used to identify the best non-overlapping ab initio
prediction to put in the non-overlapping fasta file.  There are a couple of
things I still need to do with it though.  It's not yet normalized to
take into account the absence of a predictor in the cluster of overlapping
predictions. For example, if I have 3 predictors and 2 make perfectly
matching calls and 1 makes no call, they get a score of 0 because I have
perfect concordance between what's there, but I really should make it 0.33
because the absence of the third predictor is meaningful.  The
unnormalized concordance value is fine for deciding which overlapping
model to keep in the file, but not for global comparison.
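
The difference between the unnormalized and normalized values can be sketched in a few lines of Python for the case of three predictors where two agree and one is silent. This is a simplified illustration, not MAKER's abAED code: it assumes agreement means an exact match of exon coordinates (the real calculation is AED-like and presumably works on overlap), and the function name concordance_scores is made up for this example.

```python
from collections import Counter


def concordance_scores(calls):
    """Score agreement among predictors; 0 means perfect agreement.

    calls: one entry per predictor, either a tuple of (start, end) exons
    or None when that predictor made no call in the cluster.
    Returns (unnormalized, normalized).
    """
    present = [c for c in calls if c is not None]
    if not present:
        raise ValueError("at least one predictor must make a call")
    # Number of predictors that agree with the most common call.
    agreeing = Counter(present).most_common(1)[0][1]
    unnormalized = 1 - agreeing / len(present)  # ignores silent predictors
    normalized = 1 - agreeing / len(calls)      # a silent predictor counts against
    return unnormalized, normalized


model = ((100, 200), (300, 400))
unnorm, norm = concordance_scores([model, model, None])
print(round(unnorm, 2), round(norm, 2))  # 0.0 0.33
```

With all three predictors making the same call, both values are 0, which matches the "perfect concordance across all 3 predictors" case.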

--Carson


On 3/10/14, 8:08 AM, "Mikael Brandstr?m Durling" <mikael.durling at slu.se>
wrote:

>Ok. But that is not implemented no as far as I can tell from the source,
>right? Or is it reflected in the AED for the unsupported models?
>
>Mikael
>
>10 mar 2014 kl. 16:59 skrev Carson Holt <carsonhh at gmail.com>:
>
>> Yes.  It will eventually perform an AED like calculation between
>>multiple
>> predictors (i.e. if you use 3 predictors it, then you require support by
>> at least 2 predictors across all exons to get a value of 0.33).  A value
>> of 0 would be perfect concordance across all 3 predictors.
>> 
>> ?Carson
>> 
>> 
>> 
>> 
>> On 3/10/14, 7:57 AM, "Mikael Brandstr?m Durling" <mikael.durling at slu.se>
>> wrote:
>> 
>>> Hi Carson and Daniel,
>>> 
>>> That sounds more logical to me.  Then it would be appropriate to change
>>> the comment of keep_preds in the generated config files.
>>> 
>>> Would it make sense to make keep_preds a non-binary value to evaluate
>>>the
>>> concordance between ab initio models obtained from different
>>>predictors?
>>> That would assume that it is less likely to be a false positive when
>>>two
>>> or more predictors suggest the same unsported model?
>>> 
>>> Mikael
>>> 
>>> 
>>> 10 mar 2014 kl. 16:51 skrev Carson Holt <carsonhh at gmail.com>:
>>> 
>>>> Actually that is false. The keep_preds option is still binary.  Any
>>>> value
>>>> other than 0 sets it to true.  There was discussion about making it a
>>>> non-binary value, but that has not been implemented.
>>>> 
>>>> ?Carson
>>>> 
>>>> 
>>>> On 3/10/14, 7:47 AM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>> 
>>>>> Hi Mikael, 
>>>>> 
>>>>> The keep_preds parameter is often used the same as a binary
>>>>>parameter,
>>>>> but it doesn't have to be. The concordance that is mentioned in the
>>>>> comment line is the AED for that prediction. AED is a measurement of
>>>>> how
>>>>> well a prediction is supported by the evidence and ranges from 0 -
>>>>>1. A
>>>>> prediction with an AED of 0 matches the evidence exactly while a
>>>>> prediction with an AED of 1 isn't overlapped by any evidence.
>>>>> 
>>>>> The default behavior for MAKER is to make a gene model out of a
>>>>> prediction with any AED <1. When you change the keep_preds option
>>>>>from
>>>>> 0
>>>>> to 1, then MAKER will make a gene model out of any prediction that
>>>>> matches the other parameters (like single_exon, min_exon, etc).
>>>>>Setting
>>>>> the keep_preds option to somewhere in between 0 and 1 will set a
>>>>> ceiling
>>>>> on the AED required for promoting a prediction to a gene model.
>>>>> 
>>>>> From a user standpoint, when you will almost certainly lose gene
>>>>>models
>>>>> when you set AED at an intermediate value, but you might benefit by
>>>>> knowing that all your models will now have an AED of at least a
>>>>>certain
>>>>> value. 
>>>>> 
>>>>> I hope that helps; let me know if it didn't.
>>>>> 
>>>>> ~Daniel
>>>>> 
>>>>> PS The original paper that described the AED is Eilbeck et al in BMC
>>>>> Bioinformatics 2009. It's also discussed in more detail in the MAKER2
>>>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews
>>>>> Genetics paper from 2012.
>>>>> 
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>>>> Mikael Brandström Durling [mikael.durling at slu.se]
>>>>> Sent: Monday, March 10, 2014 4:27 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] keep_preds values
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Can someone, please, explain the keep_preds parameter, as it works
>>>>>now
>>>>> with a value between 1 and 0? It used to be binary, but now it seems
>>>>>to
>>>>> test concordance towards something. The maker wiki doesn't explain it
>>>>> any
>>>>> further either.
>>>>> 
>>>>> Thanks,
>>>>> Mikael
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> 
>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or
>>>>>g
>>>>> 
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> 
>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or
>>>>>g
>>>> 
>>>> 
>>> 
>> 
>> 
>


From carsonhh at gmail.com  Mon Mar 10 10:18:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:18:14 -0700
Subject: [maker-devel] keep_preds values
In-Reply-To: <CF4331A9.A9E0%carsonhh@gmail.com>
References: <CF432CF3.A9C7%carsonhh@gmail.com>
	<E01F696F-4FC2-4B22-86B7-E40A5585A6F1@slu.se>
	<CF432F23.A9D4%carsonhh@gmail.com>
	<00E6B00E-BE93-42F1-A580-2A254E2C9E64@slu.se>
	<CF4331A9.A9E0%carsonhh@gmail.com>
Message-ID: <CF4333C1.AA06%carsonhh@gmail.com>

Sorry meant to say "3 predictors and 2 make perfectly
matching calls and 1 makes no call."
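
As a rough illustration of the normalization described in the quoted
message below, here is a minimal Python sketch. The function name and the
scaling formula are assumptions made for illustration only; they are not
taken from the MAKER source. The formula is simply one choice that
reproduces the 0 and 0.33 values from the corrected example.

```python
def normalized_concordance(unnormalized, n_calling, n_total):
    """Hypothetical normalization of an abAED-like score.

    unnormalized: distance among the predictors that made a call
                  (0 = they agree perfectly, 1 = no agreement).
    n_calling:    number of predictors with a call in the cluster.
    n_total:      number of predictors that were run.
    """
    if not 0 < n_calling <= n_total:
        raise ValueError("need 0 < n_calling <= n_total")
    agreement = 1.0 - unnormalized
    # Scale agreement by the fraction of predictors that made a call,
    # so a silent predictor counts against the cluster.
    return 1.0 - agreement * n_calling / n_total


# 3 predictors, 2 make perfectly matching calls, 1 makes no call.
score = normalized_concordance(0.0, n_calling=2, n_total=3)
print(round(score, 2))  # 0.33
```

With all 3 predictors calling and agreeing, the same function returns 0,
matching "perfect concordance across all 3 predictors".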


On 3/10/14, 9:16 AM, "Carson Holt" <carsonhh at gmail.com> wrote:

>There is a value called abAED being calculated, which somewhat captures
>the concordance among the predictors.  It is not currently printed in the
>GFF3, but it is used to identify the best non-overlapping ab initio
>predictor to put in the non-overlapping fasta file.  There are a couple of
>things I still need to do with it though.  It's not yet normalized to
>take into account the absence of a predictor in the cluster of overlapping
>predictions. For example, if I have 2 predictors and 2 make perfectly
>matching calls and 1 makes no call, they get a score of 0 because I have
>perfect concordance between what's there, but I really should make it 0.33
>because the absence of the third predictor is meaningful.  The
>unnormalized concordance value is fine for deciding which overlapping
>model to keep in the file, but not for global comparison.
>
>--Carson
>
>
>
>On 3/10/14, 8:08 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
>wrote:
>
>>Ok. But that is not implemented now as far as I can tell from the source,
>>right? Or is it reflected in the AED for the unsupported models?
>>
>>Mikael
>>
>>10 mar 2014 kl. 16:59 skrev Carson Holt <carsonhh at gmail.com>:
>>
>>> Yes.  It will eventually perform an AED-like calculation between
>>> multiple predictors (i.e. if you use 3 predictors, then you require
>>> support by at least 2 predictors across all exons to get a value of
>>> 0.33).  A value of 0 would be perfect concordance across all 3
>>> predictors.
>>> 
>>> --Carson
>>> 
>>> 
>>> 
>>> 
>>> On 3/10/14, 7:57 AM, "Mikael Brandström Durling"
>>><mikael.durling at slu.se>
>>> wrote:
>>> 
>>>> Hi Carson and Daniel,
>>>> 
>>>> That sounds more logical to me.  Then it would be appropriate to
>>>>change
>>>> the comment of keep_preds in the generated config files.
>>>> 
>>>> Would it make sense to make keep_preds a non-binary value to evaluate
>>>>the
>>>> concordance between ab initio models obtained from different
>>>>predictors?
>>>> That would assume that it is less likely to be a false positive when
>>>>two
>>>> or more predictors suggest the same unsupported model?
>>>> 
>>>> Mikael
>>>> 
>>>> 
>>>> 10 mar 2014 kl. 16:51 skrev Carson Holt <carsonhh at gmail.com>:
>>>> 
>>>>> Actually that is false. The keep_preds option is still binary.  Any
>>>>> value
>>>>> other than 0 sets it to true.  There was discussion about making it a
>>>>> non-binary value, but that has not been implemented.
>>>>> 
>>>>> --Carson
>>>>> 
>>>>> 
>>>>> On 3/10/14, 7:47 AM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>>> 
>>>>>> Hi Mikael, 
>>>>>> 
>>>>>> The keep_preds parameter is often used the same as a binary
>>>>>>parameter,
>>>>>> but it doesn't have to be. The concordance that is mentioned in the
>>>>>> comment line is the AED for that prediction. AED is a measurement of
>>>>>> how
>>>>>> well a prediction is supported by the evidence and ranges from 0 -
>>>>>>1. A
>>>>>> prediction with an AED of 0 matches the evidence exactly while a
>>>>>> prediction with an AED of 1 isn't overlapped by any evidence.
>>>>>> 
>>>>>> The default behavior for MAKER is to make a gene model out of a
>>>>>> prediction with any AED <1. When you change the keep_preds option
>>>>>>from
>>>>>> 0
>>>>>> to 1, then MAKER will make a gene model out of any prediction that
>>>>>> matches the other parameters (like single_exon, min_exon, etc).
>>>>>>Setting
>>>>>> the keep_preds option to somewhere in between 0 and 1 will set a
>>>>>> ceiling
>>>>>> on the AED required for promoting a prediction to a gene model.
>>>>>> 
>>>>>> From a user standpoint, you will almost certainly lose gene
>>>>>>models
>>>>>> when you set AED at an intermediate value, but you might benefit by
>>>>>> knowing that all your models will now have an AED of at least a
>>>>>>certain
>>>>>> value. 
>>>>>> 
>>>>>> I hope that helps; let me know if it didn't.
>>>>>> 
>>>>>> ~Daniel
>>>>>> 
>>>>>> PS The original paper that described the AED is Eilbeck et al in BMC
>>>>>> Bioinformatics 2009. It's also discussed in more detail in the
>>>>>>MAKER2
>>>>>> paper, the MAKER-P paper, and the Yandell and Ence Nature Reviews
>>>>>> Genetics paper from 2012.
>>>>>> 
>>>>>> Daniel Ence
>>>>>> Graduate Student
>>>>>> Eccles Institute of Human Genetics
>>>>>> University of Utah
>>>>>> 15 North 2030 East, Room 2100
>>>>>> Salt Lake City, UT 84112-5330
>>>>>> ________________________________________
>>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>>>>> Mikael Brandström Durling [mikael.durling at slu.se]
>>>>>> Sent: Monday, March 10, 2014 4:27 AM
>>>>>> To: maker-devel at yandell-lab.org
>>>>>> Subject: [maker-devel] keep_preds values
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Can someone, please, explain the keep_preds parameter, as it works
>>>>>>now
>>>>>> with a value between 1 and 0? It used to be binary, but now it seems
>>>>>>to
>>>>>> test concordance towards something. The maker wiki doesn't explain
>>>>>>it
>>>>>> any
>>>>>> further either.
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikael
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> maker-devel mailing list
>>>>>> maker-devel at box290.bluehost.com
>>>>>> 
>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.o
>>>>>>r
>>>>>>g
>>>>>> 
>>>>>> _______________________________________________
>>>>>> maker-devel mailing list
>>>>>> maker-devel at box290.bluehost.com
>>>>>> 
>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.o
>>>>>>r
>>>>>>g
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>
>
>


From carsonhh at gmail.com  Mon Mar 10 10:25:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 09:25:50 -0700
Subject: [maker-devel] annotation comparison aed plots
Message-ID: <CF4330EC.A9DA%carsonhh@gmail.com>

I don't know about Michael's script, but I've always used eval.  It
produces sensitivity/specificity metrics.  It assumes the first models are
100% correct, and then tells you the sensitivity/specificity value for the
second models.

It is therefore not a quality metric.  Instead you should view it as a change
metric. Lower sensitivity tells you that models/exons have been lost between
versions, and lower specificity tells you models/exons have been gained.
There will also be a lot of generic statistics on exon/intron distribution
and UTR length.  Then the AED values from the MAKER run can be used
independently to evaluate how well models match the evidence.
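
To make the "change metric" reading concrete, here is a small Python
sketch. It is not eval's code: it assumes exons are compared by exact
coordinates and that "specificity" follows the usual gene-prediction
convention of TP / (TP + FP). The function name and the toy coordinates
are made up for illustration.

```python
def change_metrics(reference_exons, new_exons):
    """Treat the first set as 100% correct and score the second against it."""
    reference, new = set(reference_exons), set(new_exons)
    shared = len(reference & new)
    # Sensitivity drops when reference exons are missing from the new set.
    sensitivity = shared / len(reference) if reference else 0.0
    # Specificity drops when the new set contains exons not in the reference.
    specificity = shared / len(new) if new else 0.0
    return sensitivity, specificity


old = [("chr1", 100, 200), ("chr1", 300, 400),
       ("chr1", 500, 600), ("chr1", 700, 800)]
new = [("chr1", 100, 200), ("chr1", 300, 400),
       ("chr1", 500, 600), ("chr1", 900, 950)]

sn, sp = change_metrics(old, new)
print(sn, sp)  # 0.75 0.75
```

Here one exon was lost (sensitivity 0.75) and one was gained
(specificity 0.75); neither number says which version is better.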

--Carson


From:  "Robert King (RRes-Roth)" <robert.king at rothamsted.ac.uk>
Date:  Monday, March 10, 2014 at 5:17 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] annotation comparison aed plots

Dear Maker Developers,
 
I've updated a reference that had errors and was a little incomplete and
am now trying to produce an annotation for it. Please note the reference has
not changed dramatically. I've produced two annotations using as evidence:

Annotation 1:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts

Annotation 2:
Uniprot proteins search using species keyword "fusarium"
Pubmed mRNA for the name of the organism
Prior annotation reference transcripts
mRNA trinity assembly pasafly of different strain (only RNA-seq available)

I'm not sure if it was a smart move to use the prior annotation reference
transcripts?

I want to compare these two annotations and have produced AED scores. How do
I generate summary stats/figures to compare annotations? You mentioned last
year in a post that Mike Campbell has a script to produce these; do you know
if he will post it? I've got the Eval program and converted to gtf format using
the provided script, just waiting on some perl modules to be installed by
admin to test it. I'm waiting on some perl modules to be installed by our
administrator to test out the "Evaluator" and "compare" programs too; what
do they do?
 
Best Wishes
Rob


From michael.s.campbell1 at gmail.com  Mon Mar 10 09:50:53 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:50:53 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
	<CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>
Message-ID: <CAAi6vWUSY6UgyyXAJ5=-aUA_39FBwFREVX3xmeHSZaE264AKGw@mail.gmail.com>

One more point. The sensitivity, specificity, and accuracy produced by the
compare_annotations_3.2.pl script are gene level, and overlap between
annotation sets is defined very liberally, as at least one nucleotide of
exon overlap.
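
A minimal Python sketch of that liberal overlap rule follows. It is an
illustration, not code from compare_annotations_3.2.pl; it assumes genes
are given as lists of (start, end) exon coordinates, inclusive, on the
same contig, and the function names are hypothetical.

```python
def exons_overlap(a, b):
    """True if two inclusive (start, end) intervals share at least 1 nt."""
    return a[0] <= b[1] and b[0] <= a[1]


def genes_overlap(gene_a, gene_b):
    """Gene-level overlap: any exon pair sharing a single nucleotide counts."""
    return any(exons_overlap(x, y) for x in gene_a for y in gene_b)


gene_a = [(100, 200), (300, 400)]
gene_b = [(400, 450)]   # touches the last base of gene_a's second exon
gene_c = [(201, 299)]   # sits entirely inside gene_a's intron

print(genes_overlap(gene_a, gene_b))  # True
print(genes_overlap(gene_a, gene_c))  # False
```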
Mike


On Mon, Mar 10, 2014 at 9:47 AM, Michael Campbell <
michael.s.campbell1 at gmail.com> wrote:

> Hi Robert,
>
> Here are the scripts that were mentioned before.
>
> The AED_cdf_generator.pl script is for making cumulative distribution
> function plots based on annotation edit distance. This script is quite
> simple and strait forward in its internals.
>
> The compare_annotations_3.2.pl script is for generating summary stats for
> annotations and will compare two annotations of the same assembly.
>
> You can run either script without arguments to get a usage statement.
>
> Thanks,
> Mike
>
>
> On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) <
> robert.king at rothamsted.ac.uk> wrote:
>
>>  Dear Maker Developers,
>>
>>
>>
>> I've updated a reference that was had errors and was a little incomplete
>> and now trying to produce a annotation for it. Please note the reference
>> has not changed dramatically. I've produced two annotations using as
>> evidence:
>>
>>
>>
>> Annotation 1:
>>
>> Uniprot proteins search using species keyword "fusarium"
>>
>> Pubmed mRNA for the name of the organism
>>
>> Prior annotation reference transcripts
>>
>>
>>
>> Annotation 2:
>>
>> Uniprot proteins search using species keyword "fusarium"
>>
>> Pubmed mRNA for the name of the organism
>>
>> Prior annotation reference transcripts
>>
>> mRNA trinity assembly pasafly of different strain (only RNA-seq available)
>>
>>
>>
>> I'm not sure if it was a smart move to use the prior annotation reference
>> transcripts?
>>
>>
>>
>> I want to compare these two annotations and have produced AED scores. How
>> do I generate summary stats/figures to compare annotations. You mentioned
>> last year in a post Mike Campbell has a script to produce these, do you
>> know if he will post it? I've got the Eval program and converted to gtf
>> format using the provided script, just waiting on some perl modules to be
>> installed by admin to test it. I'm waiting on some perl modules to be
>> installed by our administrator to test out the "Evaluator" and "compare"
>> programs too, what do they do?
>>
>>
>>
>> Best Wishes
>>
>> Rob
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content by *MailScanner* <http://www.mailscanner.info/>, and
>> we believe but do not warrant that this e-mail and any attachments
>> thereto do not contain any viruses. However, you are fully responsible for
>> performing any virus scanning.
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>
>
> --
> Michael Campbell MS, RD.
> Doctoral Candidate
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:585-3543
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

From cjfields at illinois.edu  Mon Mar 10 09:52:50 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 10 Mar 2014 15:52:50 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CEB024AC-5E08-4827-9EC4-17D09F06E1FA@illinois.edu>

I have been running MAKER 2.31 using Augustus and SNAP on an avian genome.  Augustus gives pretty decent gene model predictions based on a custom model we have and the hints MAKER provides.  However, SNAP seems to throw out a ton of false positives; in many cases this appears to cause erroneous gene fusions.  Leaving out SNAP altogether however leads to a marked decrease in # models overall, which is worse.  GeneMark had a very similar problem (high # false positives) and thus no marked improvement, either when using with both Augustus and SNAP or with Augustus alone.

I have been exploring using geneid (http://genome.crg.es/software/geneid/) as an alternative, based on some feedback on another project I worked with in the past.  This would be fed into MAKER using external GFF, but I wanted to see if anyone has tried geneid with MAKER first.  

Finally, how hard would it be to incorporate alternative callers into MAKER?  For instance, would it be possible to add these like a 'plugin'?  

chris


From carsonhh at gmail.com  Mon Mar 10 11:05:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 10:05:24 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
Message-ID: <CF433C40.AA26%carsonhh@gmail.com>

Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).
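
For the "convert results to GFF3 and then pass it in" route, the relevant
setting is the pred_gff option in maker_opts.ctl. The fragment below is
only a sketch: the file name is hypothetical, and it assumes the geneid
output has already been converted to GFF3 in a layout MAKER accepts.

```
#-----Gene Prediction (fragment of maker_opts.ctl)
pred_gff=geneid_predictions.gff3 #ab-initio predictions from an external GFF3 file (hypothetical file name)
```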

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with int he past.  This would be
>feed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a ?plugin??
>
>chris
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From michael.s.campbell1 at gmail.com  Mon Mar 10 09:47:50 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 10 Mar 2014 09:47:50 -0600
Subject: [maker-devel] annotation comparison aed plots
In-Reply-To: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
References: <136AB40E0C34CF4FB9AE0DD8C22A8D7B7B10C1@rothex1.rothamsted.ac.uk>
Message-ID: <CAAi6vWVWuP4b39zf+3k_SAwKuWxAFGRvAD3oNCugkuPLjagOww@mail.gmail.com>

Hi Robert,

Here are the scripts that were mentioned before.

The AED_cdf_generator.pl script is for making cumulative distribution
function plots based on annotation edit distance. This script is quite
simple and straightforward in its internals.
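
For readers without the attachment, the idea of an AED cumulative
distribution can be sketched in a few lines of Python. This is an
independent illustration, not the attached Perl script; it assumes the
AED values (each between 0 and 1) have already been extracted from the
annotation, and the function name and step size are arbitrary.

```python
def aed_cdf(aed_values, step=0.1):
    """Return (threshold, fraction of models with AED <= threshold) pairs."""
    if not aed_values:
        raise ValueError("no AED values given")
    n_bins = round(1 / step)
    total = len(aed_values)
    cdf = []
    for i in range(n_bins + 1):
        # Round to avoid floating-point drift in thresholds such as 0.30000000000000004.
        threshold = round(i * step, 10)
        count = sum(1 for v in aed_values if v <= threshold)
        cdf.append((threshold, count / total))
    return cdf


print(aed_cdf([0.0, 0.2, 0.5, 1.0], step=0.5))
# [(0.0, 0.25), (0.5, 0.75), (1.0, 1.0)]
```

Plotting the pairs for two annotations on the same axes shows which one
has a larger share of models at low AED.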

The compare_annotations_3.2.pl script is for generating summary stats for
annotations and will compare two annotations of the same assembly.

You can run either script without arguments to get a usage statement.

Thanks,
Mike


On Mon, Mar 10, 2014 at 6:17 AM, Robert King (RRes-Roth) <
robert.king at rothamsted.ac.uk> wrote:

>  Dear Maker Developers,
>
>
>
> I've updated a reference that was had errors and was a little incomplete
> and now trying to produce a annotation for it. Please note the reference
> has not changed dramatically. I've produced two annotations using as
> evidence:
>
>
>
> Annotation 1:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
>
>
> Annotation 2:
>
> Uniprot proteins search using species keyword "fusarium"
>
> Pubmed mRNA for the name of the organism
>
> Prior annotation reference transcripts
>
> mRNA trinity assembly pasafly of different strain (only RNA-seq available)
>
>
>
> I'm not sure if it was a smart move to use the prior annotation reference
> transcripts?
>
>
>
> I want to compare these two annotations and have produced AED scores. How
> do I generate summary stats/figures to compare annotations. You mentioned
> last year in a post Mike Campbell has a script to produce these, do you
> know if he will post it? I've got the Eval program and converted to gtf
> format using the provided script, just waiting on some perl modules to be
> installed by admin to test it. I'm waiting on some perl modules to be
> installed by our administrator to test out the "Evaluator" and "compare"
> programs too, what do they do?
>
>
>
> Best Wishes
>
> Rob
>
> --
> This message has been scanned for viruses and
> dangerous content by *MailScanner* <http://www.mailscanner.info/>, and
> we believe but do not warrant that this e-mail and any attachments thereto
> do not contain any viruses. However, you are fully responsible for
> performing any virus scanning.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>


-- 
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl
Type: text/x-perl-script
Size: 2580 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compare_annotations_3.2.pl
Type: text/x-perl-script
Size: 29155 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/e21497bc/attachment-0007.bin>

From sajeet at gmail.com  Mon Mar 10 12:31:40 2014
From: sajeet at gmail.com (Sajeet Haridas)
Date: Mon, 10 Mar 2014 11:31:40 -0700
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF433C40.AA26%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
Message-ID: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>

One of the problems I have found with genemark is that it does not
understand a soft-masked genome. Hence, the self training is incorrect. I
have found marked improvement to genemark's prediction by running the
training on a hard masked genome.
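
Converting a soft-masked FASTA (lowercase repeats) to a hard-masked one
(repeats replaced by N) is a small transformation. The Python sketch
below is one way to do it; the function name is made up for illustration,
and it assumes lowercase letters in sequence lines mark masked bases.

```python
def hard_mask(fasta_lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by N."""
    for line in fasta_lines:
        if line.startswith(">"):
            # Header lines are passed through unchanged.
            yield line
        else:
            yield "".join("N" if c.islower() else c for c in line)


example = [">scaffold1 example\n", "ACGTacgtACGT\n", "ttttGGGG\n"]
print("".join(hard_mask(example)), end="")
```

The generator works on any iterable of lines, so an open file handle can
be passed in directly and the result written out line by line.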


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:

> Adding a new predictor can take some time.  It obviously requires some
> coding.  It's usually not too hard just to convert results to GFF3 and
> then pass it in.  Integrated support is really only beneficial for
> predictors that can take "hints" from evidence alignments (for example we
> are working on EVM integration right now -
> http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
> problems just drop them.  GeneMark really doesn't work very good on
> genomes with complex intron/exon structure (and I really wouldn't use it
> for anything but fungi).
>
> Make sure you are also giving sufficient protein evidence.  Perhaps all
> proteins from chicken and pigeon for example.  Then you shouldn't find
> loss of any true genes if just using Augustus.  Also try not to use gene
> count as an indicator of performance.  The value is very deceptive,
> especially if the genome assembly is fragmented.
>
> Thanks,
> Carson
>
>
>
> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu>
> wrote:
>
> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
> >genome.  Augustus gives pretty decent gene model predictions based on a
> >custom model we have and the hints MAKER provides.  However, SNAP seems
> >to throw out a ton of false positives; in many cases this appears to
> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
> >to a marked decrease in # models overall, which is worse.  GeneMark had a
> >very similar problem (high # false positives) and thus no marked
> >improvement, either when using with both Augustus and SNAP or with
> >Augustus alone.
> >
> >I have been exploring using geneid
> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
> >feedback on another project I worked with int he past.  This would be
> >feed into MAKER using external GFF, but I wanted to see if anyone has
> >tried geneid with MAKER first.
> >
> >Finally, how hard would it be to incorporate alternative callers into
> >MAKER?  For instance, would it be possible to add these like a 'plugin'?
> >
> >chris
> >_______________________________________________
> >maker-devel mailing list
> >maker-devel at box290.bluehost.com
> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com  Mon Mar 10 22:13:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Mar 2014 22:13:43 -0600
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
Message-ID: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com] 
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged, or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that someone 
>> had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 


From darasappan at gmail.com  Mon Mar 10 14:14:03 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 10 Mar 2014 15:14:03 -0500
Subject: [maker-devel] maker output- transcripts.fasta and proteins.fasta
	files missing
Message-ID: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>

Hello,

I've been running maker with different assembly files, reference files,
etc., and I check the output by:

1. concatenating the gff files
2. concatenating the *transcripts.fasta files
3. concatenating the *proteins.fasta files

I'm noticing that when I ran maker twice with the same parameters, the
second time around, many of the output subdirectories do not have a
*transcripts.fasta or *proteins.fasta file in them.
There are 251 subdirectories and only 97 of them have all 3 output
files.  Maker log looks ok to me, but I've attached it here as well.
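
A quick way to list which subdirectories are missing which of the three
files is sketched below in Python. The file patterns are taken from the
description above; the function name is hypothetical, and the script
makes no assumption about how the output directory is laid out beyond
"directories that contain at least one of the three file types".

```python
from pathlib import Path

EXPECTED = ("*.gff", "*transcripts.fasta", "*proteins.fasta")


def missing_outputs(root):
    """Map each partially populated directory to the patterns it lacks."""
    report = {}
    for directory in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        present = [pat for pat in EXPECTED if any(directory.glob(pat))]
        # Only report directories that have some, but not all, outputs.
        if present and len(present) < len(EXPECTED):
            report[str(directory)] = [p for p in EXPECTED if p not in present]
    return report


# Example use (path is a placeholder):
# for directory, missing in missing_outputs("my_genome.maker.output").items():
#     print(directory, "is missing", ", ".join(missing))
```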

What could be the reason for this?

Thanks
dhivya

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.o1813247.gz
Type: application/x-gzip
Size: 13857217 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140310/34f3a118/attachment-0001.tgz>
-------------- next part --------------


From sbrubaker at solazyme.com  Tue Mar 11 11:06:57 2014
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Tue, 11 Mar 2014 17:06:57 +0000
Subject: [maker-devel] Long introns from Augustus
In-Reply-To: <99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
References: <CF3E5643.A94C%carsonhh@gmail.com>
	<61D01ACB70C1E141A150BA9F586D5BFA50F0826A@EXCHANGE-MB01.internal.solazyme.com>
	<99883695-A1E7-4B03-BB8D-06863D8132E5@gmail.com>
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA50F08FB3@EXCHANGE-MB01.internal.solazyme.com>

Ok thank you.

-----Original Message-----
From: Carson Holt [mailto:carsonhh at gmail.com] 
Sent: Monday, March 10, 2014 9:14 PM
To: Shane Brubaker
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Long introns from Augustus

Maybe.  The max intron length will affect evidence alignments and clustering, which will be used as hints to Augustus. You can give it a try. If you lack transcriptome data, just make sure you provide it with a couple of related proteomes.

--Carson

Sent from my iPhone

> On Mar 6, 2014, at 5:48 PM, Shane Brubaker <sbrubaker at solazyme.com> wrote:
> 
> Actually these are calls directly from Augustus (without using Maker).  They are not purely ab initio in that they are using hints from RNA-Seq data.
> 
> I had noticed that Maker does have some information about max intron length - does that mean it could be taken care of by Maker?  I don't have very good "EST" (transcriptome) assemblies because it is a very difficult organism to sequence.
> 
> 
> -----Original Message-----
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Thursday, March 06, 2014 3:47 PM
> To: Shane Brubaker; maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Long introns from Augustus
> 
> Are these the ab initio calls that are merged, or final MAKER models?
> 
> --Carson
> 
> 
>> On 3/6/14, 4:41 PM, "Shane Brubaker" <sbrubaker at solazyme.com> wrote:
>> 
>> Hi, we have a very compact genome and we are getting a lot of fused 
>> gene models from running Augustus.  I am wondering if anyone has any 
>> advice about how to prevent introns above a certain cutoff from being created?
>> 
>> I tried a couple of things, some settings in a probabilities file and 
>> also changing a long list of probabilities to another file that 
>> someone had suggested on a forum.  So far I don't really see any changes though.
>> 
>> Any advice would be greatly appreciated.
>> 
>> Thanks,
>> Shane
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 

From carson.holt at genetics.utah.edu  Thu Mar 13 10:00:06 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 13 Mar 2014 16:00:06 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF47300B.AB4F%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
Message-ID: <CF4731CC.AB5E%carson.holt@genetics.utah.edu>

Just resending this to the correct maker-devel address.  When replying,
please do not CC the incorrect maker-devel-bounce address.

Thanks,
Carson


On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>FGENESH is not a heavily used tool, so depending on which version it is
>(either too old or too new), the output might be slightly different, which
>could cause incorrect parsing. Could you tar up your maker.output folder
>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>(then send me your user/guest ID after you upload)?
>
>For the BLAST error, use BLAST+ instead.  You are using blastall which is
>the old legacy version of NCBI BLAST.  You can do this by setting the
>blast type in maker_bopts.ctl and the location of executables in
>maker_exe.ctl.
>
>Thanks,
>Carson
>
>
>
>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear Maker users
>>
>>
>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>word SCALAR followed by a hex value in parentheses, like (0x22de7020),
>>into the nucleotide sequence of some of the genes. This seems to
>>be related to transcripts predicted by fgenesh_masked.
>>
>>
>>Here is an example for one of the genes
>>
>>
>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>TTTCGACAAGC
>>
>>The same genome sequence was used for the first round of maker (2.10)
>>without any such problem. I checked the scaffold sequence for
>>one of the affected transcripts and found no error in it.
>>I am not sure what is causing this. The only error that I could spot in
>>the error output is the following:
>>
>>
>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>contexts/frames of query sequences.
>>
>>
>>
>>Your help is appreciated
>>
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>
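
For anyone wanting to check their own output for the symptom in the quoted
report: SCALAR(0x...) is what Perl prints when a scalar reference is turned
into a string, so it should never appear in sequence data. The sketch below
flags any sequence line containing characters outside the IUPAC nucleotide
alphabet. The function name and the demo records are made up for
illustration.

```shell
#!/usr/bin/env bash
# Print "line_number:line" for every non-header FASTA line that contains
# something other than IUPAC nucleotide codes, gaps or '*'.
find_bad_fasta_lines() {
    grep -n -v '^>' "$1" | grep -i -v -E '^[0-9]+:[ACGTUNRYSWKMBDHV*-]*$'
}

# Demo: one clean record and one record with stringified-reference debris.
scalar_demo_fa=$(mktemp)
printf '>good\nACGTNN\n>bad\nGCTCTGSCALAR(0x23\n418b90)GCTTTG\n' > "$scalar_demo_fa"
find_bad_fasta_lines "$scalar_demo_fa"
```

The first grep numbers the lines before the headers are dropped, so the
reported numbers refer to lines in the original file.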


From carsonhh at gmail.com  Thu Mar 13 10:14:54 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:14:54 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
Message-ID: <CF4733ED.AB63%carsonhh@gmail.com>

Note that protein/transcript FASTA files are only created when there are gene
models to output to those files (so their absence means there were no gene
models for that contig). Most sequences without protein/transcript FASTA files
in your sample are very short and thus don't contain anything.  The rest either
have no est2genome results, or the est2genome alignments do not have a
sufficient open reading frame to be turned into a gene model (false merging of
regions by Trinity can cause this, so make sure you use the jaccard clip
option, --jaccard_clip, when assembling reads with Trinity to avoid this).

You are using only the est2genome=1 option.  This will result in a limited
set of genes that can be used for training SNAP/Augustus (so not getting
results on all contigs is expected).  You really won't get many results
until you have one of the ab initio predictors turned on.
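
To see which contigs produced no gene models, something like the sketch
below can list the per-contig output directories that lack a transcripts
FASTA. It assumes the three-level datastore layout seen in this thread's
commands (datastore/XX/YY/contig/); the function name and the demo tree are
hypothetical.

```shell
#!/usr/bin/env bash
# List contig directories that contain no *transcripts.fasta file.
list_contigs_without_transcripts() {
    local datastore=$1 dir
    for dir in "$datastore"/*/*/*/; do
        [ -d "$dir" ] || continue
        # grep -q . succeeds only if find printed at least one path.
        if ! find "$dir" -maxdepth 1 -name '*transcripts.fasta' | grep -q .; then
            basename "$dir"
        fi
    done
}

# Demo tree: one contig with a transcripts file, one with only a GFF.
nomodel_demo=$(mktemp -d)
mkdir -p "$nomodel_demo/0A/1B/scaffold1" "$nomodel_demo/2C/3D/superscaffold2"
touch "$nomodel_demo/0A/1B/scaffold1/scaffold1.maker.transcripts.fasta"
touch "$nomodel_demo/2C/3D/superscaffold2/superscaffold2.gff"
list_contigs_without_transcripts "$nomodel_demo"
```

A contig listed here is not necessarily a failure; as noted above, it may
simply be too short or have no usable evidence.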

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, March 11, 2014 at 8:52 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Alright done. My username is daras

Thanks
Dhivya

On Mar 10, 2014, at 5:10 PM, Carson Holt wrote:

> Input and compressed file of output.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Monday, March 10, 2014 at 2:09 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Do you mean the whole maker output?
> 
> Thanks
> dhivya
> 
> On Mar 10, 2014, at 4:55 PM, Carson Holt wrote:
> 
>> Could you upload everything here -->
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>> 
>> Then send us the link generated or your user ID.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, March 10, 2014 at 1:50 PM
>> To:  Carson Holt <carsonhh at gmail.com>, Daniel Ence <dence at genetics.utah.edu>
>> Subject:  Fwd: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson and Daniel,
>> 
>> I'm sending this across to you separately since maker list is blocking my
>> email due to attachment size.
>> 
>> As always, thanks for any guidance you can provide.
>> Dhivya
>> 
>> 
>> Begin forwarded message:
>> 
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: March 10, 2014 3:14:03 PM CDT
>>> To: maker-devel at yandell-lab.org
>>> Subject: maker output- transcripts.fasta and proteins.fasta files missing
>>> 
>>>  
>>> Hello,
>>> 
>>> I've been running maker with different assembly files, reference files etc
>>> and I check the output by:
>>> 
>>> 1. concatenating the gff files
>>> 2. concatenating the *transcripts.fasta files
>>> 3. concatenating the *proteins.fasta files
>>> 
>>> I'm noticing that when I ran maker twice with same parameters, the second
>>> time around, many of the output subdirectories  do not have a
>>> *transcripts.fasta or *proteins.fasta file in it.
>>> There are 251 subdirectories and only 97 of them have all 3 output files.
>>> Maker log looks ok to me, but I've attached it here as well.
>>> 
>>> What could be the reason for this?
>>> 
>>> Thanks
>>> dhivya
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 



From carsonhh at gmail.com  Thu Mar 13 10:55:40 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 10:55:40 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
Message-ID: <CF473DBA.AB9F%carsonhh@gmail.com>

The second time, it should have just started where it left off, so it would
run faster (because the processing from the previous job counted towards the
second one).  The archived output you sent me had 21,183 proteins and
transcripts.  If you are using fasta_merge to collect them, just make
sure the datastore.index file is not truncated or corrupt; otherwise it won't
collect all the FASTA files from every contig.  You can rebuild the
datastore.index using the -dsindex flag with MAKER, if you want to check
that.  You can also have MAKER regenerate results without rerunning
BLAST etc. by using the -a flag, if you just want to recalculate all results
quickly (it rebuilds all FASTA and GFF3 files without redoing most analysis).
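
A quick sanity check on the index before rebuilding it might look like the
sketch below. It assumes the master datastore index is a tab-separated file
with three columns (contig name, output directory, status) and that a
completed contig has both a STARTED and a FINISHED line; the function name
and demo content are illustrative.

```shell
#!/usr/bin/env bash
# Print contigs that have a STARTED entry but no FINISHED entry.
unfinished_contigs() {
    awk -F '\t' '
        $3 == "STARTED"  { started[$1] = 1 }
        $3 == "FINISHED" { finished[$1] = 1 }
        END { for (name in started) if (!(name in finished)) print name }
    ' "$1"
}

# Demo index: scaffold1 completed, superscaffold2 was interrupted.
index_demo=$(mktemp)
printf 'scaffold1\tdemo_datastore/0A/1B/scaffold1/\tSTARTED\n'  > "$index_demo"
printf 'scaffold1\tdemo_datastore/0A/1B/scaffold1/\tFINISHED\n' >> "$index_demo"
printf 'superscaffold2\tdemo_datastore/2C/3D/superscaffold2/\tSTARTED\n' >> "$index_demo"
unfinished_contigs "$index_demo"
```

An empty result only shows that every started contig was also marked
finished; it does not prove the index lists every contig in the assembly.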

--Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 10:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Thanks Carson for the response.  I understand that est2genome=1 does not use
any ab initio gene predictions, but simply identifies ESTs based on
alignment.  I'm a little confused because I ran maker on my assembly before,
using the same parameters (including est2genome=1).  I got a very good
result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds were
combined into superscaffolds. So I reran maker on this assembly.   Same
parameters, same transcriptome and proteins files.  Now, I see such
drastically different results:  Only 500+ genes and transcripts.  My
scaffolds are now bigger than before, so I'm not sure how this is happening.
These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because
perhaps it is due to some sort of error on my part):  I ran maker on the
improved assembly the first time and maker did not complete in the 48 hours
I allocated.  But I had 19,000+ transcripts in the unfinished output.  When
I reran maker, just changing the time allocated, it completed much faster,
but is giving far fewer transcripts and proteins as output.  Could
something like this happen? If not, then I'm guessing I must have changed
something, although I'm pretty sure that I did not change anything other than
the time allocated. I've attached the transcripts and proteins files from the
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya





From darasappan at gmail.com  Thu Mar 13 10:47:25 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 13 Mar 2014 11:47:25 -0500
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <CF4733ED.AB63%carsonhh@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
Message-ID: <0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>

Thanks Carson for the response.  I understand that est2genome=1 does
not use any ab initio gene predictions, but simply identifies ESTs
based on alignment.  I'm a little confused because I ran maker on my
assembly before, using the same parameters (including est2genome=1).
I got a very good result with > 20,000 transcripts and proteins.

Then I  was able to get an improved assembly, where many scaffolds  
were combined into superscaffolds. So I reran maker on this  
assembly.   Same parameters, same transcriptome and proteins files.   
Now, I see such drastically different results:  Only 500+ genes and  
transcripts.  My scaffolds are now bigger than before, so I'm not sure  
how this is happening.   These were the results I sent you.

Another odd thing I noticed (and I am hesitant to report this because
perhaps it is due to some sort of error on my part):  I ran maker on
the improved assembly the first time and maker did not complete in the
48 hours I allocated.  But I had 19,000+ transcripts in the
unfinished output.  When I reran maker, just changing the time
allocated, it completed much faster, but is giving far fewer
transcripts and proteins as output.  Could something like this happen?
If not, then I'm guessing I must have changed something, although I'm
pretty sure that I did not change anything other than the time
allocated. I've attached the transcripts and proteins files from the
first time I ran maker on my improved assembly.

Thanks again for your help
Dhivya



-------------- next part --------------
A non-text attachment was scrubbed...
Name: transcripts.cat.fasta.old.gz
Type: application/x-gzip
Size: 7927581 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment-0002.tgz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: proteins.cat.fasta.old.gz
Type: application/x-gzip
Size: 3668381 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140313/2048cfef/attachment-0003.tgz>

From carsonhh at gmail.com  Thu Mar 13 12:53:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Mar 2014 12:53:05 -0600
Subject: [maker-devel] maker output- transcripts.fasta and
	proteins.fasta files missing
In-Reply-To: <C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
References: <E1538E4B-E356-4044-BD96-6D56D6F65C87@gmail.com>
	<64504EF3-413C-46C2-A95F-C855EC9383D1@gmail.com>
	<CF4382AA.AA8B%carsonhh@gmail.com>
	<A1D096BC-F25A-48D9-8C7F-8A64946E57F7@gmail.com>
	<CF438653.AA92%carsonhh@gmail.com>
	<A22880BB-7693-4655-A3F3-D99D4F1FC08D@gmail.com>
	<CF4733ED.AB63%carsonhh@gmail.com>
	<0A4E4571-97F3-44A5-BFDF-9465E7683D9C@gmail.com>
	<CF473DBA.AB9F%carsonhh@gmail.com>
	<672A27A2-FFBD-45EC-9303-E3973EEA5AB6@gmail.com>
	<CF474291.ABC0%carsonhh@gmail.com>
	<CF4744C6.ABC9%carsonhh@gmail.com>
	<5EE3B5E8-E7DC-4F09-B52D-E08CA4D85A15@gmail.com>
	<CF474BE5.ABDA%carsonhh@gmail.com>
	<C5EC9853-C3A9-4651-9C7F-05F7B73FC628@gmail.com>
Message-ID: <CF4759BA.ABE2%carsonhh@gmail.com>

For future reference, I suggest using the .../maker/bin/fasta_merge tool to
merge based on the datastore.index rather than other command-line
methods.  It handles the multiple FASTA types that are produced in the
results, and validates against the datastore.index file.

Example:
fasta_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log

The same is also true when merging gff3 files.
gff3_merge -d opgenResult+scaffoldsLengthsLess200_master_datastore_index.log
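
If you do want an independent count to compare against the merged file, a
sketch along these lines avoids depending on how the contigs are named. It
walks the whole datastore with find instead of a name-prefix glob; the
function name and the demo tree are hypothetical.

```shell
#!/usr/bin/env bash
# Count FASTA records in every *transcripts.fasta under a datastore directory.
count_fasta_records() {
    local datastore=$1
    # cat joins all matching files; grep -c counts the header lines.
    find "$datastore" -type f -name '*transcripts.fasta' -exec cat {} + | grep -c '^>'
}

# Demo tree: the second contig name does not start with "sca".
count_demo=$(mktemp -d)
mkdir -p "$count_demo/0A/1B/scaffold1" "$count_demo/2C/3D/superscaffold2"
printf '>t1\nACGT\n>t2\nGGCC\n' \
    > "$count_demo/0A/1B/scaffold1/scaffold1.maker.transcripts.fasta"
printf '>t3\nACGT\n>t4\nGGCC\n>t5\nTTAA\n' \
    > "$count_demo/2C/3D/superscaffold2/superscaffold2.maker.transcripts.fasta"

# A prefix glob such as */*/sca*/ would see only scaffold1 here.
count_fasta_records "$count_demo"
```

The same function with '*proteins.fasta' as the name pattern would give the
matching protein count.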

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 13, 2014 at 12:48 PM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
missing

Ah, I forgot that some were called superscaffolds.  That is a difference
between the old and new assembly. This was definitely the issue. Thanks, and
sorry for the mix-up.

Dhivya
On Mar 13, 2014, at 12:51 PM, Carson Holt wrote:

> Note that your command does not capture everything because not all scaffolds
> start with the name "scaffold".
> 
> This works though -->
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/*/*trans*fasta|wc -l
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, March 13, 2014 at 11:34 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files missing
> 
> Hi Carson,
> 
> Am I looking in the wrong place for my fasta files?  I looked here:
> 
> ls -lh opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|wc -l
> 
> I see only 97 such files- so 97 contigs with transcripts.fasta files?
> 
> When I count the number of sequences in all these files, I get 514 sequences.
> 
> grep -c '^>' opgenResult+scaffoldsLengthsLess200_datastore/*/*/sca*/*trans*fasta|cut -d ':' -f 2|awk '{total+=$0}END{print total}'
> 
> Could you tell me how and where you are getting the 21,183 transcripts?
> 
> thanks
> dhivya
> 
> On Mar 13, 2014, at 12:21 PM, Carson Holt wrote:
> 
>> This is what I see in your uploaded data.  There are 21,183 transcripts from
>> 201 contigs.  Then there are 707 contigs with no gene models.
>> 
>> --Carson
>> 
>> 
>> From:  Carson Holt <carsonhh at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:11 AM
>> To:  dhivya arasappan <darasappan at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> "as you saw from the output I uploaded before, the output certainly was much
>> less than 20,000 transcripts"
>> 
>> Actually there were 21,183 in the output you uploaded.  I saw no loss of
>> entries.
>> 
>> --Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, March 13, 2014 at 11:09 AM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: maker output- transcripts.fasta and proteins.fasta files
>> missing
>> 
>> Hi Carson,
>> 
>> The datastore.index file looks fine- it has a started and finished status for
>> my 980 scaffolds.  I reran with increased time twice. Second time around, I
>> actually deleted the entire output directory to make sure it runs all over
>> again.  It still seemed to complete within a day. As you saw from the output
>> I uploaded before, the output certainly was much less than 20,000
>> transcripts. Given that I was seeing great results for an older version of my
>> assembly, I'm puzzled as to why my results are worse this time around. Any
>> suggestions of what to check or what I can do to see improved results would
>> be really helpful.
>> 
>> I do know that I went from ~4% gaps to ~6% gaps in my new assembly; other
>> than that, it's better in every way. Could this cause such a dramatic
>> difference in results?
>> 
>> Thanks
>> dhivya
>> 
> 



From cjfields at illinois.edu  Thu Mar 13 15:04:23 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 13 Mar 2014 21:04:23 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
Message-ID: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome.
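
As a rough illustration of the hard-masking step described above, the sketch below turns a soft-masked FASTA (repeats in lowercase) into a hard-masked one (repeats as N) before self-training. It assumes lowercase letters mark the masked bases, which is the usual soft-masking convention; the function name is hypothetical and is not part of MAKER or GeneMark.

```python
def hard_mask_fasta(lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by N.

    Header lines (starting with '>') are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield "".join("N" if base.islower() else base for base in line)


if __name__ == "__main__":
    # Tiny in-memory example; in practice 'lines' would be an open file handle.
    soft_masked = [">scaffold1 example", "ACGTacgtACGT", "ttttGGGG"]
    for masked_line in hard_mask_fasta(soft_masked):
        print(masked_line)
```

The soft-masked file would still be the one given to MAKER; only the copy used for training would be hard-masked.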


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com<mailto:maker-devel at box290.bluehost.com>
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From jfierst at uoregon.edu  Fri Mar 14 10:06:26 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 09:06:26 -0700
Subject: [maker-devel] associating gene names between related strains
Message-ID: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate
the gene naming so that orthologs between species can be associated by
name. I have been playing around a little with the est_forward option but
can't figure out a good system/workflow that preserves names but still uses
the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
-Janna

From dence at genetics.utah.edu  Fri Mar 14 11:32:02 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Mar 2014 17:32:02 +0000
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>

Hi Janna, So do you have one strain that you want to use as the reference for all the others? There's a script that comes with MAKER called maker_map_ids that lets you use a common prefix or suffix for entries in a fasta file from one strain and then use est_forward to use that ID in the gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of Caenorhabditis worms and I was wondering if there is a way to coordinate the gene naming so that orthologs between species can be associated by name. I have been playing around a little with the est_forward option but can't figure out a good system/workflow that preserves names but still uses the strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna

From jfierst at uoregon.edu  Fri Mar 14 12:01:16 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 14 Mar 2014 11:01:16 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CAGoyuracbDO5pcWU7wThnnnGbfoKo2xEn+trPPUaUJx9t+8_Lg@mail.gmail.com>

I will try it today. Thanks for the quick reply!


On Fri, Mar 14, 2014 at 10:32 AM, Daniel Ence <dence at genetics.utah.edu> wrote:

>  Hi Janna, So do you have one strain that you want to use as the
> reference for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
>  Let me know if that's not what you're looking for,
> Daniel
>
>  Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>   ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
>   Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
>

From carsonhh at gmail.com  Fri Mar 14 12:02:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:02:48 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
Message-ID: <CF489F0B.AC19%carsonhh@gmail.com>

maker_map_ids does a translation (i.e. change gene-A to smug1), so you need
to know which genes you want to translate names to (two-column input file,
column 1 -> original ID, column 2 -> new ID).  I'm not sure est_forward is
the best way to do this, although I do think maker_map_ids is the tool to
use in the end.  The question is how to make the list of IDs to translate as
the input to maker_map_ids.

I would actually just use BLASTP against the reference strain, and then take
reciprocal best BLAST hits.  To do this you BLAST your reference proteins
against your maker proteins.  Then do the opposite: BLAST your maker
proteins against your reference proteins.  If two proteins are each other's
best hit, they can be treated as orthologous, and you can safely make a
two-column entry for the maker_map_ids input (i.e. maker-gene-1 translates
into smug1).
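
The reciprocal-best-hit step can be sketched roughly as follows. This is only an illustration under stated assumptions: both BLASTP runs were written in tabular format (-outfmt 6, so column 1 is the query, column 2 the subject and column 12 the bit score), "best hit" means highest bit score, and the function names are hypothetical rather than anything shipped with MAKER.

```python
def best_hits(blast_tab_lines):
    """Return {query: subject}, keeping the highest-bitscore hit per query."""
    best = {}
    for line in blast_tab_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
    """Return sorted (maker_id, reference_id) pairs that are mutual best hits."""
    forward = best_hits(maker_vs_ref)
    reverse = best_hits(ref_vs_maker)
    return sorted(
        (maker_id, ref_id)
        for maker_id, ref_id in forward.items()
        if reverse.get(ref_id) == maker_id
    )


if __name__ == "__main__":
    # Made-up hits for illustration; real input would be the two BLAST output files.
    maker_vs_ref = [
        "maker-gene-1\tsmug1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
        "maker-gene-2\tsmug1\t60.0\t250\t100\t0\t1\t250\t1\t250\t1e-40\t150",
    ]
    ref_vs_maker = [
        "smug1\tmaker-gene-1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    ]
    # Column 1 = original ID, column 2 = new ID, as maker_map_ids expects.
    for maker_id, ref_id in reciprocal_best_hits(maker_vs_ref, ref_vs_maker):
        print(f"{maker_id}\t{ref_id}")
```

Genes without a mutual best hit are simply left out of the list and so keep their MAKER-assigned names.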

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Friday, March 14, 2014 at 11:32 AM
To:  Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] associating gene names between related strains

Hi Janna, So do you have one strain that you want to use as the reference
for all the others? There's a script that comes with MAKER called
maker_map_ids that lets you use a common prefix or suffix for entries in a
fasta file from one strain and then use est_forward to use that ID in the
gene models for the other species.

Let me know if that's not what you're looking for,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna
Fierst [jfierst at uoregon.edu]
Sent: Friday, March 14, 2014 10:06 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] associating gene names between related strains

Hi,

we are assembling and annotating genomes for several related strains of
Caenorhabditis worms and I was wondering if there is a way to coordinate the
gene naming so that orthologs between species can be associated by name. I
have been playing around a little with the est_forward option but can't
figure out a good system/workflow that preserves names but still uses the
strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna

From carsonhh at gmail.com  Fri Mar 14 12:43:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 12:43:41 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<3219E92A-2024-45C6-84A9-66C646287D7E@gmail.com>
	<9E3C7171-E5F7-4602-A7B7-9E9CE91F303A@gmail.com>
Message-ID: <CF48A7BD.AC29%carsonhh@gmail.com>

I'm glad you were able to fix it.  I'll check to see why it was failing as
well.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Friday, March 14, 2014 at 10:16 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: Error when running maker2zff script

Kindly ignore my previous question. I was able to manipulate the scaffold
names in the gff file to get maker2zff to work.

Thanks
dhivya

On Mar 14, 2014, at 10:55 AM, dhivya arasappan <darasappan at gmail.com> wrote:

> My message got flagged by the maker list again, so I'm forwarding this
> separately to you.  Is there a better way to send biggish files?
> 
> 
> Thank you
> Dhivya
> 
> 
> 
> Begin forwarded message:
> 
>> From: dhivya arasappan <darasappan at gmail.com>
>> Subject: Error when running maker2zff script
>> Date: March 13, 2014 at 8:35:27 PM CDT
>> To: Carson Holt <carsonhh at gmail.com>, maker-devel at yandell-lab.org
>> 
>> Hi Carson,
>> 
>> I used gff3_merge to create my gff file from maker output. I've attached it
>> here. But when I run maker2zff on it, I get the following error:
>> 
>> Can't use an undefined value as an ARRAY reference at
>> /opt/apps/maker/2.30/bin/maker2zff line 177, <GFF> line 7294251.
>> 
>> It produces an incomplete output file and it looks like it may be running
>> into problems when it encounters scaffold3%2F0.  I'm wondering if it's having
>> problems with my scaffold names. There seem to be some inconsistencies
>> because it's referred to as  scaffold3%F0 and scaffold3/0 in the gff file.
>> It goes through other scaffolds like SCAFFOLD3_873, SCAFFOLD3_95 etc just
>> fine.   I did try replacing the scaffold names in the gff file, but still get
>> the same error.   Any ideas?
>> 
>> Substitution command I used, for your reference:  sed 's/3\%2F/3_/g' gfffile|
>> sed 's/\//\_/'  > mod.gfffile
>> 
>> Thanks
>> Dhivya
>> 
> <head.gff.gz>
> 



From carsonhh at gmail.com  Fri Mar 14 13:25:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 14 Mar 2014 13:25:58 -0600
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
Message-ID: <CF48B2BC.AC3E%carsonhh@gmail.com>

We can look into it.

--Carson

From:  "Fields, Christopher J" <cjfields at illinois.edu>
Date:  Thursday, March 13, 2014 at 3:04 PM
To:  Sajeet Haridas <sajeet at gmail.com>
Cc:  Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see
if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints
such as BLAST HSPs (as well as other information), in the form of a GFF
"homology" file.  I assume it could take protein2genome/est2genome as well
through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

> One of the problems I have found with genemark is that it does not understand
> a soft-masked genome. Hence, the self training is incorrect. I have found
> marked improvement to genemark's prediction by running the training on a hard
> masked genome.
> 
> 
> On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Adding a new predictor can take some time.  It obviously requires some
>> coding.  It's usually not too hard just to convert results to GFF3 and
>> then pass it in.  Integrated support is really only beneficial for
>> predictors that can take "hints" from evidence alignments (for example we
>> are working on EVM integration right now -
>> http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
>> problems just drop them.  GeneMark really doesn't work very well on
>> genomes with complex intron/exon structure (and I really wouldn't use it
>> for anything but fungi).
>> 
>> Make sure you are also giving sufficient protein evidence.  Perhaps all
>> proteins from chicken and pigeon for example.  Then you shouldn't find
>> loss of any true genes if just using Augustus.  Also try not to use gene
>> count as an indicator of performance.  The value is very deceptive,
>> especially if the genome assembly is fragmented.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
>> 
>>> >I have been running MAKER 2.31 using Augustus and SNAP on an avian
>>> >genome.  Augustus gives pretty decent gene model predictions based on a
>>> >custom model we have and the hints MAKER provides.  However, SNAP seems
>>> >to throw out a ton of false positives; in many cases this appears to
>>> >cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>>> >to a marked decrease in # models overall, which is worse.  GeneMark had a
>>> >very similar problem (high # false positives) and thus no marked
>>> >improvement, either when using with both Augustus and SNAP or with
>>> >Augustus alone.
>>> >
>>> >I have been exploring using geneid
>>> >(http://genome.crg.es/software/geneid/) as an alternative, based on some
>>> >feedback on another project I worked with in the past.  This would be
>>> >fed into MAKER using external GFF, but I wanted to see if anyone has
>>> >tried geneid with MAKER first.
>>> >
>>> >Finally, how hard would it be to incorporate alternative callers into
>>> >MAKER?  For instance, would it be possible to add these like a "plugin"?
>>> >
>>> >chris
>> 
>> 
>> 
> 



From cjfields at illinois.edu  Fri Mar 14 20:22:55 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sat, 15 Mar 2014 02:22:55 +0000
Subject: [maker-devel] geneid (or alternative ab initio predictors)
In-Reply-To: <CF48B2BC.AC3E%carsonhh@gmail.com>
References: <CF433C40.AA26%carsonhh@gmail.com>
	<CAJrwUqnXZgJgse2X6z7QhQcC_aNih_dp90dpjsW037F0Qk-W4A@mail.gmail.com>
	<A7C303EB-717F-4E95-8829-7912B49A6D38@illinois.edu>
	<CF48B2BC.AC3E%carsonhh@gmail.com>
Message-ID: <53FD788A-15EA-4A18-BB2F-3072178816CA@illinois.edu>

Not an issue at the moment; I'll likely supply these via gff for now.  If needed I can work off a svn checkout and send along a patch should I ever manage to eke out time to work on it.

chris

On Mar 14, 2014, at 2:25 PM, Carson Holt <carsonhh at gmail.com> wrote:

We can look into it.

--Carson

From: "Fields, Christopher J" <cjfields at illinois.edu>
Date: Thursday, March 13, 2014 at 3:04 PM
To: Sajeet Haridas <sajeet at gmail.com>
Cc: Carson Holt <carsonhh at gmail.com>, "<maker-devel at yandell-lab.org> List" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] geneid (or alternative ab initio predictors)

That is nice to know; I'll have to check the masking on this assembly to see if that is the problem (my guess is that it is).

Carson, re: geneid and "hints", it looks as if geneid can take some hints such as BLAST HSPs (as well as other information), in the form of a GFF "homology" file.  I assume it could take protein2genome/est2genome as well through the same route.

chris

On Mar 10, 2014, at 1:31 PM, Sajeet Haridas <sajeet at gmail.com> wrote:

One of the problems I have found with genemark is that it does not understand a soft-masked genome. Hence, the self training is incorrect. I have found marked improvement to genemark's prediction by running the training on a hard masked genome.


On Mon, Mar 10, 2014 at 10:05 AM, Carson Holt <carsonhh at gmail.com> wrote:
Adding a new predictor can take some time.  It obviously requires some
coding.  It's usually not too hard just to convert results to GFF3 and
then pass it in.  Integrated support is really only beneficial for
predictors that can take "hints" from evidence alignments (for example we
are working on EVM integration right now -
http://evidencemodeler.sourceforge.net).  If SNAP and GeneMark give
problems just drop them.  GeneMark really doesn't work very well on
genomes with complex intron/exon structure (and I really wouldn't use it
for anything but fungi).

Make sure you are also giving sufficient protein evidence.  Perhaps all
proteins from chicken and pigeon for example.  Then you shouldn't find
loss of any true genes if just using Augustus.  Also try not to use gene
count as an indicator of performance.  The value is very deceptive,
especially if the genome assembly is fragmented.

Thanks,
Carson


On 3/10/14, 8:52 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:

>I have been running MAKER 2.31 using Augustus and SNAP on an avian
>genome.  Augustus gives pretty decent gene model predictions based on a
>custom model we have and the hints MAKER provides.  However, SNAP seems
>to throw out a ton of false positives; in many cases this appears to
>cause erroneous gene fusions.  Leaving out SNAP altogether however leads
>to a marked decrease in # models overall, which is worse.  GeneMark had a
>very similar problem (high # false positives) and thus no marked
>improvement, either when using with both Augustus and SNAP or with
>Augustus alone.
>
>I have been exploring using geneid
>(http://genome.crg.es/software/geneid/) as an alternative, based on some
>feedback on another project I worked with in the past.  This would be
>fed into MAKER using external GFF, but I wanted to see if anyone has
>tried geneid with MAKER first.
>
>Finally, how hard would it be to incorporate alternative callers into
>MAKER?  For instance, would it be possible to add these like a "plugin"?
>
>chris



From carson.holt at genetics.utah.edu  Mon Mar 17 13:45:15 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Mar 2014 19:45:15 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
Message-ID: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>

I have attached 4 files for you to place in the .../maker/Widgets/
directory.

The *blast.pm files will suppress the BLAST+ failures you are getting
(alternatively you can just downgrade to BLAST+ 2.2.27 to get the same
effect).  BLAST+ 2.2.29 gives a lot of warnings etc., which you can ignore.
In the latest release NCBI redid all their warnings and error codes, so it
spits out a lot of garbage and fails with different messages than it did
before.  For example, BLAST now warns you every time it encounters a FASTA
header with a comment (virtually every FASTA entry in existence falls in
this category), so your screen will be awash with meaningless warning
messages.

The fgenesh.pm file will fix the other failure, which only occurs if you
use fgenesh simultaneously with the est_fusion=1 option.  No other
predictors are affected.
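
To see which transcripts were affected, and to confirm the problem is gone after a rerun, something along these lines could be used to scan a transcripts FASTA for characters that are not nucleotide codes. It is a minimal sketch: the function name is hypothetical, and treating the IUPAC nucleotide letters as the allowed alphabet is an assumption for illustration, not something MAKER provides.

```python
import re

# IUPAC nucleotide codes; anything else in a sequence line is suspicious
# (for example the "L", "(" and digits in a stray "SCALAR(0x...)" string).
NON_NUCLEOTIDE = re.compile(r"[^ACGTUNRYSWKMBDHV]", re.IGNORECASE)


def find_corrupt_records(lines):
    """Return IDs of FASTA records whose sequence has non-nucleotide characters."""
    bad = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
        elif line and current is not None and NON_NUCLEOTIDE.search(line):
            # Report each record once, even if several of its lines are bad.
            if not bad or bad[-1] != current:
                bad.append(current)
    return bad


if __name__ == "__main__":
    example = [
        ">transcript-1 offset:0",
        "ATGCGTTACTCCCAGATC",
        ">transcript-2 offset:0",
        "GGCTCTGSCALAR(0x23",
        "418b90)GCTTTGGGG",
    ]
    print(find_corrupt_records(example))
```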

Thanks,
Carson


On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear  Carson
>
>Sorry for the late reply. I was away for a couple of days. I have uploaded
>the output files plus control and error output to the FTP site that you
>provided.
>The user ID is borhanh
>
>I used blast+ for this run.
>
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>
>>Just resending this to the correct maker-devel address.  Please when
>>replying, do not CC the incorrect maker-devel-bounce address.
>>
>>Thanks,
>>Carson
>>
>>
>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>>
>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>(either too old or too new), output might be slightly different which
>>>could cause incorrect parsing. Could you tar up your maker.output
>>>folder and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>(send me your user/guest ID after you upload)?
>>>
>>>For the BLAST error, use BLAST+ instead.  You are using blastall, which
>>>is the old legacy version of NCBI BLAST.  You can do this by setting the
>>>blast type in maker_bopts.ctl and the location of executables in
>>>maker_exe.ctl.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>
>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>wrote:
>>>
>>>>Dear Maker users
>>>>
>>>>
>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>word SCALAR followed by a hex value in parentheses, like this:
>>>>SCALAR(0x22de7020), into the nucleotide sequence of some of the genes.
>>>>This seems to be related to transcripts predicted by fgenesh_masked.
>>>>
>>>>Here is an example for one of the genes
>>>>
>>>>
>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>TTTCGACAAGC
>>>>
>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>without such a problem. I checked the sequence of the scaffold related to
>>>>one of the affected transcripts and there was no error in the sequence.
>>>>I am not sure what is causing this. The only error that I could spot in
>>>>the error output file is the following:
>>>>
>>>>
>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>contexts/frames of query sequences.
>>>>
>>>>
>>>>
>>>>Your help is appreciated
>>>>
>>>>
>>>>
>>>>HB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastn.pm
Type: text/x-perl-script
Size: 8112 bytes
Desc: blastn.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0012.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastx.pm
Type: text/x-perl-script
Size: 8218 bytes
Desc: blastx.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0013.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh.pm
Type: text/x-perl-script
Size: 19744 bytes
Desc: fgenesh.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0014.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tblastx.pm
Type: text/x-perl-script
Size: 9113 bytes
Desc: tblastx.pm
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140317/e73c4b0f/attachment-0015.bin>

From carsonhh at gmail.com  Mon Mar 17 15:14:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Mar 2014 15:14:42 -0600
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
Message-ID: <CF4CBEAF.ADA3%carsonhh@gmail.com>

Just an update on this.  I?ve fixed the maker2zff script to handle the
issues seen.  Looking at this actually brought to light another issue.
There is inconsistent escape character specification for GFF3 in column 1
(the source ID), column 8 (the attributes ID and Target_ID), as well as
the FASTA ID for internal sequence.  We're updating the GFF3 spec to
clarify this so that everywhere you see the same ID getting treated the
same way for character escaping.
 
To be safe though, only use these characters in your contig IDs for the
assembly when using any tool that reads or outputs GFF3 -->
a-zA-Z0-9.:^*$@!+_?-|

Any character not in that set has a high chance of breaking some
downstream tool.  For now, just assume that the strict interpretation from
the GFF3 spec for column 1 must be used on all IDs everywhere (see below).

>>Column 1: "seqid"
>>The ID of the landmark used to establish the coordinate system for the
>>current feature.
>>IDs may contain any characters, but must escape any characters not in
>>the set [a-zA-Z0-9.:^*$@!+_?-|].
>>In particular, IDs may not contain unescaped whitespace and must not
>>begin with an unescaped ">".
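
A minimal sketch of checking contig IDs against that character set before
running any GFF3 tool, assuming Python 3.  The function names are
hypothetical and are not part of MAKER; replacing unsafe characters with an
underscore is one possible policy, not something the spec requires.

```python
import re

# Characters the GFF3 spec allows unescaped in column 1 (seqid).
# "-" is placed last in the class so it is read literally.
_SAFE_CHARS = r"a-zA-Z0-9.:^*$@!+_?|-"
_SAFE_ID = re.compile(rf"[{_SAFE_CHARS}]+")
_UNSAFE_CHAR = re.compile(rf"[^{_SAFE_CHARS}]")


def is_safe_id(seq_id: str) -> bool:
    """Return True if seq_id is non-empty and uses only the safe characters."""
    return _SAFE_ID.fullmatch(seq_id) is not None


def sanitize_id(seq_id: str, replacement: str = "_") -> str:
    """Replace every character outside the safe set with `replacement`."""
    return _UNSAFE_CHAR.sub(replacement, seq_id)


if __name__ == "__main__":
    for name in ["SCAFFOLD3_873", "scaffold3/0", ">contig 1"]:
        print(name, is_safe_id(name), sanitize_id(name))
```

If IDs are renamed this way, the same renaming has to be applied to the
assembly FASTA and to every file that refers to those IDs, otherwise the
names will no longer match.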


Thanks,
Carson


On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:

>Hi Carson,
>
>I used gff3_merge to create my gff file from maker output. I've
>attached it here. But when I run maker2zff on it, I get the following
>error:
>
>Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
>2.30/bin/maker2zff line 177, <GFF> line 7294251.
>
>It produces an incomplete output file and it looks like it may be
>running into problems when it encounters scaffold3%2F0.  I'm wondering
>if its having problems with my scaffold names. There seem to be some
>inconsistencies because it's referred to as  scaffold3%F0 and
>scaffold3/0 in the gff file.  It goes through other scaffolds like
>SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
>scaffold names in the gff file, but still get the same error.   Any
>ideas?
>
>Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
>gfffile| sed 's/\//\_/'  > mod.gfffile
>
>Thanks
>Dhivya
>


From darasappan at gmail.com  Mon Mar 17 15:20:18 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 17 Mar 2014 16:20:18 -0500
Subject: [maker-devel] Error when running maker2zff script
In-Reply-To: <CF4CBEAF.ADA3%carsonhh@gmail.com>
References: <C9394A0F-A682-4249-80DD-D79E45AE18EA@gmail.com>
	<CF4CBEAF.ADA3%carsonhh@gmail.com>
Message-ID: <CAGWaY_61EFs28=2dThqjgnkeisCXjad7JM72ews-fkTn0v7FCA@mail.gmail.com>

Awesome! Thanks Carson.

Dhivya


On Mon, Mar 17, 2014 at 4:14 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Just an update on this.  I've fixed the maker2zff script to handle the
> issues seen.  Looking at this actually brought to light another issue.
> There is inconsistent escape character specification for GFF3 in column 1
> (the source ID), column 8 (the attributes ID and Target_ID), as well as
> the FASTA ID for internal sequence.  We're updating the GFF3 spec to
> clarify this so that everywhere you see the same ID getting treated the
> same way for character escaping.
>
> To be safe though, only use these characters in your contig IDs for the
> assembly when using any tool that reads or outputs GFF3 -->
> a-zA-Z0-9.:^*$@!+_?-|
>
> Any character not in that set has a high chance of breaking some
> downstream tool.  For now just assume the strict interpretation from the
> GFF3 spec for column 1, must be used on all IDs everywhere (see below).
>
> >>Column 1: "seqid"
> >>The ID of the landmark used to establish the coordinate system for the
> >>current feature.
> >>IDs may contain any characters, but must escape any characters not in
> >>the set [a-zA-Z0-9.:^*$@!+_?-|].
> >>In particular, IDs may not contain unescaped whitespace and must not
> >>begin with an unescaped ">".
>
>
> Thanks,
> Carson
>
>
>
> On 3/13/14, 7:35 PM, "dhivya arasappan" <darasappan at gmail.com> wrote:
>
> >Hi Carson,
> >
> >I used gff3_merge to create my gff file from maker output. I've
> >attached it here. But when I run maker2zff on it, I get the following
> >error:
> >
> >Can't use an undefined value as an ARRAY reference at /opt/apps/maker/
> >2.30/bin/maker2zff line 177, <GFF> line 7294251.
> >
> >It produces an incomplete output file and it looks like it may be
> >running into problems when it encounters scaffold3%2F0.  I'm wondering
> >if its having problems with my scaffold names. There seem to be some
> >inconsistencies because it's referred to as  scaffold3%F0 and
> >scaffold3/0 in the gff file.  It goes through other scaffolds like
> >SCAFFOLD3_873, SCAFFOLD3_95 etc just fine.   I did try replacing the
> >scaffold names in the gff file, but still get the same error.   Any
> >ideas?
> >
> >Substitution command I used, for your reference:  sed 's/3\%2F/3_/g'
> >gfffile| sed 's/\//\_/'  > mod.gfffile
> >
> >Thanks
> >Dhivya
> >
>
>
>

From marc.hoeppner at bils.se  Tue Mar 18 05:43:43 2014
From: marc.hoeppner at bils.se (Marc Höppner)
Date: Tue, 18 Mar 2014 12:43:43 +0100
Subject: [maker-devel] Maker changes 2.30-2.31
Message-ID: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>

Hi,

I have observed a few oddities with our installation of maker 2.31 and was therefore wondering if there is a change log somewhere to get some information on what, if anything, was changed between 2.30 and 2.31?

There is of course a good chance that the issues I am seeing (pipeline locking up) are related to our setup and not necessarily Maker - but I'd like to make sure, if possible. Both versions use the exact same external binaries etc, and were run on the same data. 2.30 is running along happily, 2.31 however has randomly locked up. I should perhaps also say that I am running on SL 6.2 and am using mpich2 for the MPI run. 

I haven't done any more systematic testing so far, but will probably do so if there is no "obvious" reason why Maker 2.31 should behave differently.

Cheers,

Marc


Marc P. Hoeppner, PhD
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se


From carsonhh at gmail.com  Tue Mar 18 09:07:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 09:07:07 -0600
Subject: [maker-devel] Maker changes 2.30-2.31
In-Reply-To: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
References: <92E3B1D1-092C-45CF-9DD6-1A5E6921FC15@bils.se>
Message-ID: <CF4DBC09.ADE0%carsonhh@gmail.com>

Attached.  Also make sure you are using the tar ball from the lab website
and not the prerelease from the subversion repository.

Thanks,
Carson


On 3/18/14, 5:43 AM, "Marc Höppner" <marc.hoeppner at bils.se> wrote:

>Hi,
>
>I have observed a few oddities with our installation of maker 2.31 and
>was therefore wondering if there is a change log somewhere to get some
>information on what, if anything, was changed between 2.30 and 2.31?
>
>There is of course a good chance that the issues I am seeing (pipeline
>locking up) are related to our setup and not necessarily Maker - but I'd
>like to make sure, if possible. Both versions use the exact same external
>binaries etc, and were run on the same data. 2.30 is running along
>happily, 2.31 however has randomly locked up. I should perhaps also say
>that I am running on SL 6.2 and am using mpich2 for the MPI run.
>
>I haven't done any more systematic testing so far, but will probably do
>so if there is no "obvious" reason why Maker 2.31 should behave
>differently.
>
>Cheers,
>
>Marc
>
>
>
>
>Marc P. Hoeppner, PhD
>Department for Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoeppner at bils.se
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
r1060 | cholt | 2013-11-04 11:18:12 -0700 (Mon, 04 Nov 2013) | MAKER stable release version 2.30
r1061 | cholt | 2013-11-10 22:19:51 -0700 (Sun, 10 Nov 2013) | altered build install slightly
r1062 | cholt | 2013-11-25 09:33:16 -0700 (Mon, 25 Nov 2013) | updated fgenesh for hint based annotation error
r1063 | cholt | 2013-12-05 14:10:42 -0700 (Thu, 05 Dec 2013) | fix repeat too short output error
r1064 | cholt | 2013-12-05 14:18:04 -0700 (Thu, 05 Dec 2013) | updated installation scripts
r1065 | cholt | 2013-12-13 08:42:08 -0700 (Fri, 13 Dec 2013) | fix fully masked failure for BLAST 2.2.25
r1066 | cholt | 2014-01-09 10:45:08 -0700 (Thu, 09 Jan 2014) | update MWAS and maker2jbrowse
r1067 | cholt | 2014-01-09 11:34:18 -0700 (Thu, 09 Jan 2014) | fix invalid character in Ecoli example fasta
r1068 | cholt | 2014-01-24 10:42:15 -0700 (Fri, 24 Jan 2014) | added iprscan to maker.css for MWAS
r1070 | cholt | 2014-01-26 20:27:52 -0700 (Sun, 26 Jan 2014) | attempt to fix ipr_update issues with Name ne to ID and fix lock with GFF3DB as well as docs for JBrowse and MAKER install
r1071 | cholt | 2014-01-26 20:41:55 -0700 (Sun, 26 Jan 2014) | alter install to hide MWAS fix skip of small contigs and map forward of genes with est_forward
r1072 | cholt | 2014-01-28 11:20:41 -0700 (Tue, 28 Jan 2014) | added message to get user to use the correct maker executable and updated INSTALL docs
r1073 | cholt | 2014-01-28 11:36:19 -0700 (Tue, 28 Jan 2014) | further update to maker from wrong directory message when name has whitespace
r1074 | cholt | 2014-02-03 14:48:05 -0700 (Mon, 03 Feb 2014) | fixed segfault on exit for OpenMPI
r1075 | cholt | 2014-02-03 15:32:38 -0700 (Mon, 03 Feb 2014) | added support for optional test compiler flags to be used with MVAPICH2
r1076 | cholt | 2014-02-03 15:38:52 -0700 (Mon, 03 Feb 2014) | fixed build commit missing m option
r1077 | cholt | 2014-02-04 14:29:43 -0700 (Tue, 04 Feb 2014) | made MPI communication always serialize
r1078 | cholt | 2014-02-05 11:23:10 -0700 (Wed, 05 Feb 2014) | updated MPI calling to use probe for size rather than another message for faster performance
r1079 | cholt | 2014-02-06 08:29:45 -0700 (Thu, 06 Feb 2014) | fixed labeling bug, fixed hanging MPI calls, fixed trnascan introns, and length
r1080 | cholt | 2014-02-11 10:08:33 -0700 (Tue, 11 Feb 2014) | switch FindBin::Bin for FindBin::RealBin throughout
r1081 | cholt | 2014-02-11 10:49:24 -0700 (Tue, 11 Feb 2014) | MAKER stable release version 2.31

From fbarreto at ucsd.edu  Tue Mar 18 10:08:47 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:08:47 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
Message-ID: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would
like to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot (based
on query-subject coverage, bit score, etc).  I then used only these 2000
ESTs in a first Maker run using est2genome=1.  The output returned 1500
models (with the 500 "missing" models probably a result of single-exon
issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another
Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file.  The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From carsonhh at gmail.com  Tue Mar 18 10:14:29 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 10:14:29 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <CF4DCCE1.ADEE%carsonhh@gmail.com>

That sounds good.  1,500 initial models should be more than sufficient for
the first round of training.

--Carson


From:  Felipe Barreto <fbarreto at ucsd.edu>
Date:  Tuesday, March 18, 2014 at 10:08 AM
To:  MAKER group <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Size of initial EST training set for SNAP

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would like
to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot (based
on query-subject coverage, bit score, etc).  I then used only these 2000
ESTs in a first Maker run using est2genome=1.  The output returned 1500
models (with the 500 "missing" models probably a result of single-exon
issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another
Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file.  The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093 


From dence at genetics.utah.edu  Tue Mar 18 10:16:20 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 18 Mar 2014 16:16:20 +0000
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6E483@mxb2.hg.genetics.utah.edu>

Hi Felipe,

I think 1500 models sounds like a good size set with which to train SNAP. I think that SNAP expects ~1000 models for training.

The only other comment on the approach is perhaps that using only one ab-initio predictor is a little bit risky. Using multiple predictors would allow MAKER to select from among their different models for the one that best fits the evidence.

Good luck and let us know if there's anything we can help with!

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Felipe Barreto [fbarreto at ucsd.edu]
Sent: Tuesday, March 18, 2014 10:08 AM
To: MAKER group
Subject: [maker-devel] Size of initial EST training set for SNAP

Hi, all,

I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb).  I started by training SNAP, but would like to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc).  I then used only these 2000 ESTs in a first Maker run using est2genome=1.  The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point).

I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file.  The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course).

Does this sound like a reasonable approach?  Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From barry.utah at gmail.com  Tue Mar 18 10:26:45 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 18 Mar 2014 10:26:45 -0600
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
Message-ID: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>

Hi Felipe,

I think that plan sounds quite reasonable.  To address your primary concern, most gene prediction tools recommend a minimum of a few hundred gene models to train on.  Since you're an order of magnitude above that, I think you're in good shape.  Having said that, if you have concerns about biases in your training set, you may be able to supplement it further by using a tool like CEGMA (http://korflab.ucdavis.edu/datasets/cegma/) to include high-confidence genes that your set is missing.

Since the final gene set will only be as complete as the gene predictions that MAKER has to choose from, I would suggest that you also consider including at least one other gene predictor.  Augustus works well on a wide variety of genomes, and while it is more difficult to train than SNAP, it does accept hints from MAKER and will likely add to the diversity of the final gene set, even if you choose to use an existing HMM that has some reasonable relationship to your genome.  This is one of the advantages of MAKER supervision: while it would be best to train Augustus as well, MAKER will ensure that the final models are not too far out of line with the evidence, and you'll likely see quite good results using a custom SNAP HMM and an existing Augustus HMM as predictors within MAKER.
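
The three-round plan discussed in this thread can be written down as the maker_opts.ctl settings that change from round to round.  This is a minimal sketch, assuming Python 3.  The option names (est, protein, est2genome, snaphmm, augustus_species) are MAKER control-file options; every file name, and the "fly" Augustus species, is a placeholder rather than a value from the thread.

```python
# Settings that differ between rounds; everything else in maker_opts.ctl
# is assumed to stay the same. All file names are placeholders.
ROUNDS = [
    # Round 1: gene models built directly from the high-quality EST subset.
    {"est": "high_quality_ests.fasta", "est2genome": "1"},
    # Round 2: all evidence plus the SNAP HMM trained on round 1 output.
    {
        "est": "all_ests.fasta",
        "protein": "proteins.fasta",
        "est2genome": "0",
        "snaphmm": "snap_round1.hmm",
    },
    # Final round: retrained SNAP HMM plus an existing Augustus model.
    {
        "est": "all_ests.fasta",
        "protein": "proteins.fasta",
        "est2genome": "0",
        "snaphmm": "snap_round2.hmm",
        "augustus_species": "fly",  # assumption: pick a related species
    },
]


def render_overrides(options):
    """Format one round's settings as key=value lines for maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


if __name__ == "__main__":
    for number, options in enumerate(ROUNDS, start=1):
        print(f"# round {number}")
        print(render_overrides(options))
        print()
```

SNAP is retrained between rounds on the previous round's output, so each snaphmm value refers to a file produced after the preceding run finishes.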

Thanks,

B

On Mar 18, 2014, at 10:08 AM, Felipe Barreto wrote:

> Hi, all,
> 
> I've been learning a lot from reading posts from this group, and finally started doing actual runs of Maker on our current genome assembly (arthropod, genome size ~230Mb).  I started by training SNAP, but would like to check my approach before continuing with longer runs.  
> 
> From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I deemed of very high quality based on blast alignments to Swiss-Prot (based on query-subject coverage, bit score, etc).  I then used only these 2000 ESTs in a first Maker run using est2genome=1.  The output returned 1500 models (with the 500 "missing" models probably a result of single-exon issues; not a concern at this point).
> 
> I now plan on training SNAP with this first output, and then doing another Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file.  The output of this second run will be used to re-train SNAP, and this second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course).
> 
> Does this sound like a reasonable approach?  Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step.
> 
> Thanks for any insight!
> 
> -- 
> Felipe Barreto
> Post-doctoral Scholar
> Scripps Institution of Oceanography
> University of California, San Diego
> La Jolla, CA 92093

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From fbarreto at ucsd.edu  Tue Mar 18 10:59:39 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Tue, 18 Mar 2014 09:59:39 -0700
Subject: [maker-devel] Size of initial EST training set for SNAP
In-Reply-To: <02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
References: <CAOi0ENbn=mb8d8ppOwNL9rMCUE6bpFkrAm61m1xn-HLQfDFPAg@mail.gmail.com>
	<02A2F388-D911-4C73-BF34-47A125A62EE5@genetics.utah.edu>
Message-ID: <CAOi0ENYUcJFJsg0nDj3-9if0E96N+UY=vPyJkfH0T4xvFYOQ3w@mail.gmail.com>

Thanks, guys, for the swift and informative response!  I will try to train
Augustus again, but at the very least, will include it with an arthropod
HMM in my final run (in addition to my custom SNAP HMM).

Cheers,

Felipe


On Tue, Mar 18, 2014 at 9:26 AM, Barry Moore <barry.utah at gmail.com> wrote:

> Hi Felipe,
>
> I think that plan sounds quite reasonable.  To address your primary
> concern, most gene prediction tools recommend something in the range of a
> minimum of a few hundred gene models to train on.  Since you're an order of
> magnitude above that I think you're in good shape.  Having said that, of
> course if you have concerns about biases in your training set you may be
> able to supplement it further by using a tool like CEGMA (
> http://korflab.ucdavis.edu/datasets/cegma/) to include high confidence
> genes that your set is missing.
>
> Since the final gene set will only be as complete as the gene predictions
> that MAKER has to choose from I would suggest that you also consider
> including at least one other gene predictor.  Augustus works well on a wide
> variety of genomes and while it is more difficult to train than SNAP it
> does accept hints from MAKER and will likely add to the diversity of the
> final gene set, even if you choose to use an existing HMM that has some
> reasonable relationship to your genome.  This is one of the advantages of
> MAKER supervision, while it would be best to train Augustus as well, MAKER
> will ensure that the final models are not too far out of line with the
> evidence and you'll likely see quite good results using a custom SNAP HMM
> and an existing Augustus HMM as predictor within MAKER.
>
> Thanks,
>
> B
>
> On Mar 18, 2014, at 10:08 AM, Felipe Barreto wrote:
>
> Hi, all,
>
> I've been learning a lot from reading posts from this group, and finally
> started doing actual runs of Maker on our current genome assembly
> (arthropod, genome size ~230Mb).  I started by training SNAP, but would
> like to check my approach before continuing with longer runs.
>
> From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
> deemed of very high quality based on blast alignments to Swiss-Prot (based
> on query-subject coverage, bit score, etc).  I then used only these 2000
> ESTs in a first Maker run using est2genome=1.  The output returned 1500
> models (with the 500 "missing" models probably a result of single-exon
> issues; not a concern at this point).
>
> I now plan on training SNAP with this first output, and then doing another
> Maker run now using: 1) all ESTs (but est2genome=0), 2) my chosen protein
> evidence, and 3) SNAP with the first HMM file.  The output of this second
> run will be used to re-train SNAP, and this second HMM file will be used in
> a final "official" run (while continuing to provide the EST and protein
> evidence, of course).
>
> Does this sound like a reasonable approach?  Simply put, my main concern
> is whether I'm using too few ESTs in my first est2genome step.
>
> Thanks for any insight!
>
> --
> Felipe Barreto
> Post-doctoral Scholar
> Scripps Institution of Oceanography
> University of California, San Diego
> La Jolla, CA 92093
>
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>
>
>
>
>


-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From darasappan at gmail.com  Tue Mar 18 13:27:11 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 14:27:11 -0500
Subject: [maker-devel] maker snap output files
Message-ID: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>

Hello,

I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information on the tutorial about interpreting these different files. I'm hoping one of you can help.

*maker.proteins.fasta
*maker.snap_masked.proteins.fasta
*maker.non_overlapping_ab_initio.proteins.fasta

What is the difference among these? They all have different number of sequences.

Similarly, with transcripts:

maker.non_overlapping_ab_initio.transcripts.fasta
maker.snap_masked.transcripts.fasta
maker.transcripts.fasta

Thanks
Dhivya



From carsonhh at gmail.com  Tue Mar 18 13:34:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 13:34:05 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
Message-ID: <CF4DFA69.AE2E%carsonhh@gmail.com>

maker.proteins.fasta - these are the final filtered and modified protein
models (this is what you want)
maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab
initio predictions (for reference purposes)
maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
rejected models that do not overlap the maker.proteins.fasta entries. If you
think you are missing a gene, look for it here.  Sometimes people use
interproscan (very slow) to analyze this file for false negatives.


These files are also described in the README distributed with MAKER in the
"MAKER OUTPUT" section.
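
A quick way to see how the three files relate is to count the records in
each.  This is a minimal sketch, assuming Python 3; the function names and
the "genome.all." file prefix are hypothetical, so substitute whatever
prefix fasta_merge produced for your run.  Counts are only a rough sanity
check and say little about annotation quality.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize(paths):
    """Return a {path: record count} mapping for the given FASTA files."""
    return {path: count_fasta_records(path) for path in paths}


if __name__ == "__main__":
    prefix = "genome.all."  # placeholder prefix, not taken from the thread
    files = [
        prefix + "maker.proteins.fasta",
        prefix + "maker.snap_masked.proteins.fasta",
        prefix + "maker.non_overlapping_ab_initio.proteins.fasta",
    ]
    for path, count in summarize(files).items():
        print(f"{count}\t{path}")
```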

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, March 18, 2014 at 1:27 PM
To:  Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
Subject:  maker snap output files

Hello,

I ran maker after running SNAP ab initio prediction (following instructions
from the maker tutorial).  It ran successfully and when I ran fasta_merge, I
got several output fasta files. I'm unable to find information on the
tutorial about interpreting these different files. I'm hoping one of you can
help.

*maker.proteins.fasta
*maker.snap_masked.proteins.fasta
*maker.non_overlapping_ab_initio.proteins.fasta

What is the difference among these? They all have different number of
sequences.

Similarly,with transcripts:

maker.non_overlapping_ab_initio.transcripts.fasta
maker.snap_masked.transcripts.fasta
maker.transcripts.fasta

Thanks
Dhivya



From darasappan at gmail.com  Tue Mar 18 14:05:39 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 18 Mar 2014 15:05:39 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4DFA69.AE2E%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
Message-ID: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>

Thanks Carson.

Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually lower than the number of proteins in my pre-snap maker results?  I assumed that annotation through alignment plus annotation through prediction would yield more annotations.

The unfiltered proteins file has more proteins though.

Thanks
Dhivya


On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:

> maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want)
> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes)
> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here.  Sometimes people use interproscan (very slow) to analyze this file for false negatives.
> 
> 
> These files are also described in the README distributed with MAKER in the "MAKER OUTPUT" section.
> 
> Thanks,
> Carson
> 
> 
> 
> 
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Tuesday, March 18, 2014 at 1:27 PM
> To: Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
> Subject: maker snap output files
> 
> Hello,
> 
> I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information on the tutorial about interpreting these different files. I'm hoping one of you can help.
> 
> *maker.proteins.fasta
> *maker.snap_masked.proteins.fasta
> *maker.non_overlapping_ab_initio.proteins.fasta
> 
> What is the difference among these? They all have different number of sequences.
> 
> Similarly,with transcripts:
> 
> maker.non_overlapping_ab_initio.transcripts.fasta
> maker.snap_masked.transcripts.fasta
> maker.transcripts.fasta
> 
> Thanks
> Dhivya
> 
> 


From carsonhh at gmail.com  Tue Mar 18 14:09:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 18 Mar 2014 14:09:01 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
Message-ID: <CF4E0363.AE3D%carsonhh@gmail.com>

There can also be hint-based predictions.  They may be similar in size, but
there is no rule.  Generally maker.snap_masked.proteins.fasta will be
larger, as gene predictors tend to overpredict (by as much as 10-fold).  You
should always review your annotations in something like Apollo to see how
the models compare to the evidence.  Counts alone don't really mean anything.
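
For a quick side-by-side of the record counts being compared in this thread, a
minimal Python sketch follows. It only counts FASTA header lines, so it says
nothing about model quality. The function names and the glob pattern (any
prefix followed by the suffixes quoted in this thread) are illustrative
assumptions, not part of MAKER.

```python
from pathlib import Path

# Suffixes of the fasta_merge outputs discussed in this thread.
SUFFIXES = (
    "maker.proteins.fasta",
    "maker.snap_masked.proteins.fasta",
    "maker.non_overlapping_ab_initio.proteins.fasta",
)


def count_fasta_records(path):
    """Count sequences by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_by_suffix(directory="."):
    """Return {suffix: record count} for matching files found in directory."""
    counts = {}
    for suffix in SUFFIXES:
        # Assumption: the merged files end in these suffixes, with any prefix.
        for path in sorted(Path(directory).glob("*" + suffix)):
            counts[suffix] = counts.get(suffix, 0) + count_fasta_records(path)
    return counts
```

Calling count_by_suffix() in the directory holding the merged files returns
one count per suffix that was found; suffixes with no matching file are
simply absent from the result.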

Thanks,
Carson

From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, March 18, 2014 at 2:05 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  <maker-devel at yandell-lab.org>
Subject:  Re: maker snap output files

Thanks Carson.

Is it normal that in my maker results after running snap, the number of
proteins (in *maker.proteins.fasta) is actually lower than the number of
proteins in my pre-snap maker results?  I assumed that annotation through
alignment plus annotation through prediction would yield more annotations.

The unfiltered proteins file has more proteins though.

Thanks
Dhivya


On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:

> maker.proteins.fasta - these are the final filtered and modified protein
> models (this is what you want)
> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio
> predictions (for reference purposes)
> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
> rejected models that do not overlap the maker.proteins.fasta entries. If you
> think you are missing a gene, look for it here.  Sometimes people use
> interproscan (very slow) to analyze this file for false negatives.
> 
> 
> These files are also described in the README distributed with MAKER in the
> "MAKER OUTPUT" section.
> 
> Thanks,
> Carson
> 
> 
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 18, 2014 at 1:27 PM
> To:  Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
> Subject:  maker snap output files
> 
> Hello,
> 
> I ran maker after running SNAP ab initio prediction (following instructions
> from the maker tutorial).  It ran successfully and when I ran fasta_merge, I
> got several output fasta files. I'm unable to find information in the tutorial
> about interpreting these different files. I'm hoping one of you can help.
> 
> *maker.proteins.fasta
> *maker.snap_masked.proteins.fasta
> *maker.non_overlapping_ab_initio.proteins.fasta
> 
> What is the difference among these? They all have different number of
> sequences.
> 
> Similarly,with transcripts:
> 
> maker.non_overlapping_ab_initio.transcripts.fasta
> maker.snap_masked.transcripts.fasta
> maker.transcripts.fasta
> 
> Thanks
> Dhivya
> 
> 



From chrisbioinfo at gmail.com  Wed Mar 19 05:09:57 2014
From: chrisbioinfo at gmail.com (Chris Bioinfo)
Date: Wed, 19 Mar 2014 12:09:57 +0100
Subject: [maker-devel] Annotation with maker2
Message-ID: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>

Hello,

I'm installing and using maker2 for the first time, and I get an error when
running it.

I am certainly missing something, but I don't know what.

MAKER compiled with no error messages, and I have all these directories
after compilation:
bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src

Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I have
this error:

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
doing repeat masking
DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm

Can't call method "do" on an undefined value at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500
...

ideas?

Best,

Christelle

From carsonhh at gmail.com  Wed Mar 19 07:01:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Mar 2014 07:01:35 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
Message-ID: <CF4EF035.AE6F%carsonhh@gmail.com>

Your problem is one of the following: you need to reinstall the DBD::SQLite
module; you are running in a directory you don't have permissions for; you
set your TMPDIR environment variable or the TMP value in maker_opts.ctl to an
NFS-mounted or memory-mounted directory; or you are using a self-compiled
version of Perl (i.e. not /usr/bin/perl) that has issues (probably with the DB
or SQLite modules).  You can also completely delete the output directory
and start again to see if it was just a random error.  You should look at
each of those first.  If none of them seems to be the issue, run MAKER with
the --debug command line flag and send me the output.
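
The directory-related causes above (permissions, or a filesystem without
working locks) can be probed independently of MAKER. The sketch below is a
hypothetical helper that tries to create and write a throwaway SQLite database
in a given directory. Note the assumption: it uses Python's built-in sqlite3
module, so it exercises the filesystem and not the Perl DBD::SQLite
installation, which would still need to be checked separately.

```python
import os
import sqlite3


def can_host_sqlite(directory, name="sqlite_probe.db"):
    """Try to create, write to and remove a throwaway SQLite DB in directory.

    Returns (True, "ok") on success, or (False, error message) on failure.
    The probe file name is an arbitrary choice for this illustration.
    """
    path = os.path.join(directory, name)
    try:
        conn = sqlite3.connect(path)
        try:
            # A write forces SQLite to create the file and take locks.
            conn.execute("CREATE TABLE probe (x INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
    except (sqlite3.Error, OSError) as exc:
        return False, str(exc)
    finally:
        # Clean up the probe file whether or not the write succeeded.
        if os.path.exists(path):
            os.remove(path)
    return True, "ok"
```

Running can_host_sqlite() against the MAKER output directory and against the
configured temporary directory shows whether either location refuses a plain
SQLite write.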

Thanks,
Carson


From:  Chris Bioinfo <chrisbioinfo at gmail.com>
Date:  Wednesday, March 19, 2014 at 5:09 AM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Annotation with maker2

Hello,

I'm installing and using maker2 for the first time, and I get an error when
running it.

I am certainly missing something, but I don't know what.

MAKER compiled with no error messages, and I have all these directories after
compilation:
bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src

Nevertheless when I try maker2 on the test data (dpp_contig.fasta) I have
this error:

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
doing repeat masking
DBI connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...)
failed: unable to open database file at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm

Can't call method "do" on an undefined value at
/usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
--> rank=NA, hostname=belem
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500
...

ideas?

Best,

Christelle

_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From rbharris at uw.edu  Wed Mar 19 19:19:27 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Wed, 19 Mar 2014 18:19:27 -0700
Subject: [maker-devel] tradeoff between run time & file number
Message-ID: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once, using the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time, probably because it has to go through BLAST
again. Is there any way of getting around this? I expect I may have to train
SNAP and rerun maker multiple times, and it is taking about 3 weeks to get
through my dataset. Is there a way to prune down my original dataset based
on maker's output?

Thanks,
Rebecca

From dence at genetics.utah.edu  Wed Mar 19 23:43:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 20 Mar 2014 05:43:11 +0000
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>

Hi Rebecca, As far as pruning down the dataset goes, I think the biggest gains will come from trimming the number of scaffolds that you annotate. What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter than 5 kbp or 10 kbp won't contribute much to the gene counts in the end.
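
As an illustration of the two steps mentioned here (checking the N50, then
dropping short scaffolds), a minimal Python sketch follows. The function names
are hypothetical, and the length cutoff is left as a parameter because the
right value (5 kbp, 10 kbp, or something else) depends on the assembly.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths):
    """Return the length L such that sequences >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def filter_by_length(records, min_length):
    """Keep only the (header, sequence) pairs at least min_length long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]
```

Reading the assembly once with read_fasta(), passing the sequence lengths to
n50(), and writing out the result of filter_by_length() gives a smaller input
file for the next MAKER run.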

Also, if you can, try to avoid using the alt_est option. It works completely fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got things set up. You can attach it to your reply (to the maker-devel list), and I'll take a look. I don't know how to force maker to create fewer files. You definitely want to be able to make use of the results from prior runs to save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca Harris [rbharris at uw.edu]
Sent: Wednesday, March 19, 2014 7:19 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] tradeoff between run time & file number

Hi -

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've gone through it once, using the clean_up option because otherwise maker exceeds the cluster's file quota. However, now I'm retraining SNAP and it is taking a very long time, probably because it has to go through BLAST again. Is there any way of getting around this? I expect I may have to train SNAP and rerun maker multiple times, and it is taking about 3 weeks to get through my dataset. Is there a way to prune down my original dataset based on maker's output?

Thanks,
Rebecca

From darasappan at gmail.com  Thu Mar 20 11:22:47 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 20 Mar 2014 12:22:47 -0500
Subject: [maker-devel] maker snap output files
In-Reply-To: <CF4E0363.AE3D%carsonhh@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
Message-ID: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and transcripts that don't overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mRNA) all say something like "*snap-gene", so does this mean these are the genes from the snap prediction tool?


Thanks
dhivya


On Mar 18, 2014, at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:

> There can also be hint-based predictions.  They may be similar in size, but there is no rule.  Generally maker.snap_masked.proteins.fasta will be larger, as gene predictors tend to overpredict (by as much as 10-fold).  You should always review your annotations in something like Apollo to see how the models compare to the evidence.  Counts alone don't really mean anything.
> 
> Thanks,
> Carson
> 
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Tuesday, March 18, 2014 at 2:05 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: <maker-devel at yandell-lab.org>
> Subject: Re: maker snap output files
> 
> Thanks Carson.
> 
> Is it normal that in my maker results after running snap, the number of proteins (in *maker.proteins.fasta) is actually lower than the number of proteins in my pre-snap maker results?  I assumed that annotation through alignment plus annotation through prediction would yield more annotations.
> 
> The unfiltered proteins file has more proteins though.
> 
> Thanks
> Dhivya
> 
> 
> 
> On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
>> maker.proteins.fasta - these are the final filtered and modified protein models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant rejected models that do not overlap the maker.proteins.fasta entries. If you think you are missing a gene, look for it here.  Sometimes people use interproscan (very slow) to analyze this file for false negatives.
>> 
>> 
>> These files are also described in the README distributed with MAKER in the "MAKER OUTPUT" section.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> 
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Tuesday, March 18, 2014 at 1:27 PM
>> To: Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject: maker snap output files
>> 
>> Hello,
>> 
>> I ran maker after running SNAP ab initio prediction (following instructions from the maker tutorial).  It ran successfully and when I ran fasta_merge, I got several output fasta files. I'm unable to find information in the tutorial about interpreting these different files. I'm hoping one of you can help.
>> 
>> *maker.proteins.fasta
>> *maker.snap_masked.proteins.fasta
>> *maker.non_overlapping_ab_initio.proteins.fasta
>> 
>> What is the difference among these? They all have different number of sequences.
>> 
>> Similarly,with transcripts:
>> 
>> maker.non_overlapping_ab_initio.transcripts.fasta
>> maker.snap_masked.transcripts.fasta
>> maker.transcripts.fasta
>> 
>> Thanks
>> Dhivya
>> 
>> 
> 


From carsonhh at gmail.com  Thu Mar 20 11:24:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:24:41 -0600
Subject: [maker-devel] maker snap output files
In-Reply-To: <48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
References: <F88D0CA1-15E3-4E01-881F-4F697044B1FC@gmail.com>
	<CF4DFA69.AE2E%carsonhh@gmail.com>
	<05EA6913-59F1-459F-850B-A4EAAFE610D9@gmail.com>
	<CF4E0363.AE3D%carsonhh@gmail.com>
	<48D7969E-3BA8-4086-8886-11B32CDAA2A2@gmail.com>
Message-ID: <CF508021.AF35%carsonhh@gmail.com>

maker transcripts will be the gene/mRNA/exon/CDS features

All other transcripts from SNAP etc. will be match/match_part features in
the GFF3.

When you look at these in something like Apollo, they will be placed in
different viewing panels based on their type.
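
To see this split without opening a genome browser, one could separate the
GFF3 lines by the feature type in column 3. The Python sketch below is a
minimal illustration under that assumption; the function name is hypothetical,
and only the feature types named in this message are classified, with
everything else collected separately.

```python
# Feature types named in this message.
MODEL_TYPES = {"gene", "mRNA", "exon", "CDS"}
EVIDENCE_TYPES = {"match", "match_part"}


def split_gff3(lines):
    """Split GFF3 feature lines into (models, evidence, other) lists."""
    models, evidence, other = [], [], []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop here
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a well-formed nine-column feature line
        feature_type = columns[2]
        if feature_type in MODEL_TYPES:
            models.append(line)
        elif feature_type in EVIDENCE_TYPES:
            evidence.append(line)
        else:
            other.append(line)
    return models, evidence, other
```

The first list then holds the final annotations, and the second holds the raw
predictions and alignments that were kept as evidence.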

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, March 20, 2014 at 11:22 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  <maker-devel at yandell-lab.org>
Subject:  Re: maker snap output files

Hi Carson,

Given that I now have maker transcripts, ab initio predicted transcripts and
transcripts that don't overlap, which ones are reflected in the gff file?

The ids in the gff file (for exons, genes, mRNA) all say something like
"*snap-gene", so does this mean these are the genes from the snap prediction
tool?


Thanks
dhivya


On Mar 18, 2014, at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:

> There can also be hint-based predictions.  They may be similar in size, but
> there is no rule.  Generally maker.snap_masked.proteins.fasta will be larger,
> as gene predictors tend to overpredict (by as much as 10-fold).  You should
> always review your annotations in something like Apollo to see how the models
> compare to the evidence.  Counts alone don't really mean anything.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Tuesday, March 18, 2014 at 2:05 PM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  <maker-devel at yandell-lab.org>
> Subject:  Re: maker snap output files
> 
> Thanks Carson.
> 
> Is it normal that in my maker results after running snap, the number of
> proteins (in *maker.proteins.fasta) is actually lower than the number of
> proteins in my pre-snap maker results?  I assumed that annotation through
> alignment plus annotation through prediction would yield more annotations.
> 
> The unfiltered proteins file has more proteins though.
> 
> Thanks
> Dhivya
> 
> 
> 
> On Mar 18, 2014, at 2:34 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
>> maker.proteins.fasta - these are the final filtered and modified protein
>> models (this is what you want)
>> maker.snap_masked.proteins.fasta - these are the raw unfiltered snap ab
>> initio predictions (for reference purposes)
>> maker.non_overlapping_ab_initio.proteins.fasta - these are non-redundant
>> rejected models that do not overlap the maker.proteins.fasta entries. If you
>> think you are missing a gene, look for it here.  Sometimes people use
>> interproscan (very slow) to analyze this file for false negatives.
>> 
>> 
>> These files are also described in the README distributed with MAKER in the
>> "MAKER OUTPUT" section.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Tuesday, March 18, 2014 at 1:27 PM
>> To:  Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject:  maker snap output files
>> 
>> Hello,
>> 
>> I ran maker after running SNAP ab initio prediction (following instructions
>> from the maker tutorial).  It ran successfully and when I ran fasta_merge, I
>> got several output fasta files. I'm unable to find information in the
>> tutorial about interpreting these different files. I'm hoping one of you can
>> help.
>> 
>> *maker.proteins.fasta
>> *maker.snap_masked.proteins.fasta
>> *maker.non_overlapping_ab_initio.proteins.fasta
>> 
>> What is the difference among these? They all have different number of
>> sequences.
>> 
>> Similarly,with transcripts:
>> 
>> maker.non_overlapping_ab_initio.transcripts.fasta
>> maker.snap_masked.transcripts.fasta
>> maker.transcripts.fasta
>> 
>> Thanks
>> Dhivya
>> 
>> 
> 



From carsonhh at gmail.com  Thu Mar 20 11:53:24 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 20 Mar 2014 11:53:24 -0600
Subject: [maker-devel] tradeoff between run time & file number
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
References: <CAESS274qd5dL9apLh3sobjkz0+vwjVa9j0Ytd5dR-Qrb4av+=Q@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6F524@mxb2.hg.genetics.utah.edu>
Message-ID: <CF50861A.AF65%carsonhh@gmail.com>

You may also want to try the GFF3 pass-through options.  Basically you give
your GFF3 file to maker_gff and tell it what kinds of evidence to keep from
your past run by setting the 'pass' options to 1.  Then you can run without
your FASTA file inputs for ESTs, proteins, and repeats (blank out the
RepeatMasker species as well).  The values will be passed forward from the GFF3
file into the current run.
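
As a rough illustration of that change, the Python sketch below rewrites
selected key=value lines in the text of a control file. Only maker_gff and the
idea of 'pass' options come from this message. The other option names
(est_pass, protein_pass, rm_pass, est, protein, rmlib, repeat_protein,
model_org) are written from memory of the maker_opts.ctl template and should
be checked against your own file, and the GFF3 file name is a placeholder.

```python
import re

# Assumed option names; verify each one against your own maker_opts.ctl.
PASS_THROUGH = {
    "maker_gff": "previous_run.all.gff",  # placeholder path to the old GFF3
    "est_pass": "1",
    "protein_pass": "1",
    "rm_pass": "1",
    "est": "",             # blank out the FASTA evidence inputs
    "protein": "",
    "rmlib": "",
    "repeat_protein": "",
    "model_org": "",       # blank out the repeat masking species
}

KEY_VALUE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def apply_settings(ctl_text, settings):
    """Return ctl_text with the given keys set, keeping trailing comments."""
    out = []
    for line in ctl_text.splitlines():
        match = KEY_VALUE.match(line)
        if match and match.group(1) in settings:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={settings[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"
```

Writing the result to a copy of the control file, not over the original,
keeps the first-run settings available for comparison.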

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Wednesday, March 19, 2014 at 11:43 PM
To:  Rebecca Harris <rbharris at uw.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] tradeoff between run time & file number

Hi Rebecca, As far as pruning down the dataset goes, I think the biggest
gains will come from trimming the number of scaffolds that you annotate.
What is the N50 of your 400,000-scaffold set? Usually, scaffolds shorter
than 5 kbp or 10 kbp won't contribute much to the gene counts in the end.

Also, if you can, try to avoid using the alt_est option. It works completely
fine, but blasting those sequences takes much longer than blastn or blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got
things set up. You can attach it to your reply (to the maker-devel list),
and I'll take a look. I don't know how to force maker to create fewer files.
You definitely want to be able to make use of the results from prior runs to
save time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca
Harris [rbharris at uw.edu]
Sent: Wednesday, March 19, 2014 7:19 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] tradeoff between run time & file number

Hi - 

I'm running maker on a dataset of >400,000 scaffolds with MPI -n 64. I've
gone through it once, using the clean_up option because otherwise maker
exceeds the cluster's file quota. However, now I'm retraining SNAP and it is
taking a very long time, probably because it has to go through BLAST again.
Is there any way of getting around this? I expect I may have to train SNAP
and rerun maker multiple times, and it is taking about 3 weeks to get through
my dataset. Is there a way to prune down my original dataset based on
maker's output?

Thanks,
Rebecca


From carsonhh at gmail.com  Fri Mar 21 08:23:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 08:23:18 -0600
Subject: [maker-devel] Annotation with maker2
In-Reply-To: <CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
References: <CAF+kvSZO+VzHveN+WNmD3O8qayyrOFATS7VA2c-wLdGs1m4iTw@mail.gmail.com>
	<CF4EF035.AE6F%carsonhh@gmail.com>
	<CAF+kvSasxjb7p_Wjtntmy2nht6kfL=JqaP5DfMGeC0GHkLy8Hw@mail.gmail.com>
	<CF5065FF.AEE7%carsonhh@gmail.com>
	<CAF+kvSbKs-sdFfvncqEgsAk4_XKbsB7KdB85fCdxpcWNe1rjWQ@mail.gmail.com>
	<CF506B1E.AEED%carsonhh@gmail.com>
	<CAF+kvSbmpRgneyfz6_tWsx_NS8ZWhuwnQAV0hA83qJrOVh-0hA@mail.gmail.com>
	<CAF+kvSZD7rpUeeoNGMKBGbH0zZN3bHksJFuUPP+hGoUKki34jw@mail.gmail.com>
	<CF506F04.AEF8%carsonhh@gmail.com>
	<CAF+kvSY_nWAFBH1YpKJqWV7qQ=XehHzhX9e+65miAG4f_+=ptA@mail.gmail.com>
	<CAF+kvSYYTA8pYFc0WY12+g6T_bk7P9MRUxNpzqtGkJARsA0wpg@mail.gmail.com>
	<CF50741C.AF02%carsonhh@gmail.com>
	<CAF+kvSZAhJnJdq+UcRfpWSya+6W26ecZHkRvHzGLsqk6K=fmQg@mail.gmail.com>
	<CF507AB2.AF1E%carsonhh@gmail.com>
	<CF507F90.AF30%carsonhh@gmail.com>
	<CAF+kvSZZJA1+ZvRfqArTERXSy_aTZJ07w4kE_JgR0eo1mWe3FQ@mail.gmail.com>
Message-ID: <CF51A74A.AFA8%carsonhh@gmail.com>

Glad it's working.  Let us know if anything else comes up.

--Carson


From:  Chris Bioinfo <chrisbioinfo at gmail.com>
Date:  Friday, March 21, 2014 at 4:57 AM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: [maker-devel] Annotation with maker2

Dear Carson

It works, after many difficulties!

I installed SQLite 3.8.4.1 yesterday, and things were "better" (no error
message when launching sqlite3), yet my test.db was still not created.

Today I found the trick: the problem was simply that the path to the
database was too long.

Thanks for your time and your help, Carson!

All the best,

Christelle


2014-03-20 18:21 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
> You can also use this command line to test both before and after installing:
> 
> perl -MDBI -MDBD::SQLite -e 'print "$DBD::SQLite::sqlite_version\n"; $dbh = DBI->connect("dbi:SQLite:dbname=/path/from/maker/error/dpp_contig.db","","");'
> 
> Make sure to set /path/from/maker/error/dpp_contig.db to whatever it was in
> the error.
> 
> --Carson
> 
> 
> From:  Carson Holt <carsonhh at gmail.com>
> Date:  Thursday, March 20, 2014 at 11:03 AM
> To:  Chris Bioinfo <chrisbioinfo at gmail.com>
> 
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> The failure is in SQLite.  So you have to reinstall.  I.e. 'force install
> DBD::SQLite' in CPAN.  Otherwise you are just keeping whatever module is
> installed which may have broken C bindings.
> 
> You may also have to install SQLite 3.8.4.1, and then reinstall the perl
> modules using the force option to force recompile.
> 
> --Carson
> 
> 
> 
> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
> Date:  Thursday, March 20, 2014 at 10:57 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Subject:  Re: [maker-devel] Annotation with maker2
> 
> cpan[2]> install DBI
> DBI is up to date (1.631).
> 
> cpan[3]> install DBD::SQLite
> DBD::SQLite is up to date (1.42).
> 
> my test.db is not actually created:
> 
> sqlite3 dpp_contig.maker.output/test.db
> SQLite version 3.8.3.1 2014-02-11 14:52:19
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> 
> 
> 
> 
> 
> 2014-03-20 17:36 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>> I'm actually checking the mount points for the disk.  SQLite won't work on
>> filesystems that don't implement locks, and 'df' is a good way to infer some
>> of that info.
>> 
>> Basically I still think this is SQLite failing on your system.  You might
>> need to reinstall SQLite and then reinstall the perl DBI and DBD::SQLite
>> modules.
>> 
>> You can also do a test command --> 'sqlite3 dpp_contig.maker.output/test.db'
>> 
>> This will work if you have sqlite3 installed, and any error it gives may be
>> informative.
>> 
>> --Carson
>> 
>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>> Date:  Thursday, March 20, 2014 at 10:29 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Subject:  Re: [maker-devel] Annotation with maker2
>> 
>> oh sorry
>> 
>> my disks are quite full, but still space I guess for maker
>> 
>>  /dev/sdc1           19T     18T  934G  95% /home
>> 
>> 
>> 2014-03-20 17:23 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>> this :
>>> 
>>>  du -h dpp_contig.maker.output/
>>> 0    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500/0
>>> 88K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500/theVoid.contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05/1F
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore/05
>>> 92K    dpp_contig.maker.output/dpp_contig_datastore
>>> 4.0K    dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>>> 4.0K    dpp_contig.maker.output/maker_bopts.log
>>> 4.0K    dpp_contig.maker.output/maker_exe.log
>>> 8.0K    dpp_contig.maker.output/maker_opts.log
>>> 16K    dpp_contig.maker.output/mpi_blastdb/dpp_protein%2Efasta.mpi.1
>>> 44K    dpp_contig.maker.output/mpi_blastdb/dpp_contig%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10
>>> 32K    dpp_contig.maker.output/mpi_blastdb/dpp_est%2Efasta.mpi.1
>>> 14M    dpp_contig.maker.output/mpi_blastdb
>>> 0    dpp_contig.maker.output/seen.dbm
>>> 
>>> 
>>> 
>>> 2014-03-20 17:10 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>> 
>>>> What does 'df -h dpp_contig.maker.output' show?
>>>> 
>>>> --Carson
>>>> 
>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>> Date:  Thursday, March 20, 2014 at 10:00 AM
>>>> 
>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>> 
>>>> sorry, mistake on the dir!
>>>> 
>>>> I have these files:
>>>> dpp_contig_datastore  dpp_contig_master_datastore_index.log
>>>> maker_bopts.log  maker_exe.log  maker_opts.log  mpi_blastdb  seen.dbm
>>>> 
>>>> 
>>>> 2014-03-20 16:59 GMT+01:00 Chris Bioinfo <chrisbioinfo at gmail.com>:
>>>>> no,
>>>>> 
>>>>> I have theses files in the directory:
>>>>> dpp_contig.fasta         dpp_est.fasta      hsap_contig.fasta
>>>>> hsap_protein.fasta  maker_exe.ctl
>>>>> dpp_contig.maker.output  dpp_protein.fasta  hsap_est.fasta
>>>>> maker_bopts.ctl     maker_opts.ctl  te_proteins.fasta
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-03-20 16:53 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>>> Did /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db exist?
>>>>>> 
>>>>>> --Carson
>>>>>> 
>>>>>> 
>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>> Date:  Thursday, March 20, 2014 at 9:50 AM
>>>>>> 
>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>> 
>>>>>> cdantec at belem:~$ /usr/bin/perl -v
>>>>>> 
>>>>>> This is perl 5, version 18, subversion 1 (v5.18.1) built for
>>>>>> x86_64-linux-gnu-thread-multi
>>>>>> (with 46 registered patches, see perl -V for more detail)
>>>>>> 
>>>>>> Copyright 1987-2013, Larry Wall
>>>>>> 
>>>>>> Perl may be copied only under the terms of either the Artistic License or the
>>>>>> GNU General Public License, which may be found in the Perl 5 source kit.
>>>>>> 
>>>>>> Complete documentation for Perl, including FAQ lists, should be found on
>>>>>> this system using "man perl" or "perldoc perl".  If you have access to the
>>>>>> Internet, point your browser at http://www.perl.org/, the Perl Home Page.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 2014-03-20 16:32 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> What do you get when you type --> /usr/bin/perl -v
>>>>>>> 
>>>>>>> The key to the error is this line -->
>>>>>>> DBI connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open database file
>>>>>>> 
>>>>>>> Either the database doesn't exist, or is corrupt.  Does it exist?
>>>>>>> 
>>>>>>> --Carson
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Thursday, March 20, 2014 at 9:25 AM
>>>>>>> To:  Carson Holt <carsonhh at gmail.com>
>>>>>>> Subject:  Re: [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Dear Carson,
>>>>>>> 
>>>>>>> I have reinstalled the DBD::SQLite module, checked the permissions on my
>>>>>>> directory, and configured the TMP value in maker_opts.ctl. perl is
>>>>>>> /usr/bin/perl.
>>>>>>> I have deleted the output directory many times, but the problem persists.
>>>>>>> 
>>>>>>> So here is the debug output:
>>>>>>> ****MODULE VERSION INFO
>>>>>>>     0.05    Acme::Damn    /usr/local/lib/perl/5.18.1/Acme/Damn.pm
>>>>>>>     1.01    AnyDBM_File    /usr/share/perl/5.18/AnyDBM_File.pm
>>>>>>>     5.73    AutoLoader    /usr/share/perl/5.18/AutoLoader.pm
>>>>>>>     UNKNOWN    Bio::AnalysisParserI    /usr/local/share/perl/5.18.1/Bio/AnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::AnnotatableI    /usr/local/share/perl/5.18.1/Bio/AnnotatableI.pm
>>>>>>>     UNKNOWN    Bio::Annotation::Collection    /usr/local/share/perl/5.18.1/Bio/Annotation/Collection.pm
>>>>>>>     UNKNOWN    Bio::Annotation::SimpleValue    /usr/local/share/perl/5.18.1/Bio/Annotation/SimpleValue.pm
>>>>>>>     UNKNOWN    Bio::Annotation::TypeManager    /usr/local/share/perl/5.18.1/Bio/Annotation/TypeManager.pm
>>>>>>>     UNKNOWN    Bio::AnnotationCollectionI    /usr/local/share/perl/5.18.1/Bio/AnnotationCollectionI.pm
>>>>>>>     UNKNOWN    Bio::AnnotationI    /usr/local/share/perl/5.18.1/Bio/AnnotationI.pm
>>>>>>>     1.006923    Bio::DB::Fasta    /usr/local/share/perl/5.18.1/Bio/DB/Fasta.pm
>>>>>>>     UNKNOWN    Bio::DB::InMemoryCache    /usr/local/share/perl/5.18.1/Bio/DB/InMemoryCache.pm
>>>>>>>     UNKNOWN    Bio::DB::IndexedBase    /usr/local/share/perl/5.18.1/Bio/DB/IndexedBase.pm
>>>>>>>     UNKNOWN    Bio::DB::RandomAccessI    /usr/local/share/perl/5.18.1/Bio/DB/RandomAccessI.pm
>>>>>>>     UNKNOWN    Bio::DB::SeqI    /usr/local/share/perl/5.18.1/Bio/DB/SeqI.pm
>>>>>>>     UNKNOWN    Bio::DescribableI    /usr/local/share/perl/5.18.1/Bio/DescribableI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventGeneratorI    /usr/local/share/perl/5.18.1/Bio/Event/EventGeneratorI.pm
>>>>>>>     UNKNOWN    Bio::Event::EventHandlerI    /usr/local/share/perl/5.18.1/Bio/Event/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactory    /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactory.pm
>>>>>>>     UNKNOWN    Bio::Factory::ObjectFactoryI    /usr/local/share/perl/5.18.1/Bio/Factory/ObjectFactoryI.pm
>>>>>>>     UNKNOWN    Bio::Factory::SequenceFactoryI    /usr/local/share/perl/5.18.1/Bio/Factory/SequenceFactoryI.pm
>>>>>>>     UNKNOWN    Bio::FeatureHolderI    /usr/local/share/perl/5.18.1/Bio/FeatureHolderI.pm
>>>>>>>     UNKNOWN    Bio::IdentifiableI    /usr/local/share/perl/5.18.1/Bio/IdentifiableI.pm
>>>>>>>     UNKNOWN    Bio::LocatableSeq    /usr/local/share/perl/5.18.1/Bio/LocatableSeq.pm
>>>>>>>     UNKNOWN    Bio::Location::Atomic    /usr/local/share/perl/5.18.1/Bio/Location/Atomic.pm
>>>>>>>     UNKNOWN    Bio::Location::CoordinatePolicyI    /usr/local/share/perl/5.18.1/Bio/Location/CoordinatePolicyI.pm
>>>>>>>     UNKNOWN    Bio::Location::Fuzzy    /usr/local/share/perl/5.18.1/Bio/Location/Fuzzy.pm
>>>>>>>     UNKNOWN    Bio::Location::FuzzyLocationI    /usr/local/share/perl/5.18.1/Bio/Location/FuzzyLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::Simple    /usr/local/share/perl/5.18.1/Bio/Location/Simple.pm
>>>>>>>     UNKNOWN    Bio::Location::Split    /usr/local/share/perl/5.18.1/Bio/Location/Split.pm
>>>>>>>     UNKNOWN    Bio::Location::SplitLocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/SplitLocationI.pm
>>>>>>>     UNKNOWN    Bio::Location::WidestCoordPolicy
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Location/WidestCoordPolicy.pm
>>>>>>>     UNKNOWN    Bio::LocationI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/LocationI.pm
>>>>>>>     UNKNOWN    Bio::PrimarySeq
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeq.pm
>>>>>>>     1.006923    Bio::PrimarySeqI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/PrimarySeqI.pm
>>>>>>>     UNKNOWN    Bio::Range    /usr/local/share/perl/5.18.1/Bio/Range.pm
>>>>>>>     UNKNOWN    Bio::RangeI    /usr/local/share/perl/5.18.1/Bio/RangeI.pm
>>>>>>>     1.006923    Bio::Root::Exception
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Exception.pm
>>>>>>>     UNKNOWN    Bio::Root::HTTPget
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/HTTPget.pm
>>>>>>>     UNKNOWN    Bio::Root::IO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/IO.pm
>>>>>>>     1.006923    Bio::Root::Root
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Root.pm
>>>>>>>     1.006923    Bio::Root::RootI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/RootI.pm
>>>>>>>     1.006923    Bio::Root::Version
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Root/Version.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::GenericHSP
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/GenericHSP.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::HSPI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/HSP/HSPI.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/augustus.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastn.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/protein2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/snoscan.pm
>>>>>>>     0.01    Bio::Search::HSP::PhatHSP::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::HSP::PhatHSP::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/HSP/PhatHSP/trnascan.pm
>>>>>>>     1.006923    Bio::Search::Hit::GenericHit
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/GenericHit.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitFactory.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::HitI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/Hit/HitI.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/augustus.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastn.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/cdna2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/est2genome.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/fgenesh.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/genemark.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::gff3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/protein2genome.pm
>>>>>>>     1.006923    Bio::Search::Hit::PhatHit::repeatmasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/repeatmasker.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snap.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/snoscan.pm
>>>>>>>     0.01    Bio::Search::Hit::PhatHit::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/tblastx.pm
>>>>>>>     UNKNOWN    Bio::Search::Hit::PhatHit::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Bio/Search/Hit/PhatHit/trnascan.pm
>>>>>>>     1.006923    Bio::Search::SearchUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Search/SearchUtils.pm
>>>>>>>     UNKNOWN    Bio::SearchIO
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::EventHandlerI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/EventHandlerI.pm
>>>>>>>     UNKNOWN    Bio::SearchIO::SearchResultEventBuilder
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SearchIO/SearchResultEventBuilder.pm
>>>>>>>     UNKNOWN    Bio::Seq    /usr/local/share/perl/5.18.1/Bio/Seq.pm
>>>>>>>     UNKNOWN    Bio::Seq::SeqFactory
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Seq/SeqFactory.pm
>>>>>>>     UNKNOWN    Bio::SeqAnalysisParserI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqAnalysisParserI.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::FeaturePair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/FeaturePair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Generic
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Generic.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::Similarity
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/Similarity.pm
>>>>>>>     UNKNOWN    Bio::SeqFeature::SimilarityPair
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeature/SimilarityPair.pm
>>>>>>>     UNKNOWN    Bio::SeqFeatureI
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqFeatureI.pm
>>>>>>>     UNKNOWN    Bio::SeqI    /usr/local/share/perl/5.18.1/Bio/SeqI.pm
>>>>>>>     UNKNOWN    Bio::SeqUtils
>>>>>>> /usr/local/share/perl/5.18.1/Bio/SeqUtils.pm
>>>>>>>     1.006923    Bio::Tools::CodonTable
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/CodonTable.pm
>>>>>>>     UNKNOWN    Bio::Tools::GFF
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/GFF.pm
>>>>>>>     1.006923    Bio::Tools::IUPAC
>>>>>>> /usr/local/share/perl/5.18.1/Bio/Tools/IUPAC.pm
>>>>>>>     7.3    Bit::Vector    /usr/local/lib/perl/5.18.1/Bit/Vector.pm
>>>>>>>     0.01    CGL::Annotation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation.pm
>>>>>>>     0.01    CGL::Annotation::Feature
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Contig
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Contig.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Exon
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Exon.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Gene
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Gene.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Intron
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Intron.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Protein.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Sequence_variant
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Sequence_variant.pm
>>>>>>>     0.01    CGL::Annotation::Feature::Transcript
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Feature/Transcript.pm
>>>>>>>     0.01    CGL::Annotation::FeatureLocation
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureLocation.pm
>>>>>>>     0.01    CGL::Annotation::FeatureRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/FeatureRelationship.pm
>>>>>>>     0.01    CGL::Annotation::Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Iterator.pm
>>>>>>>     0.01    CGL::Annotation::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Annotation/Trace.pm
>>>>>>>     0.01    CGL::Clone
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Clone.pm
>>>>>>>     0.01    CGL::Ontology::Node
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Node.pm
>>>>>>>     0.01    CGL::Ontology::NodeRelationship
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/NodeRelationship.pm
>>>>>>>     0.01    CGL::Ontology::Ontology
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Ontology.pm
>>>>>>>     0.01    CGL::Ontology::Parser::OBO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Parser/OBO.pm
>>>>>>>     0.01    CGL::Ontology::SO
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/SO.pm
>>>>>>>     0.01    CGL::Ontology::Trace
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Ontology/Trace.pm
>>>>>>>     0.01    CGL::Revcomp
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/Revcomp.pm
>>>>>>>     0.01    CGL::TranslationMachine
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/CGL/TranslationMachine.pm
>>>>>>>     1.32    Carp    /usr/local/share/perl/5.18.1/Carp.pm
>>>>>>>     1.32    Carp::Heavy    /usr/local/share/perl/5.18.1/Carp/Heavy.pm
>>>>>>>     0.64    Class::Struct    /usr/share/perl/5.18/Class/Struct.pm
>>>>>>>     0.36    Clone    /usr/local/lib/perl/5.18.1/Clone.pm
>>>>>>>     5.018001    Config    /usr/lib/perl/5.18/Config.pm
>>>>>>>     3.40    Cwd    /usr/lib/perl/5.18/Cwd.pm
>>>>>>>     1.42    DBD::SQLite    /usr/local/lib/perl/5.18.1/DBD/SQLite.pm
>>>>>>>     1.631    DBI    /usr/local/lib/perl/5.18.1/DBI.pm
>>>>>>>     1.827    DB_File    /usr/lib/perl/5.18/DB_File.pm
>>>>>>>     2.145    Data::Dumper    /usr/lib/perl/5.18/Data/Dumper.pm
>>>>>>>     0.11    Datastore::Base
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/Base.pm
>>>>>>>     0.01    Datastore::MD5
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Datastore/MD5.pm
>>>>>>>     2.53    Digest::MD5    /usr/local/lib/perl/5.18.1/Digest/MD5.pm
>>>>>>>     1.16    Digest::base    /usr/share/perl/5.18/Digest/base.pm
>>>>>>>     UNKNOWN    Dumper::GFF::GFFV3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/GFF/GFFV3.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game.pm
>>>>>>>     UNKNOWN    Dumper::XML::Game_Xml
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Dumper/XML/Game_Xml.pm
>>>>>>>     1.18    DynaLoader    /usr/lib/perl/5.18/DynaLoader.pm
>>>>>>>     1.18    Errno    /usr/lib/perl/5.18/Errno.pm
>>>>>>>     0.17015    Error
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm
>>>>>>>     UNKNOWN    Error::Simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error/Simple.pm
>>>>>>>     5.68    Exporter    /usr/share/perl/5.18/Exporter.pm
>>>>>>>     5.68    Exporter::Heavy    /usr/share/perl/5.18/Exporter/Heavy.pm
>>>>>>>     UNKNOWN    Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Fasta.pm
>>>>>>>     UNKNOWN    FastaChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunk.pm
>>>>>>>     UNKNOWN    FastaChunker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaChunker.pm
>>>>>>>     UNKNOWN    FastaDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaDB.pm
>>>>>>>     UNKNOWN    FastaFile
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaFile.pm
>>>>>>>     UNKNOWN    FastaSeq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/FastaSeq.pm
>>>>>>>     1.11    Fcntl    /usr/lib/perl/5.18/Fcntl.pm
>>>>>>>     2.84    File::Basename    /usr/share/perl/5.18/File/Basename.pm
>>>>>>>     2.26    File::Copy    /usr/share/perl/5.18/File/Copy.pm
>>>>>>>     1.20    File::Glob    /usr/lib/perl/5.18/File/Glob.pm
>>>>>>>     1.20    File::NFSLock
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/NFSLock.pm
>>>>>>>     2.09    File::Path    /usr/share/perl/5.18/File/Path.pm
>>>>>>>     3.40    File::Spec    /usr/lib/perl/5.18/File/Spec.pm
>>>>>>>     3.40    File::Spec::Unix    /usr/lib/perl/5.18/File/Spec/Unix.pm
>>>>>>>     0.2304    File::Temp    /usr/local/share/perl/5.18.1/File/Temp.pm
>>>>>>>     1.09    File::Which
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/File/Which.pm
>>>>>>>     2.02    FileHandle    /usr/share/perl/5.18/FileHandle.pm
>>>>>>>     1.51    FindBin    /usr/share/perl/5.18/FindBin.pm
>>>>>>>     UNKNOWN    GFFDB
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>>     UNKNOWN    GI    /usr/local/annotation/maker2.31/bin/../lib/GI.pm
>>>>>>>     2.42    Getopt::Long    /usr/local/share/perl/5.18.1/Getopt/Long.pm
>>>>>>>     6.02    HTTP::Date    /usr/share/perl5/HTTP/Date.pm
>>>>>>>     6.05    HTTP::Headers    /usr/share/perl5/HTTP/Headers.pm
>>>>>>>     6.06    HTTP::Message    /usr/share/perl5/HTTP/Message.pm
>>>>>>>     6.00    HTTP::Request    /usr/share/perl5/HTTP/Request.pm
>>>>>>>     6.04    HTTP::Response    /usr/share/perl5/HTTP/Response.pm
>>>>>>>     6.03    HTTP::Status    /usr/share/perl5/HTTP/Status.pm
>>>>>>>     1.28    IO    /usr/lib/perl/5.18/IO.pm
>>>>>>>     1.16    IO::File    /usr/lib/perl/5.18/IO/File.pm
>>>>>>>     1.34    IO::Handle    /usr/lib/perl/5.18/IO/Handle.pm
>>>>>>>     1.1    IO::Seekable    /usr/lib/perl/5.18/IO/Seekable.pm
>>>>>>>     1.21    IO::Select    /usr/lib/perl/5.18/IO/Select.pm
>>>>>>>     1.36    IO::Socket    /usr/lib/perl/5.18/IO/Socket.pm
>>>>>>>     1.33    IO::Socket::INET    /usr/lib/perl/5.18/IO/Socket/INET.pm
>>>>>>>     1.24    IO::Socket::UNIX    /usr/lib/perl/5.18/IO/Socket/UNIX.pm
>>>>>>>     1.13    IPC::Open3    /usr/share/perl/5.18/IPC/Open3.pm
>>>>>>>     0.53    Inline    /usr/local/share/perl/5.18.1/Inline.pm
>>>>>>>     UNKNOWN    Inline::denter
>>>>>>> /usr/local/share/perl/5.18.1/Inline/denter.pm
>>>>>>>     UNKNOWN    Iterator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator.pm
>>>>>>>     UNKNOWN    Iterator::Any
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Any.pm
>>>>>>>     UNKNOWN    Iterator::Fasta
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/Fasta.pm
>>>>>>>     UNKNOWN    Iterator::GFF3
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Iterator/GFF3.pm
>>>>>>>     6.05    LWP    /usr/share/perl5/LWP.pm
>>>>>>>     UNKNOWN    LWP::MemberMixin    /usr/share/perl5/LWP/MemberMixin.pm
>>>>>>>     6.00    LWP::Protocol    /usr/share/perl5/LWP/Protocol.pm
>>>>>>>     6.05    LWP::UserAgent    /usr/share/perl5/LWP/UserAgent.pm
>>>>>>>     0.33    List::MoreUtils
>>>>>>> /usr/local/lib/perl/5.18.1/List/MoreUtils.pm
>>>>>>>     1.38    List::Util    /usr/local/lib/perl/5.18.1/List/Util.pm
>>>>>>>     UNKNOWN    MAKER::ConfigData
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/MAKER/ConfigData.pm
>>>>>>>     1.32    POSIX    /usr/lib/perl/5.18/POSIX.pm
>>>>>>>     0.01    Parallel::Application::MPI
>>>>>>> /usr/local/annotation/maker2.31/bin/../perl/lib/Parallel/Application/MPI.pm
>>>>>>>     0.02    Perl::Unsafe::Signals
>>>>>>> /usr/local/lib/perl/5.18.1/Perl/Unsafe/Signals.pm
>>>>>>>     UNKNOWN    PhatHit_utils
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PhatHit_utils.pm
>>>>>>>     UNKNOWN    PostData
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/PostData.pm
>>>>>>>     1.0    Proc::ProcessTable_simple
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/ProcessTable_simple.pm
>>>>>>>     1.0    Proc::Signal
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Proc/Signal.pm
>>>>>>>     UNKNOWN    Process::MpiChunk
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm
>>>>>>>     UNKNOWN    Process::MpiTiers
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiTiers.pm
>>>>>>>     1.38    Scalar::Util    /usr/local/lib/perl/5.18.1/Scalar/Util.pm
>>>>>>>     1.02    SelectSaver    /usr/share/perl/5.18/SelectSaver.pm
>>>>>>>     UNKNOWN    Shadower
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Shadower.pm
>>>>>>>     UNKNOWN    SimpleCluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SimpleCluster.pm
>>>>>>>     2.009    Socket    /usr/lib/perl/5.18/Socket.pm
>>>>>>>     UNKNOWN    SpaceBase
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/SpaceBase.pm
>>>>>>>     2.45    Storable    /usr/local/lib/perl/5.18.1/Storable.pm
>>>>>>>     1.07    Symbol    /usr/share/perl/5.18/Symbol.pm
>>>>>>>     1.17    Sys::Hostname    /usr/lib/perl/5.18/Sys/Hostname.pm
>>>>>>>     0.21    Sys::SigAction
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction.pm
>>>>>>>     UNKNOWN    Sys::SigAction::Alarm
>>>>>>> /usr/local/share/perl/5.18.1/Sys/SigAction/Alarm.pm
>>>>>>>     4.02    Term::ANSIColor    /usr/share/perl/5.18/Term/ANSIColor.pm
>>>>>>>     4.2    Tie::Handle    /usr/share/perl/5.18/Tie/Handle.pm
>>>>>>>     1.04    Tie::Hash    /usr/share/perl/5.18/Tie/Hash.pm
>>>>>>>     4.3    Tie::StdHandle    /usr/share/perl/5.18/Tie/StdHandle.pm
>>>>>>>     1.9726    Time::HiRes    /usr/local/lib/perl/5.18.1/Time/HiRes.pm
>>>>>>>     1.2300    Time::Local    /usr/share/perl/5.18/Time/Local.pm
>>>>>>>     1.60    URI    /usr/share/perl5/URI.pm
>>>>>>>     3.31    URI::Escape    /usr/share/perl5/URI/Escape.pm
>>>>>>>     UNKNOWN    Widget
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget.pm
>>>>>>>     UNKNOWN    Widget::RepeatMasker
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/RepeatMasker.pm
>>>>>>>     UNKNOWN    Widget::augustus
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/augustus.pm
>>>>>>>     UNKNOWN    Widget::blastn
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastn.pm
>>>>>>>     UNKNOWN    Widget::blastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/blastx.pm
>>>>>>>     UNKNOWN    Widget::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate.pm
>>>>>>>     UNKNOWN    Widget::exonerate::cdna2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/cdna2genome.pm
>>>>>>>     UNKNOWN    Widget::exonerate::est2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/est2genome.pm
>>>>>>>     UNKNOWN    Widget::exonerate::protein2genome
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/exonerate/protein2genome.pm
>>>>>>>     UNKNOWN    Widget::fgenesh
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/fgenesh.pm
>>>>>>>     UNKNOWN    Widget::formater
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/formater.pm
>>>>>>>     UNKNOWN    Widget::genemark
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/genemark.pm
>>>>>>>     UNKNOWN    Widget::snap
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snap.pm
>>>>>>>     UNKNOWN    Widget::snoscan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/snoscan.pm
>>>>>>>     UNKNOWN    Widget::tblastx
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/tblastx.pm
>>>>>>>     UNKNOWN    Widget::trnascan
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Widget/trnascan.pm
>>>>>>>     0.16    XSLoader    /usr/share/perl/5.18/XSLoader.pm
>>>>>>>     0.21    attributes    /usr/lib/perl/5.18/attributes.pm
>>>>>>>     2.18    base    /usr/share/perl/5.18/base.pm
>>>>>>>     1.04    bytes    /usr/share/perl/5.18/bytes.pm
>>>>>>>     UNKNOWN    clean
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/clean.pm
>>>>>>>     UNKNOWN    cluster
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/cluster.pm
>>>>>>>     UNKNOWN    compare
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/compare.pm
>>>>>>>     1.27    constant    /usr/share/perl/5.18/constant.pm
>>>>>>>     UNKNOWN    ds_utility
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/ds_utility.pm
>>>>>>>     UNKNOWN    exonerate::splice_info
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/exonerate/splice_info.pm
>>>>>>>     0.34    forks    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     2.08001    forks::Devel::Symdump
>>>>>>> /usr/local/lib/perl/5.18.1/forks/Devel/Symdump.pm
>>>>>>>     0.34    forks::shared    /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     0.34    forks::signals
>>>>>>> /usr/local/lib/perl/5.18.1/forks/signals.pm
>>>>>>>     1.00    integer    /usr/share/perl/5.18/integer.pm
>>>>>>>     0.63    lib    /usr/lib/perl/5.18/lib.pm
>>>>>>>     1.02    locale    /usr/share/perl/5.18/locale.pm
>>>>>>>     UNKNOWN    maker::auto_annotator
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/auto_annotator.pm
>>>>>>>     UNKNOWN    maker::join
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/join.pm
>>>>>>>     UNKNOWN    maker::quality_index
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/quality_index.pm
>>>>>>>     UNKNOWN    maker::sens_spec
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/maker/sens_spec.pm
>>>>>>>     1.22    overload    /usr/share/perl/5.18/overload.pm
>>>>>>>     0.02    overloading    /usr/share/perl/5.18/overloading.pm
>>>>>>>     0.225    parent    /usr/share/perl/5.18/parent.pm
>>>>>>>     UNKNOWN    polisher
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher.pm
>>>>>>>     UNKNOWN    polisher::exonerate
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate.pm
>>>>>>>     UNKNOWN    polisher::exonerate::altest
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/altest.pm
>>>>>>>     UNKNOWN    polisher::exonerate::est
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/est.pm
>>>>>>>     UNKNOWN    polisher::exonerate::protein
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/polisher/exonerate/protein.pm
>>>>>>>     UNKNOWN    repeat_mask_seq
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/repeat_mask_seq.pm
>>>>>>>     0.1    runlog
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/runlog.pm
>>>>>>>     UNKNOWN    shadow_AED
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/shadow_AED.pm
>>>>>>>     1.07    sigtrap    /usr/share/perl/5.18/sigtrap.pm
>>>>>>>     1.07    strict    /usr/share/perl/5.18/strict.pm
>>>>>>>     1.77    threads    /usr/local/lib/perl/5.18.1/forks.pm
>>>>>>>     1.33    threads::shared
>>>>>>> /usr/local/lib/perl/5.18.1/forks/shared.pm
>>>>>>>     1.03    vars    /usr/share/perl/5.18/vars.pm
>>>>>>>     1.18    warnings    /usr/share/perl/5.18/warnings.pm
>>>>>>>     1.02    warnings::register
>>>>>>> /usr/share/perl/5.18/warnings/register.pm
>>>>>>> STATUS: Parsing control files...
>>>>>>> Calling GI::load_control_files at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 452.
>>>>>>> Calling GI::new_instance_temp at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 463.
>>>>>>> Calling GI::mount_check at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 465.
>>>>>>> Calling GI::set_global_temp at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 483.
>>>>>>> STATUS: Processing and indexing input FASTA files...
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling GI::s_abs_path at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 519.
>>>>>>> Calling List::Util::shuffle at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 529.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::split_db at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 536.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 537.
>>>>>>> Calling mkdir at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling Iterator::Any::nextFastaRef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling system at /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling File::NFSLock::unlock at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 537.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 538.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 539.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling File::NFSLock::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> Calling GI::create_blastdb at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 574.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling File::Path::rmtree at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling Iterator::Any::nextDef at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 575.
>>>>>>> Calling Iterator::Any::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 575.
>>>>>>> Calling GI::build_fasta_index at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 622.
>>>>>>> Calling FastaDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 623.
>>>>>>> Calling out to BioPerl Bio::DB::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Error.pm line 415.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> STATUS: Setting up database for any GFF3 input...
>>>>>>> Calling GFFDB::new at /usr/local/annotation/maker2.31/bin/maker line 
>>>>>>> 629.
>>>>>>> Calling GFFDB::next_build at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 631.
>>>>>>> Calling ds_utility::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 635.
>>>>>>> A data structure will be created for you at:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker
>>>>>>> .output/dpp_contig_datastore
>>>>>>> 
>>>>>>> To access files for individual sequences use the datastore index:
>>>>>>> /home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/dpp_contig.maker
>>>>>>> .output/dpp_contig_master_datastore_index.log
>>>>>>> 
>>>>>>> Calling Datastore::MD5::new at /usr/local/annotation/maker2.31/bin/maker 
>>>>>>> line 636.
>>>>>>> Calling Iterator::Fasta::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 639.
>>>>>>> Calling Iterator::Fasta::skip_file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 641.
>>>>>>> Calling Iterator::Fasta::step at 
>>>>>>> /usr/local/annotation/maker2.31/bin/maker line 643.
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/
>>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open 
>>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> line 107.
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now retrying the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> Tries: 2!!
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Calling FastaDB::new at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 462.
>>>>>>> Calling out to BioPerl get_PrimarySeq_stream at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GI.pm line 1894.
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/home/cdantec/cutQuality/assembly/HR/path/to/Maker/test/
>>>>>>> dpp_contig.maker.output/dpp_contig.db','',...) failed: unable to open 
>>>>>>> database file at /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> line 107.
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm line 108.
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> ERROR: Chunk failed at level:2, tier_type:0
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> 
>>>>>>> examining contents of the fasta file and run log
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::id_to_dir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling Datastore::MD5::mkdir at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling uri_escape at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> Calling File::Path::mkpath at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/Process/MpiChunk.pm line 439.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> Processing run.log file...
>>>>>>> 
>>>>>>> 
>>>>>>> Maker is now finished!!!
>>>>>>> 
>>>>>>> Many thanks for your help
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-03-19 14:01 GMT+01:00 Carson Holt <carsonhh at gmail.com>:
>>>>>>> Your problem is one of the following.  You need to reinstall the 
>>>>>>> DBD::SQLite module, you are running in a directory you don't have 
>>>>>>> permissions for, you set your TMPDIR environment variable or TMP value 
>>>>>>> in maker_opts.ctl to an NFS mounted or memory mounted directory, or you 
>>>>>>> are using a self compiled version of Perl (i.e. not /usr/bin/perl) that 
>>>>>>> has issues (probably with DB or SQLite modules).  You can also 
>>>>>>> completely delete the output directory and start again to see if it was 
>>>>>>> just a random error.  You should look at each of those first.  If none 
>>>>>>> of those seems to be the issue, you can also run MAKER with the --debug 
>>>>>>> command line flag and send me the output.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> From:  Chris Bioinfo <chrisbioinfo at gmail.com>
>>>>>>> Date:  Wednesday, March 19, 2014 at 5:09 AM
>>>>>>> To:  <maker-devel at yandell-lab.org>
>>>>>>> Subject:  [maker-devel] Annotation with maker2
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I'm installing/using maker2 for the first time and I get an error when 
>>>>>>> using it.
>>>>>>> 
>>>>>>> I am certainly missing something, but I don't know what.
>>>>>>> 
>>>>>>> I compiled maker with no error message and I have all these directories 
>>>>>>> after compilation: 
>>>>>>> bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  src
>>>>>>> 
>>>>>>> Nevertheless, when I try maker2 on the test data (dpp_contig.fasta) I 
>>>>>>> get this error:
>>>>>>> 
>>>>>>> STATUS: Now running MAKER...
>>>>>>> examining contents of the fasta file and run log
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --Next Contig--
>>>>>>> 
>>>>>>> #---------------------------------------------------------------------
>>>>>>> Now starting the contig!!
>>>>>>> SeqID: contig-dpp-500-500
>>>>>>> Length: 32156
>>>>>>> #---------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> setting up GFF3 output and fasta chunks
>>>>>>> doing repeat masking
>>>>>>> DBI 
>>>>>>> connect('dbname=/path/to/dpp_contig.maker.output/dpp_contig.db','',...) 
>>>>>>> failed: unable to open database file at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm
>>>>>>> 
>>>>>>> Can't call method "do" on an undefined value at 
>>>>>>> /usr/local/annotation/maker2.31/bin/../lib/GFFDB.pm 
>>>>>>> --> rank=NA, hostname=belem
>>>>>>> ERROR: Failed while doing repeat masking
>>>>>>> ERROR: Chunk failed at level:0, tier_type:1
>>>>>>> FAILED CONTIG:contig-dpp-500-500
>>>>>>> ...
>>>>>>> 
>>>>>>> ideas?
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Christelle
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
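
Two of the causes listed in the quoted reply above (no permission on the
run directory, or a temporary directory on an NFS or memory mount) can be
checked without MAKER. Below is a minimal sketch, assuming Python 3 is
available. It uses Python's built-in sqlite3 module, so it only exercises
the filesystem side; it says nothing about whether the Perl DBD::SQLite
module itself is healthy. The function name and the probe file name are
made up for illustration.

```python
import os
import sqlite3
import tempfile


def can_create_sqlite_db(directory):
    """Return (ok, message) after trying to create and write a small
    SQLite database file inside `directory`."""
    if not os.path.isdir(directory):
        return False, f"{directory} is not an existing directory"
    if not os.access(directory, os.W_OK | os.X_OK):
        return False, f"no write/execute permission on {directory}"
    # Hypothetical probe file name; it is removed again afterwards.
    db_path = os.path.join(directory, "sqlite_probe.db")
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE probe (id INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return False, f"SQLite failed in {directory}: {exc}"
    finally:
        if os.path.exists(db_path):
            os.remove(db_path)
    return True, f"SQLite database created and written in {directory}"


if __name__ == "__main__":
    # Check the current directory and the system temporary directory.
    for d in (os.getcwd(), tempfile.gettempdir()):
        ok, msg = can_create_sqlite_db(d)
        print("OK  " if ok else "FAIL", msg)
```

Run it from the directory where MAKER is started. A FAIL line for either
directory points at the permissions or mount-type causes rather than at the
Perl modules.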



From jfierst at uoregon.edu  Fri Mar 21 09:43:59 2014
From: jfierst at uoregon.edu (Janna Fierst)
Date: Fri, 21 Mar 2014 08:43:59 -0700
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CF489F0B.AC19%carsonhh@gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
Message-ID: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>

Hi,

I just wanted to say thanks for all your help- I did the reciprocal best
blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to
associate names between strain assemblies/annotations. Worked perfectly!
-Janna
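
For anyone finding this thread later, the reciprocal-best-hit step can be
sketched roughly as below. This is an illustration under stated
assumptions, not the exact commands used here: it assumes both BLASTP runs
were saved as tabular output (-outfmt 6, default 12 columns, bit score
last), that ties are broken by whichever hit appears first, and that a
tab-separated two-column file is acceptable as the ID map. The function
names and file paths are hypothetical.

```python
import csv


def best_hits(blast_tab_path):
    """Map each query ID to its best subject ID by bit score.

    Assumes BLAST tabular output (-outfmt 6) with the default 12 columns:
    query ID in column 1, subject ID in column 2, bit score in column 12.
    """
    best = {}
    with open(blast_tab_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            query, subject, bitscore = row[0], row[1], float(row[11])
            # Keep the first hit seen when bit scores are tied.
            if query not in best or bitscore > best[query][1]:
                best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(maker_vs_ref_path, ref_vs_maker_path):
    """Return (maker_id, reference_id) pairs that are each other's best hit."""
    forward = best_hits(maker_vs_ref_path)
    reverse = best_hits(ref_vs_maker_path)
    return sorted(
        (maker_id, ref_id)
        for maker_id, ref_id in forward.items()
        if reverse.get(ref_id) == maker_id
    )


def write_id_map(pairs, out_path):
    """Write the two-column file: original ID, then new ID."""
    with open(out_path, "w") as out:
        for maker_id, ref_id in pairs:
            out.write(f"{maker_id}\t{ref_id}\n")
```

Note that the IDs in a protein BLAST are protein/transcript IDs, so the
resulting pairs may need to be translated to gene IDs before they are used
to rename genes.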


On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt <carsonhh at gmail.com> wrote:

> maker_map_ids does a translation (i.e. change gene-A to smug1), so you
> need to know which genes you want to translate names to (two column input
> file, column 1 -> original ID, column 2 -> new ID).  I'm not sure EST
> forward is the best way to do this, although I do think maker_map_ids is
> the tool to use in the end.  The question is how to make a list of IDs to
> translate as the input to maker_map_ids?
>
> I would actually just use BLASTP against the reference strain, and then
> do reciprocal best BLAST hits.  To do this you BLAST your reference
> proteins against your maker proteins.  Then do the opposite, BLAST your
> maker proteins against your reference proteins.  If they are both each
> other's best hit, then they are orthologous, and you can safely make a two
> column entry for the maker_map_ids input (i.e. maker-gene-1 translates into
> smug1).
>
> --Carson
>
>
> From: Daniel Ence <dence at genetics.utah.edu>
> Date: Friday, March 14, 2014 at 11:32 AM
> To: Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org" <
> maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] associating gene names between related strains
>
> Hi Janna, So do you have one strain that you want to use as the reference
> for all the others? There's a script that comes with MAKER called
> maker_map_ids that lets you use a common prefix or suffix for entries in a
> fasta file from one strain and then use est_forward to use that ID in the
> gene models for the other species.
>
> Let me know if that's not what you're looking for,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> *From:* maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
> Janna Fierst [jfierst at uoregon.edu]
> *Sent:* Friday, March 14, 2014 10:06 AM
> *To:* maker-devel at yandell-lab.org
> *Subject:* [maker-devel] associating gene names between related strains
>
> Hi,
>
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate
> the gene naming so that orthologs between species can be associated by
> name. I have been playing around a little with the est_forward option but
> can't figure out a good system/workflow that preserves names but still uses
> the strain-specific RNA-Seq EST set for the actual gene models. Thanks!
> -Janna
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com  Fri Mar 21 09:54:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 09:54:15 -0600
Subject: [maker-devel] associating gene names between related strains
In-Reply-To: <CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
References: <CAGoyurZz5FvX_oCGtSoq5mzwfabFS5ixaHVgzQds7Bo26NcYHg@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6C3C3@mxb2.hg.genetics.utah.edu>
	<CF489F0B.AC19%carsonhh@gmail.com>
	<CAGoyurYLEQqXv0e9wik4NQUXMZgkrUge2-uuh7xfGWEj9oKGow@mail.gmail.com>
Message-ID: <CF51BCA1.AFB9%carsonhh@gmail.com>

I'm glad we could help.

--Carson

From:  Janna Fierst <jfierst at uoregon.edu>
Date:  Friday, March 21, 2014 at 9:43 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] associating gene names between related strains

Hi,

I just wanted to say thanks for all your help- I did the reciprocal best
blast hits and then used the maker scripts (map_fasta_ids, map_gff_ids) to
associate names between strain assemblies/annotations. Worked perfectly!
-Janna


On Fri, Mar 14, 2014 at 11:02 AM, Carson Holt <carsonhh at gmail.com> wrote:
> maker_map_ids does a translation (i.e. change gene-A to smug1), so you need to
> know which genes you want to translate names to (two column input file, column
> 1 -> original ID, column 2 -> new ID).  I'm not sure EST forward is the best
> way to do this, although I do think maker_map_ids is the tool to use in the
> end.  The question is how to make a list of IDs to translate as the input to
> maker_map_ids?
> 
> I would actually just use BLASTP against the reference strain, and then do
> reciprocal best BLAST hits.  To do this you BLAST your reference proteins
> against your maker proteins.  Then do the opposite, BLAST your maker proteins
> against your reference proteins.  If they are both each other's best hit, then
> they are orthologous, and you can safely make a two column entry for the
> maker_map_ids input (i.e. maker-gene-1 translates into smug1).
> 
> --Carson
> 
> 
> From:  Daniel Ence <dence at genetics.utah.edu>
> Date:  Friday, March 14, 2014 at 11:32 AM
> To:  Janna Fierst <jfierst at uoregon.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] associating gene names between related strains
> 
> Hi Janna, So do you have one strain that you want to use as the reference for
> all the others? There's a script that comes with MAKER called maker_map_ids
> that lets you use a common prefix or suffix for entries in a fasta file from
> one strain and then use est_forward to use that ID in the gene models for the
> other species. 
> 
> Let me know if that's not what you're looking for,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Janna
> Fierst [jfierst at uoregon.edu]
> Sent: Friday, March 14, 2014 10:06 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] associating gene names between related strains
> 
> Hi,
> 
> we are assembling and annotating genomes for several related strains of
> Caenorhabditis worms and I was wondering if there is a way to coordinate the
> gene naming so that orthologs between species can be associated by name. I
> have been playing around a little with the est_forward option but can't figure
> out a good system/workflow that preserves names but still uses the
> strain-specific RNA-Seq EST set for the actual gene models. Thanks! -Janna



From Hossein.Borhan at AGR.GC.CA  Fri Mar 21 10:41:38 2014
From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein)
Date: Fri, 21 Mar 2014 16:41:38 +0000
Subject: [maker-devel] non-nucleotide characters in the maker generated
	transcripts
In-Reply-To: <CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A890C8AC@SKREGIXES2.AGR.GC.CA>
	<CF47300B.AB4F%carson.holt@genetics.utah.edu>
	<CF4731CC.AB5E%carson.holt@genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A890CC84@SKREGIXES2.AGR.GC.CA>
	<CF4CA8DB.AD74%carson.holt@genetics.utah.edu>
Message-ID: <E8EDFB90D92694478065C37017B3A3A6A890F2F6@SKREGIXES2.AGR.GC.CA>

Dear Carson

I ran maker with the modified .pm files and it resolved the problem with
the fasta output. Thanks a lot for your help.


HB
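
For anyone checking their own output for the same problem: SCALAR(0x...)
is what Perl prints when a scalar reference is used as a string, so it can
be searched for directly. A minimal sketch, assuming Python 3 and a plain
FASTA file of transcripts; the function name is made up for illustration,
and the check against IUPAC nucleotide codes is an extra assumption about
what counts as a valid character.

```python
import re

# Perl prints a scalar reference as SCALAR(0x...) when it is used as a string.
SCALAR_REF = re.compile(r"SCALAR\(0x[0-9a-fA-F]+\)")
# IUPAC nucleotide codes; anything else in a sequence is treated as suspicious.
NON_NUCLEOTIDE = re.compile(r"[^ACGTUNRYKMSWBDHVacgtunrykmswbdhv]")


def find_bad_records(fasta_path):
    """Return {record_id: [problem, ...]} for records with stray characters."""
    problems = {}
    record_id, chunks = None, []

    def check():
        if record_id is None:
            return
        # Join the wrapped lines first: a match may span a line break.
        seq = "".join(chunks)
        found = SCALAR_REF.findall(seq)
        leftover = NON_NUCLEOTIDE.findall(SCALAR_REF.sub("", seq))
        if found or leftover:
            problems[record_id] = found + sorted(set(leftover))

    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                check()
                header = line[1:].split()
                record_id, chunks = (header[0] if header else ""), []
            elif line:
                chunks.append(line)
    check()
    return problems
```

An empty result means no record contains a stringified reference or any
character outside the IUPAC nucleotide alphabet.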


On 14-03-17 1:45 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:

>I have attached 4 files for you to place in the .../maker/Widgets/
>directory.
>
>The *blast.pm files will suppress the BLAST+ failures you are getting
>(alternatively you can just downgrade to BLAST 2.27 to get the same
>effect).  BLAST 2.29 gives a lot of warnings etc., which you can ignore.
>In the latest release NCBI redid all their warnings and error codes so it
>spits out a lot of garbage and fails with different messages than it did
>before.  For example BLAST now warns you every time it encounters a fasta
>header with a comment (virtually every fasta entry in existence falls in
>this category), so your screen will be awash with meaningless warning
>messages.
>
>The fgenesh.pm file will fix the other failure, which only occurs if you
>use fgenesh simultaneously with the est_fustion=1 option.  No other
>predictors are affected.
>
>Thanks,
>Carson
>
>
>On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Dear  Carson
>>
>>Sorry for the late reply. I was away for a couple of days. I have
>>uploaded the output files, plus the control files and error output, to
>>the FTP site that you provided.
>>The user ID is borhanh
>>
>>I used blast+ for this run.
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>wrote:
>>
>>>Just resending this to the correct maker-devel address.  Please when
>>>replying, do not CC the incorrect maker-devel-bounce address.
>>>
>>>Thanks,
>>>Carson
>>>
>>>
>>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>wrote:
>>>
>>>>FGENESH is not a heavily used tool, so depending on which version it is
>>>>(either too old or too new), output might be slightly different, which
>>>>could cause incorrect parsing. Could you tar up your maker.output folder
>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>(send me your user or guest ID after you upload).
>>>>
>>>>For the BLAST error, use BLAST+ instead.  You are using blastall, which
>>>>is the old legacy version of NCBI BLAST.  You can do this by setting the
>>>>blast type in maker_bopts.ctl and the location of executables in
>>>>maker_exe.ctl.
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>
>>>>
>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Dear Maker users
>>>>>
>>>>>
>>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>>word SCALAR followed by a hexadecimal value in parentheses, like this:
>>>>>(0x22de7020), into the nucleotide sequence of some of the genes. This
>>>>>seems to be related to transcripts predicted by fgenesh_masked.
>>>>>
>>>>>
>>>>>Here is an example for one of the genes
>>>>>
>>>>>
>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>>TTTCGACAAGC
>>>>>
>>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>>without any such problem. I checked the scaffold sequence for one of
>>>>>the affected transcripts and there was no error in it. I am not sure
>>>>>what is causing this. The only error that I could spot in the error
>>>>>output file is the following:
>>>>>
>>>>>
>>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>>contexts/frames of query sequences.
>>>>>
>>>>>
>>>>>
>>>>>Your help is appreciated
>>>>>
>>>>>
>>>>>
>>>>>HB
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


From carsonhh at gmail.com  Fri Mar 21 10:43:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Mar 2014 10:43:10 -0600
Subject: [maker-devel] non-nucleotide characters in the maker generated
 transcripts
Message-ID: <CF51C832.AFC0%carsonhh@gmail.com>

Thanks for letting me know.

--Carson


On 3/21/14, 10:41 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:

>Dear Carson
>
>I ran maker and modified .pm files and it resolved the problem with the
>fasta output. Thanks a lot for your help.
>
>
>
>
>HB
>
>
>
>
>
>
>
>
>On 14-03-17 1:45 PM, "Carson Holt" <carson.holt at genetics.utah.edu> wrote:
>
>>I have attached 4 files for you to place in the .../maker/Widgets/
>>directory.
>>
>>The *blast.pm files will suppress the BLAST+ failures you are getting
>>(alternatively you can just downgrade to BLAST 2.27 to get the same
>>effect).  BLAST 2.29 gives a lot of warnings etc., which you can ignore.
>>In the latest release NCBI redid all their warnings and error codes so it
>>spits out a lot of garbage and fails with different messages than it did
>>before.  For example BLAST now warns you every time it encounters a fasta
>>header with a comment (virtually every fasta entry in existence falls in
>>this category), so your screen will be awash with meaningless warning
>>messages.
>>
>>The fgenesh.pm file will fix the other failure, which only occurs if you
>>use fgenesh simultaneously with the correct_est_fusion=1 option.  No other
>>predictors are affected.
>>
>>Thanks,
>>Carson
>>
>>
>>On 3/14/14, 5:14 PM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>>
>>>Dear  Carson
>>>
>>>Sorry for the late reply. I was away for a couple of days. I have
>>>uploaded
>>>the out put files plus control and error output on the FTP site that you
>>>provided
>>>The user ID is borhanh
>>>
>>>I used blast+ for this run.
>>>
>>>
>>>
>>>
>>>Regards
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-03-13 10:00 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>wrote:
>>>
>>>>Just resending this to the correct maker-devel address.  Please when
>>>>replying, do not CC the incorrect maker-devel-bounce address.
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>
>>>>On 3/13/14, 9:56 AM, "Carson Holt" <carson.holt at genetics.utah.edu>
>>>>wrote:
>>>>
>>>>>FGENESH is not a heavily used tool, so depending on which version it
>>>>>is
>>>>>(either too old or too new), output might be slightly different which
>>>>>could cause incorrect parsing. Could you tar up your maker.output
>>>>>folder,
>>>>>and send it to http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>>(send me either your user/guest ID after you upload).
>>>>>
>>>>>For the BLAST error, use BLAST+ instead.  You are using blastall which
>>>>>is
>>>>>the old legacy version of NCBI BLAST.  You can do this by setting the
>>>>>blast type in maker_bopts.ctl and the location of executables in
>>>>>maker_exe.ctl.
>>>>>
>>>>>Thanks,
>>>>>Carson
>>>>>
>>>>>
>>>>>
>>>>>On 3/12/14, 11:58 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>>wrote:
>>>>>
>>>>>>Dear Maker users
>>>>>>
>>>>>>
>>>>>>I ran maker (2.31) on a fungal genome and found that it inserted the
>>>>>>word SCALAR followed by a pair of brackets, like this: (0x22de7020),
>>>>>>in the nucleotide sequence of some of the genes. This seems
>>>>>>to
>>>>>>be related to transcripts predicted by fgenesh_masked.
>>>>>>
>>>>>>
>>>>>>Here is an example for one of the genes
>>>>>>
>>>>>>
>>>>>>>fgenesh_masked-scaffold00087-processed-gene-3.142-mRNA-1 transcript offset:0 AED:0.01 eAED:0.00 QI:0|1|0.85|1|0.5|0.42|7|144|651
>>>>>>ATGCGTTACTCCCAGATCTTTGGCAGTGCTGCTGCGCTTGTTGGCTCTGSCALAR(0x23
>>>>>>418b90)SCALAR(0x244c8ca0)GCTTTGGGGCGTGGAGAACAGTGGTGACGACTTCA
>>>>>>AGCGCAACGGCAAAGACATTCACATGAACAACCCCGGCGAGAAAATCCATTACATGGGCG
>>>>>>ATGTCACCAAGCCAAATGACAACTGGTATGGATATCCTACCTGCTTCACTGTCTGGCAAC
>>>>>>CCAGTGACTTCACCGACAAAACCTTCAAGGTCGGCGACTGGTTCGTGCAAGCACCCACTT
>>>>>>CTTCCTTTGGCGACGAAACATGCAGTCAGCGGGCCACCGCACCCAAGCTCACCCTGTTTC
>>>>>>CTCACTCTGCACCCATTGATTGCAAGTTCGATGCCGAGAGTACGACCATGTACATTACCT
>>>>>>ATCATGGTAGCTGGAACCGCTCGCCCGTCACGGGCTTCAAGCTCGTCGCTGTGCAGTTTA
>>>>>>AGCTTGGCGCTGATGGCCAGTATACGCCTGTCGAGCCGCTTACCAGCACAACCGCGGCCA
>>>>>>AGGATATCTTTTACAATCCGAGGGTGGAGAGCTGTCAGGGTAATGGCCCGGGATTCAGCT
>>>>>>CGGGTTGCTTCAGACCTGCAGGCTTGGCATGGGATCCCCAGGGTCGGTTGATCATGACGT
>>>>>>CGGATACATCGAGCAATGGTGAGCTGTGGATCTTGGGTACATCTTGAATGACATGTCAGC
>>>>>>AAGGCAGAAGGTAAGTAGTGGATGCCGTTGGAGGAAGTTTGTAAATACAGTGATGCAATG
>>>>>>CCACGGTCGTTCTCTTTTTGCGGTGCTGGCCAGGATAACAAGGTCAATTGACTTTGGATG
>>>>>>TTTCGACAAGC
>>>>>>
>>>>>>The same genome sequence was used for the first round of maker (2.10)
>>>>>>without such problem. I checked the sequence for the scaffold related
>>>>>>to
>>>>>>one of the affected transcripts and there was no error in the
>>>>>>sequence.
>>>>>>I am not sure what is causing this. The only error that I could spot
>>>>>>in
>>>>>>the output error file is the following
>>>>>>
>>>>>>
>>>>>>[blastall] FATAL ERROR:  search cannot proceed due to errors in all
>>>>>>contexts/frames of query sequences.
>>>>>>
>>>>>>
>>>>>>
>>>>>>Your help is appreciated
>>>>>>
>>>>>>
>>>>>>
>>>>>>HB
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From marc.hoeppner at imbim.uu.se  Mon Mar 24 04:08:25 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Mon, 24 Mar 2014 10:08:25 +0000
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <10AFC7D0-82BA-4527-9B77-80DC4BE80CFD@imbim.uu.se>

Hi,

I had previously inquired about protein-based gene building (for example to create a training set for SNAP). This is currently possible with Maker (2.31), but I noticed a limitation. Specifically, I tend to run Maker once to generate all the raw computes (protein and EST alignments, mostly). I then separate these out into GFF files that I can store away and use in various combinations of settings and data in parallel.

However, the protein2genome option does not seem to work off pre-aligned protein data (e.g. protein2genome.gff produced with Maker). Is that intentional and is there a work-around? Or is the only option to run this with fasta files?

Cheers,

Marc


Marc P. Hoeppner, PhD

Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From sujaikumar at gmail.com  Mon Mar 24 08:15:16 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 14:15:16 +0000
Subject: [maker-devel] Dashes in transcript predictions
Message-ID: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nGt.0.3.035610.maker.output.tgz
Type: application/x-gzip
Size: 45641 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140324/c626ff64/attachment-0003.tgz>

From carsonhh at gmail.com  Mon Mar 24 10:49:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 10:49:46 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
Message-ID: <CF55BD0D.B01C%carsonhh@gmail.com>

I've actually never seen that before, but looking through your output it
appears to be specifically caused by setting correct_est_fusion=1, and how
it interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace
.../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
download.

Thanks,
Carson


From:  Sujai <sujaikumar at gmail.com>
Date:  Monday, March 24, 2014 at 8:15 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Dashes in transcript predictions

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA
TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT
CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC
CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT
ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC
TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA
ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA
CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25
QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV
MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN
KKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: join.pm
Type: text/x-perl-script
Size: 18645 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140324/ebc5d81c/attachment-0003.bin>

From carsonhh at gmail.com  Mon Mar 24 11:05:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 11:05:15 -0600
Subject: [maker-devel] Annotations from proteins, follow-up
Message-ID: <CF55BE79.B028%carsonhh@gmail.com>

It's not so much intentional as it is a limitation of the information in
GFF3 format alignments. Right now protein2genome for Eukaryotes will only
try and make exonerate derived alignments work because they have been
polished around splice sites and MAKER still has access to the original
protein sequence and alignment cigar string for additional filtering, etc.
With GFF3 pass-through the algorithm doesn't know nearly as much about
what is passed in. For example the protein sequence is gone, cigar
alignment strings are rarely included (Gap= attribute in GFF3), and it's
not always clear if the alignment was polished for splice sites.  Also
since protein2genome=1 is expected to be used only to generate an initial
training set, and not for final annotations, this is considered a
reasonable restriction.

If you still really want to force protein alignments from a GFF3 to be
considered as potential models, you could put them in as pred_gff.  In
which case they will always be considered as potential models.  Of course
it will be relatively ugly because you lack things I mentioned before such
as the alignment cigar string and original protein sequence that are
normally used to filter protein2genome results for inclusion as models.

--Carson


On 3/24/14, 4:08 AM, "Marc Höppner" <marc.hoeppner at imbim.uu.se> wrote:

>Hi,
>
>I had previously inquired about protein-based gene building (for example
>to create a training set for SNAP). This is currently possible with Maker
>(2.31), but I noticed a limitation. Specifically, I tend to run Maker
>once to generate all the raw computes (protein and set alignments,
>mostly). I then separate these out into GFF files that I can store away
>and use in various combinations of settings and data in parallel.
>
>However, the protein2genome option does not seem to work off pre-aligned
>protein data (e.g. protein2genome.gff produced with Maker). Is that
>intentional and is there a work-around? Or is the only option to run this
>with fasta files?
>
>Cheers,
>
>Marc
>
>
>Marc P. Hoeppner, PhD
>
>Department for Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoeppner at imbim.uu.se
>
>
>
>


From carsonhh at gmail.com  Mon Mar 24 12:15:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 12:15:39 -0600
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55BD0D.B01C%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com>
Message-ID: <CF55C7D4.B05A%carsonhh@gmail.com>

One more note on this.  The sequence is actually fully correct if you just
remove the '-' characters.  So if you don't want to rerun MAKER with the
patch, then you can use the attached script to just repair the transcript
file by removing the '-' characters.  Your GFF3 files and protein files
should already be correct as is.

Usage --> perl fix_dash transcript_file.fasta > new_file.fasta

You may need to place the script in the .../maker/bin/ directory so it can
detect BioPerl if you don't have BioPerl installed system wide.
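
For anyone who cannot run the attached script, the same repair can be
sketched in a few lines of Python. This is not the attached fix_dash
script (which uses BioPerl); the function name strip_dashes and the demo
record are made up for illustration. It only removes '-' from sequence
lines and leaves header lines alone, since MAKER IDs themselves contain
dashes.

```python
import io


def strip_dashes(fasta_in, fasta_out):
    """Copy FASTA text line by line, removing '-' from sequence lines only."""
    for line in fasta_in:
        if line.startswith(">"):
            # Header lines are copied unchanged: IDs such as
            # snap_masked-...-mRNA-1 legitimately contain dashes.
            fasta_out.write(line)
        else:
            fasta_out.write(line.replace("-", ""))


# Small in-memory demonstration (hypothetical record, not real output).
demo = io.StringIO(
    ">snap_masked-x-mRNA-1 transcript\n"
    "TTATTAA-------AAAATAAT\n"
    "ATAT-TAATT\n"
)
fixed = io.StringIO()
strip_dashes(demo, fixed)
print(fixed.getvalue())
```

Note that this sketch does not re-wrap the sequence, so lines that
contained dashes come out shorter than the others. To run it on a real
file, pass open file handles instead of the StringIO objects.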

Thanks,
Carson

From:  Carson Holt <carsonhh at gmail.com>
Date:  Monday, March 24, 2014 at 10:49 AM
To:  Sujai <sujaikumar at gmail.com>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Dashes in transcript predictions

I've actually never seen that before, but looking through your output it
appears to be specifically caused by setting correct_est_fusion=1, and how
it interacts with some features of your dataset.

I've attached a patch in the form of a file you can use to replace
.../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
download.

Thanks,
Carson


From:  Sujai <sujaikumar at gmail.com>
Date:  Monday, March 24, 2014 at 8:15 AM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Dashes in transcript predictions

Dear Maker Team

On a recent run with maker 2.31, I noticed that a couple of the transcripts
had dashes/hyphens in them.

Example:
>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript offset:261
AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAA
TTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATT
CCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGAC
CATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTT
ACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCC
TTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAA
ATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAA
CCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA

The protein prediction for this transcript is ok:

>snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25 eAED:0.25
QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCV
MTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKN
KKMVVWVVSSLPSAAIRNAKRRINEQSSHV

Is this a known bug? I tried searching for "dash|hyphen" in the email list
but couldn't find anything else.

Best wishes,

- Sujai

ps. I pulled out just this one contig and ran maker on it. all the
.maker.output files are attached.



From sujaikumar at gmail.com  Mon Mar 24 12:17:02 2014
From: sujaikumar at gmail.com (Sujai)
Date: Mon, 24 Mar 2014 18:17:02 +0000
Subject: [maker-devel] Dashes in transcript predictions
In-Reply-To: <CF55C7D4.B05A%carsonhh@gmail.com>
References: <CAFADFFt-Af82itPN8kXv1Ozh_9K1YxO+9NWBYkDW2aR4jP4yFg@mail.gmail.com>
	<CF55BD0D.B01C%carsonhh@gmail.com> <CF55C7D4.B05A%carsonhh@gmail.com>
Message-ID: <CAFADFFs6KYiZ8rmfEwYVCYbGymJOUXHVcKVShscBBjjCR3q2fA@mail.gmail.com>

Wow. That was a super quick response. Thanks very much for confirming the
problem and the fixes!


On 24 March 2014 18:15, Carson Holt <carsonhh at gmail.com> wrote:

> One more note on this.  The sequence is actually fully correct if you just
> remove the '-' characters.  So if you don't want to rerun MAKER with the
> patch, then you can use the attached script to just repair the transcript
> file by removing the '-' characters.  Your GFF3 files and proteins files
> should already be correct as is.
>
> Usage --> perl fix_dash transcript_file.fasta > new_file.fasta
>
> You may need to place the script in the .../maker/bin/ directory so it can
> detect BioPerl if you don't have BioPerl installed system wide.
>
> Thanks,
> Carson
>
> From: Carson Holt <carsonhh at gmail.com>
> Date: Monday, March 24, 2014 at 10:49 AM
> To: Sujai <sujaikumar at gmail.com>, "maker-devel at yandell-lab.org" <
> maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Dashes in transcript predictions
>
> I've actually never seen that before, but looking through your output it
> appears to be specifically caused by setting correct_est_fusion=1, and how
> it interacts with some features of your dataset.
>
> I've attached a patch in the form of a file you can use to replace
> .../maker/lib/maker/join.pm.  I'm also going to add it to the MAKER
> download.
>
> Thanks,
> Carson
>
>
> From: Sujai <sujaikumar at gmail.com>
> Date: Monday, March 24, 2014 at 8:15 AM
> To: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Dashes in transcript predictions
>
> Dear Maker Team
>
> On a recent run with maker 2.31, I noticed that a couple of the
> transcripts had dashes/hyphens in them.
>
> Example:
> >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 transcript
> offset:261 AED:0.25 eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
> TTTGATTATTAATTATTTTTGTCTTTATTAA-------AAAATAATTTTGGTACAAACAATCGAATTAATAT-TAATTAAAGTTTTTATCAGCCTTATAAAATCTACGACACCGGCTTTTACCAATGTTTAGCG
> AGTGATTCTCTCAACAGAAGTATCTCCAAATCAATATTCGTTGAATGTAAATGAACCCAAACACCTTATTCTCATTCCTCCGGAAGAAGCTCCTGAATCAACTTTTGATCTCTACAGTAATGTATCTATGAATT
> GCGAAGGAAGAAGTTATTTTCCGAATCAACCAATCATTGTTAATTGGATGTTTAAACATAAAGACTCATATACGACCATAACAAGAGATCACAAAATGGCTACAAGAATAATCACTGCATCAAACAGATCAAAG
> GAAACTAATCTTGATTTGGTCAATATATTTTCTTACCTTACCATAAATGATATCCGCGAAGAAGATGGTGGAGTTTACAAATGTGTGATGACTCAAGGAAGTGTTGACGAAGAACAAGAATTTCTAGTAACTAT
> AAACAATCAAAGTGAAAAGGAAATTGATGTATCCATTTTTTACCAAGATGATGACTTTGTAAGTGTTCGAGCAGCCTTAGAAACAGTCAAGATTTTAGAGAATTACCAGTTTCGATGTTGGTTGTACGACCGGG
> ATAAGACGTATGGTCAAGACGCCGGGAAGCCGACGAAATCGACAGAAAACCGTATAGGTCGTTATTATCAGTCAAAATATTCTGATTGTTCTCAATTTCGCATAGAAAGTTTCTATCAGCTGCCAATTTCTGTT
> AACCGATGGCTGAAAAAAGAACTCAGTTTACAGTCTTTCTTTCAGCCATTTAGCTTTAATTGGGACCCTCAAAAAACCCCTAAAAACAAGAAAATGGTAGTATGGGTTGTTTCTTCCCTACCCTCAGCGGCGAT
> TCGTAATGCAAAGAGAAGAATCAATGAACAATCTTCTCATGTATAA
>
> The protein prediction for this transcript is ok:
>
> >snap_masked-nGt.0.3.035610-processed-gene-0.2-mRNA-1 protein AED:0.25
> eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|0|240
>
> MNCEGRSYFPNQPIIVNWMFKHKDSYTTITRDHKMATRIITASNRSKETNLDLVNIFSYLTINDIREEDGGVYKCVMTQGSVDEEQEFLVTINNQSEKEIDVSIFYQDDDFVSVRAALETVKILENYQFRCWLY
>
> DRDKTYGQDAGKPTKSTENRIGRYYQSKYSDCSQFRIESFYQLPISVNRWLKKELSLQSFFQPFSFNWDPQKTPKNKKMVVWVVSSLPSAAIRNAKRRINEQSSHV
>
> Is this a known bug? I tried searching for "dash|hyphen" in the email list
> but couldn't find anything else.
>
> Best wishes,
>
> - Sujai
>
> ps. I pulled out just this one contig and ran maker on it. all the
> .maker.output files are attached.
>
>

From diana.garnica at anu.edu.au  Mon Mar 24 17:11:01 2014
From: diana.garnica at anu.edu.au (Diana Garnica Moreno)
Date: Mon, 24 Mar 2014 23:11:01 +0000
Subject: [maker-devel] Problem extracting fasta from a GFF file generated
	with MAKER
Message-ID: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>

Hi there,


We recently annotated a fungal genome using MAKER and got the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not include the stop codon, so I cannot tell which proteins are complete and which are incomplete at the 3' end. To solve that I have used different programs to extract the FASTA sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why this happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?


Thanks!


Diana

From carsonhh at gmail.com  Mon Mar 24 17:25:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Mar 2014 17:25:09 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
 generated with MAKER
Message-ID: <CF56185B.B0E1%carsonhh@gmail.com>

You are probably getting the wrong proteins from your scripts because you
are not taking into account the 5' and 3' UTR in the transcript.

For example
>snap_masked-contig-processed-gene-0.2-mRNA-1 transcript offset:261 AED:0.25
eAED:0.25 QI:261|0.4|0.83|0.83|0.8|0.83|6|22|240

The 5' UTR is 261bp and the 3' UTR is 22bp long.  Both would have to be
trimmed before translating the transcript into a protein. Once they are
trimmed you can use frame 0 for the translation.

The fasta_tool that comes with MAKER can be used to quickly trim the UTR.

Example:
fasta_tool maker_transcripts.fasta --trim_maker_utr

Then you can try your other scripts again.
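
If you would rather do the trimming inside your own script, the idea
above can be sketched in Python. This is only an illustration, not how
fasta_tool is implemented; the function names are made up. It assumes
the whole header is on one line and that the first and eighth QI fields
are the 5' and 3' UTR lengths, as in the example header.

```python
import re


def utr_lengths(header):
    """Return (five_prime, three_prime) UTR lengths from a MAKER FASTA header."""
    match = re.search(r"QI:(\S+)", header)
    if match is None:
        raise ValueError("no QI tag found in header")
    fields = match.group(1).split("|")
    # Assumption: field 1 is the 5' UTR length, field 8 the 3' UTR length.
    return int(fields[0]), int(fields[7])


def trim_utr(header, sequence):
    """Cut both UTRs off a transcript so that translation can start in frame 0."""
    five, three = utr_lengths(header)
    return sequence[five:len(sequence) - three]


# Toy record (hypothetical): 3 bp 5' UTR, 9 bp CDS, 2 bp 3' UTR.
header = ">t1 transcript offset:3 AED:0.25 eAED:0.25 QI:3|0.4|0.83|0.83|0.8|0.83|6|2|2"
sequence = "GGG" + "ATGAAATAA" + "CC"
print(trim_utr(header, sequence))
```

After trimming, the remaining sequence should begin at the start codon
and can be translated in frame 0 by whatever tool you already use.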

Thanks,
Carson


From:  Diana Garnica Moreno <diana.garnica at anu.edu.au>
Date:  Monday, March 24, 2014 at 5:11 PM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Problem extracting fasta from a GFF file generated
with MAKER

Hi there,


We recently assembled a fungal genome using MAKER and we got the gene
models. and the corresponding transcripts, predicted proteins and GFF files.
However, the predicted proteins do not have the stop codon included so I do
not know which proteins are complete and which ones are incomplete at the 3'
end. To solve that I have used different programs to extract the fasta
sequence of the CDSs given the gff file and the genome sequence. The problem
is that with the tools I have tested I get the right sequence for some of
the proteins and wrong sequences for others (with multiple stop codons for
example). I am not sure why it happens and since it happens with different
tools (different python scripts and even gffread from cufflink) I do not
know where is the problem. Could you please give me some advice on how to
extract the right sequences with the stop codons included?


Thanks!


Diana

From daniel.standage at gmail.com  Tue Mar 25 07:24:14 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 25 Mar 2014 09:24:14 -0400
Subject: [maker-devel] Maker iPlant image
Message-ID: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>

Greetings,

I launched an instance from the Maker-P 2.28 image
(c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location
of the installed software. All I could find was an example data set on the
Desktop, but the "maker" program was not in the path and the contents of
"/usr/local/src" are empty. Could you please advise on how to run Maker in
iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From ernesto at ebi.ac.uk  Tue Mar 25 04:10:59 2014
From: ernesto at ebi.ac.uk (ernesto lowy gallego)
Date: Tue, 25 Mar 2014 10:10:59 +0000
Subject: [maker-devel] Incorrect translation start codon
Message-ID: <53315633.2070702@ebi.ac.uk>

Hi,

I have been inspecting the MAKER predictions and I detected a situation 
which appears with a certain frequency.
(See attached Apollo screenshot illustrating the situation I am going to 
describe):

Let's say that there is est2genome evidence supporting the prediction of
the 5' UTR region. I have noticed that in some of these transcripts
with a 5' UTR, MAKER does not identify the correct downstream ATG
start codon and instead treats a TTG codon (coding for L) as the
protein start. The proper ATG start codon is further
downstream, as the ab initio predictors (SNAP+AUGUSTUS) correctly
predict in this case (see the attached screenshot).

Any comments on this?

Thanks!

ernesto

-- 
Developer

VectorBase | Ensembl Genomes

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2014-03-25 at 09.34.16.png
Type: image/png
Size: 32220 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140325/f9ae69ec/attachment-0003.png>

From carsonhh at gmail.com  Tue Mar 25 08:19:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:19:22 -0600
Subject: [maker-devel] Incorrect translation start codon
In-Reply-To: <53315633.2070702@ebi.ac.uk>
References: <53315633.2070702@ebi.ac.uk>
Message-ID: <CF56EBF0.B109%carsonhh@gmail.com>

This is caused by BioPerl's is_start_codon method and default codon table
returning true for non-canonical start codons.  It was resolved some time
ago (See previous discussion -->
https://groups.google.com/forum/#!topic/maker-devel/S0j1fJ4LjVY ).  Make
sure you are using the most recent version of MAKER (currently 2.31).
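
To illustrate the effect (this is a toy sketch in Python, not the MAKER
or BioPerl code), here is what happens when a start-codon test accepts
the alternative initiation codons listed in the standard genetic code
(TTG and CTG in addition to ATG) versus ATG only. The function name
first_start and the example sequence are made up.

```python
# Initiation codons listed for the standard code (NCBI translation table 1).
ALT_STARTS = {"ATG", "TTG", "CTG"}
# Canonical start only.
STRICT_STARTS = {"ATG"}


def first_start(seq, starts, frame=0):
    """Return the position of the first in-frame codon found in `starts`, or None."""
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3].upper() in starts:
            return pos
    return None


# Toy sequence: GCC TTG AAA ATG GCC TAA
seq = "GCCTTGAAAATGGCCTAA"
print(first_start(seq, ALT_STARTS))     # picks the upstream TTG
print(first_start(seq, STRICT_STARTS))  # picks the downstream ATG
```

With the permissive set the upstream TTG is taken as the start, which
is the same kind of result as in the screenshot; restricting the test to
ATG gives the downstream start instead.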

Thanks,
Carson


On 3/25/14, 4:10 AM, "ernesto lowy gallego" <ernesto at ebi.ac.uk> wrote:

>Hi,
>
>I have been inspecting the MAKER predictions and I detected a situation
>which appears with a certain frequency.
>(See attached Apollo screenshot illustrating the situation I am going to
>describe):
>
>Let's say that there is est2genome evidence supporting the prediction of
>the 5' UTR region, I have realized that in some of these transcripts
>with 5'UTR, MAKER is not capable of identifying the right downstream ATG
>protein start codon and considers a TTG codon (coding for L) as the
>incorrect protein start. The proper ATG codon start is further
>downstream, as the Ab-initio predictors (SNAP+AUGUSTUS) correctly
>predict in this case (see the attached screenshot)
>
>Any comments on this?
>
>Thanks!
>
>ernesto
>
>-- 
>Developer
>
>VectorBase | Ensembl Genomes
>


From carsonhh at gmail.com  Tue Mar 25 08:24:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Mar 2014 08:24:36 -0600
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
Message-ID: <CF56ED91.B119%carsonhh@gmail.com>

--> /opt/maker/bin/maker

It looks like most preinstalled software is under /opt on the image.

Thanks,
Carson


From:  Daniel Standage <daniel.standage at gmail.com>
Date:  Tuesday, March 25, 2014 at 7:24 AM
To:  Maker Mailing List <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Maker iPlant image

Greetings,

I launched an instance from the Maker-P 2.28 image
(c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location
of the installed software. All I could find was an example data set on the
Desktop, but the "maker" program was not in the path and the contents of
"/usr/local/src" are empty. Could you please advise on how to run Maker in
iPlant Atmosphere? Thanks.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University


From darasappan at gmail.com  Tue Mar 25 10:33:59 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 25 Mar 2014 11:33:59 -0500
Subject: [maker-devel] maker to EvidenceModeler
Message-ID: <08324618-6422-4E24-99D1-D05E64420FFB@gmail.com>

Hi Carson and others,

Is there an easy tool or pipeline among the MAKER utilities to convert MAKER and SNAP output into files that EvidenceModeler accepts?

It looks like EvidenceModeler also needs just GFF files, but with a few tweaks. It seems better equipped to handle PASA annotation results than MAKER results.

Thanks
Dhivya


From barry.utah at gmail.com  Tue Mar 25 11:51:38 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 25 Mar 2014 11:51:38 -0600
Subject: [maker-devel] Problem extracting fasta from a GFF file
	generated	with MAKER
In-Reply-To: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
References: <1264f0423dbe47b18ed3bc8b49c5b31d@HKXPR06MB101.apcprd06.prod.outlook.com>
Message-ID: <B283D045-3B8D-4A0C-82F8-7C2DB291B065@genetics.utah.edu>

Hi Diana,

There is a Perl library - The Genome Annotation Library - that is designed to make writing code like this easy.  I just added a script to this library called gal_CDS_sequence which you would run like this:

gal_CDS_sequence --translate genes.gff3 genome.fasta

The focus of GAL is to try to make writing quick scripts like this easy, so if you're comfortable with a bit of Perl, you can modify existing scripts and write new ones to search, iterate through, and traverse the relationships of features in GFF3 files.

You can access the library here:

http://www.sequenceontology.org/software/GAL.html

Support for GAL is available via the SO mailing list:

https://lists.sourceforge.net/lists/listinfo/song-devel
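
For anyone who wants to see the logic spelled out, here is a minimal sketch in Python. It is not how gal_CDS_sequence is implemented; it only shows the two steps that commonly go wrong when a tool yields internal stop codons: joining the CDS segments in genomic order, and reverse-complementing the joined sequence for minus-strand genes. The function names and the toy contig are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def cds_sequence(segments, strand, contig):
    """Build the spliced CDS for one transcript.

    segments: (start, end) pairs in GFF3 coordinates (1-based, inclusive).
    strand:   "+" or "-" as given in column 7 of the GFF3 file.
    contig:   sequence of the contig the segments lie on.
    """
    # Always join in genomic order first, whatever order the file lists them in.
    joined = "".join(contig[start - 1:end] for start, end in sorted(segments))
    # Minus-strand genes are read from the opposite strand.
    return revcomp(joined) if strand == "-" else joined


def ends_with_stop(cds):
    """True if the last codon of the CDS is a standard stop codon."""
    return len(cds) >= 3 and cds[-3:].upper() in STOP_CODONS


# Toy minus-strand gene split across two CDS segments (hypothetical data).
contig = "GGTTATTCCCCTCATGG"
cds = cds_sequence([(12, 15), (3, 7)], "-", contig)
print(cds, ends_with_stop(cds))
```

Whether the stop codon falls inside the CDS features depends on the annotation tool, so it is worth checking a few known-complete genes before trusting ends_with_stop on the whole set.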

Hope that helps,

Barry

On Mar 24, 2014, at 5:11 PM, Diana Garnica Moreno wrote:

> Hi there,
> 
> We recently annotated a fungal genome using MAKER and we got the gene models and the corresponding transcripts, predicted proteins and GFF files. However, the predicted proteins do not have the stop codon included, so I do not know which proteins are complete and which ones are incomplete at the 3' end. To solve that I have used different programs to extract the fasta sequence of the CDSs given the GFF file and the genome sequence. The problem is that with the tools I have tested I get the right sequence for some of the proteins and wrong sequences for others (with multiple stop codons, for example). I am not sure why it happens, and since it happens with different tools (different Python scripts and even gffread from Cufflinks) I do not know where the problem is. Could you please give me some advice on how to extract the right sequences with the stop codons included?
> 
> Thanks!
> 
> Diana

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From kchilds at plantbiology.msu.edu  Wed Mar 26 08:21:36 2014
From: kchilds at plantbiology.msu.edu (Childs, Kevin)
Date: Wed, 26 Mar 2014 14:21:36 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <CF56ED91.B119%carsonhh@gmail.com>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
Message-ID: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>

Daniel,

There are a few small issues with the MAKER-P_2.28 image at iPlant.  I have been using the image successfully for more than a month.  I typically set several environment variables immediately after starting an ssh session.

export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
export ZOE=/opt/maker/exe/snap
export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
export TMP=/tmp

The image will allow you to train SNAP, but training Augustus is not possible with the current image.  Augustus training requires blat which was not installed in this image.  There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking.  I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.
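
A quick way to see which executables an instance is missing before starting a run is sketched below in Python. It assumes the directory layout from the export lines above and the executable names maker, snap, augustus and blat; the helper missing_tools is hypothetical and not part of the image.

```python
import os
import shutil


def missing_tools(tools, extra_dirs=()):
    """Return the tools that cannot be found on PATH plus any extra directories."""
    search_path = os.pathsep.join([os.environ.get("PATH", ""), *extra_dirs])
    return [tool for tool in tools if shutil.which(tool, path=search_path) is None]


if __name__ == "__main__":
    # Directories taken from the export lines above (assumed image layout).
    image_dirs = [
        "/opt/maker/bin",
        "/opt/maker/exe/snap",
        "/opt/maker/exe/augustus/bin",
        "/opt/maker/exe/augustus/scripts",
    ]
    print(missing_tools(["maker", "snap", "augustus", "blat"], image_dirs))
```

On the image as described, blat would be expected to show up in the printed list.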

Augustus should work with a stock hmm on this image.

I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.

Kevin Childs


---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Plant Biology Department
Michigan State University

kchilds at plantbiology.msu.edu
517-775-2844 (m)
517-353-5969 (l)

On Mar 25, 2014, at 10:24 AM, Carson Holt wrote:

> --> /opt/maker/bin/maker
> 
> It looks like most preinstalled software is under /opt on the image.
> 
> Thanks,
> Carson
> 
> 
> From: Daniel Standage <daniel.standage at gmail.com>
> Date: Tuesday, March 25, 2014 at 7:24 AM
> To: Maker Mailing List <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Maker iPlant image
> 
> Greetings,
> 
> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.
> 
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University


From steinj at cshl.edu  Wed Mar 26 12:41:37 2014
From: steinj at cshl.edu (Stein, Joshua)
Date: Wed, 26 Mar 2014 18:41:37 +0000
Subject: [maker-devel] Maker iPlant image
In-Reply-To: <BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
References: <CAOfLjHVa1r8hdF0GK+gp59pmfZb7qZLO5rF0qwK7b+=hQ0CcrQ@mail.gmail.com>
	<CF56ED91.B119%carsonhh@gmail.com>
	<BE1EEBCF-58A6-4045-B169-699EB189D299@plantbiology.msu.edu>
Message-ID: <A6505FF9-06C4-4EB2-949B-EDA9113F64E3@cshl.edu>

Also please note that there is a tutorial available here, which is particularly important if you want to run MAKER in MPI mode.
https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial

Josh

Joshua Stein, PhD
Manager, Sci. Informatics III
Cold Spring Harbor Laboratory
steinj at cshl.edu
http://ware.cshl.org/


On Mar 26, 2014, at 10:20 AM, "Childs, Kevin" <kchilds at plantbiology.msu.edu>
 wrote:

> Daniel,
> 
> There are a few small issues with the MAKER-P_2.28 image at iPlant.  I have been using the image successfully for more than a month.  I typically set several environment variables immediately after starting an ssh session.
> 
> export PATH=$PATH:/opt/maker/bin:/opt/maker/exe/snap:/opt/maker/exe/augustus/bin:/opt/maker/exe/augustus/scripts/
> export ZOE=/opt/maker/exe/snap
> export AUGUSTUS_CONFIG_PATH=/opt/maker/exe/augustus/config
> export TMP=/tmp
> 
> The image will allow you to train SNAP, but training Augustus is not possible with the current image.  Augustus training requires blat which was not installed in this image.  There is also an issue where training Augustus requires that you write to the /opt/maker/exe/augustus/config/species/ directory which requires some inconvenient directory hacking.  I've worked this all out on a forked image (currently private), but I have not had the time to contact Joshua Stein to suggest some modifications to his public image.
> 
> Augustus should work with a stock hmm on this image.
> 
> I have not attempted to use GeneMark, and of course, fgenesh is a completely different story.
> 
> Kevin Childs
> 
> 
> ---
> Kevin Childs, PhD
> 
> Assistant Professor - Fixed Term
> Plant Biology Department
> Michigan State University
> 
> kchilds at plantbiology.msu.edu
> 517-775-2844 (m)
> 517-353-5969 (l)
> 
> On Mar 25, 2014, at 10:24 AM, Carson Holt wrote:
> 
>> --> /opt/maker/bin/maker
>> 
>> It looks like most preinstalled software is under /opt on the image.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> From: Daniel Standage <daniel.standage at gmail.com>
>> Date: Tuesday, March 25, 2014 at 7:24 AM
>> To: Maker Mailing List <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Maker iPlant image
>> 
>> Greetings,
>> 
>> I launched an instance from the Maker-P 2.28 image (c5104d19-b4a2-4304-beb2-4921ac61c1ca), but was unable to find the location of the installed software. All I could find was an example data set on the Desktop, but the "maker" program was not in the path and the contents of "/usr/local/src" are empty. Could you please advise on how to run Maker in iPlant Atmosphere? Thanks.
>> 
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University


From brubin at fieldmuseum.org  Sat Mar 29 10:24:05 2014
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sat, 29 Mar 2014 11:24:05 -0500
Subject: [maker-devel] Missing UTRs in GFF
Message-ID: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?
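
To locate the affected models, something like the following Python sketch could
be used. It does not repair anything; it only lists transcripts that have no
five_prime_UTR or three_prime_UTR lines and whose CDS intervals are identical
to their exon intervals. Models that genuinely lack UTRs match as well, so the
result still has to be compared with the UTR information in the header. The
function name is hypothetical.

```python
from collections import defaultdict

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}


def transcripts_without_utr_lines(gff3_lines):
    """Return IDs of transcripts with no UTR features and CDS identical to exons."""
    exons = defaultdict(set)
    cds = defaultdict(set)
    has_utr = set()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        span = (int(cols[3]), int(cols[4]))
        # A feature may list several parents separated by commas.
        for parent in filter(None, attrs.get("Parent", "").split(",")):
            if cols[2] == "exon":
                exons[parent].add(span)
            elif cols[2] == "CDS":
                cds[parent].add(span)
            elif cols[2] in UTR_TYPES:
                has_utr.add(parent)
    return sorted(t for t in exons if t not in has_utr and exons[t] == cds[t])
```

It can be run over the merged file with
transcripts_without_utr_lines(open("merged.gff")), where merged.gff stands for
whatever gff3_merge produced.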

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
A non-text attachment was scrubbed...
Name: missing_utr.gff
Type: application/octet-stream
Size: 2934 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140329/0f93b3b2/attachment-0003.obj>

From mhinsley at ebi.ac.uk  Mon Mar 31 04:20:10 2014
From: mhinsley at ebi.ac.uk (Malcolm Hinsley)
Date: Mon, 31 Mar 2014 11:20:10 +0100
Subject: [maker-devel] putative preponderance of short exons??
Message-ID: <5339415A.1020509@ebi.ac.uk>

Hi

I've run Maker on a de novo assembly of a species of fly and then ran 
some simple statistics (intron/ exon/ CDS length, exons per gene)  over 
the GFF output and compared with a couple of other species.
It all looks good except that there is a surprising number of very short 
exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total; see attached 
pdf: black is Drosophila, red is A. gambiae, green is with 5' and 3' 
exons removed).
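
For reference, the exon counts above can be reproduced with something like the 
following Python sketch. It assumes a standard nine-column GFF3 file with 
features of type exon; the function names are hypothetical, and exons shared 
by several transcripts are counted once per line in the file.

```python
def exon_lengths(gff3_lines):
    """Collect the length of every feature of type 'exon' in a GFF3 file."""
    lengths = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        lengths.append(end - start + 1)  # GFF3 coordinates are 1-based, inclusive
    return lengths


def count_below(lengths, thresholds=(50, 30, 10)):
    """Count how many lengths fall strictly below each threshold (in bp)."""
    return {t: sum(1 for n in lengths if n < t) for t in thresholds}
```

Usage would be count_below(exon_lengths(open("annotations.gff"))), where 
annotations.gff stands for the merged MAKER output.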

I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.  
I'm using maker 2.31 (unpatched).

Anecdotally, these short exons appear without EST or protein evidence 
and they all line up with canonical splice sequences (GT...AG), but I've 
only looked at a few using Apollo.

While there's no requirement that exons should be longer, I'm suspicious 
of this as there must be some evolutionary relationship between these 
species.
I've compared with another species annotated with Maker (using SNAP 
and Augustus) which is more distant (not yet publicly available), and 
the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop 
codons, but this does not appear to be the case (mostly they are mid-gene).


Is there some way to adjust the predictors eg to require external 
evidence? or anything else you could suggest? ... I can see the 
following in the tutorial but I'm not sure how they could help:

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no


thanks

-- 
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: exon_53.pdf
Type: application/pdf
Size: 10619 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140331/edd22fe9/attachment-0003.pdf>

From carsonhh at gmail.com  Mon Mar 31 07:52:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 07:52:15 -0600
Subject: [maker-devel] putative preponderance of short exons??
In-Reply-To: <5339415A.1020509@ebi.ac.uk>
References: <5339415A.1020509@ebi.ac.uk>
Message-ID: <CF5ECE08.B30C%carsonhh@gmail.com>

The intron/exon structure is determined by SNAP, Augustus, etc.  It is not
affected by any of the maker parameters.  Only evidence alignments are
affected by the maker settings.  You can try retraining or manually
editing the HMMs, but they might also be regions where your assembly is
incorrect and those algorithms make short exons in order to make a
structure work without getting stop codons mid gene.
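
To pick regions worth inspecting before retraining, a sketch like the one below
(Python, hypothetical helper names) lists short exons that no evidence
alignment overlaps. It assumes the exon and evidence intervals have already
been pulled out of the GFF for a single contig, as 1-based inclusive
(start, end) pairs, and it ignores strand.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def unsupported_short_exons(exons, evidence, max_len=30):
    """Return exons shorter than max_len bp that overlap no evidence interval."""
    return [
        exon
        for exon in exons
        if exon[1] - exon[0] + 1 < max_len
        and not any(overlaps(exon, ev) for ev in evidence)
    ]


# Toy intervals (hypothetical): only the first exon is short and unsupported.
exons = [(100, 108), (200, 260), (300, 320)]
evidence = [(190, 270), (315, 400)]
print(unsupported_short_exons(exons, evidence))
```

The all-against-all comparison is fine for one contig at a time; for a whole
genome the evidence intervals would need to be sorted or indexed first.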

Thanks,
Carson


On 3/31/14, 4:20 AM, "Malcolm Hinsley" <mhinsley at ebi.ac.uk> wrote:

>Hi
>
>I've run Maker on a de novo assembly of a species of fly and then ran
>some simple statistics (intron/ exon/ CDS length, exons per gene)  over
>the GFF output and compared with a couple of other species.
>It all looks good except that there is a surprising number of very short
>exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total; see attached
>pdf: black is Drosophila, red is A. gambiae, green is with 5' and 3'
>exons removed).
>
>I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.
>I'm using maker 2.31 (unpatched).
>
>Anecdotally, these short exons appear without EST or protein evidence
>and they all line up with canonical splice sequences (GT----AG).
>(but i've only looked at a few using Apollo).
>
>While there's no requirement that exons should be longer I'm suspicious
>of this as there must be some evolutionary relationship between these
>species.
>I've compared with another species annotated with Maker (using SNAP
>and Augustus)  which is more distant (not yet publicly available), and
>the same pattern of short exons is present.
>I wondered if they were created to fulfil the need for start/stop
>codons, but this does not appear to be the case (mostly they are
>mid-gene).
>
>
>Is there some way to adjust the predictors eg to require external
>evidence? or anything else you could suggest? ... I can see the
>following in the tutorial but I'm not sure how they could help:
>
>pred_flank=200 #flank for extending evidence clusters sent to gene
>predictors
>pred_stats=0 #report AED and QI statistics for all predictions as well as
>models
>AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and
>1)
>min_protein=0 #require at least this many amino acids in predicted
>proteins
>alt_splice=0 #Take extra steps to try and find alternative splicing, 1 =
>yes, 0 = no
>always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0
>= no
>
>
>thanks
>
>-- 
>malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
>European Bioinformatics Institute (EMBL-EBI)
>European Molecular Biology Laboratory
>Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
>United Kingdom
>


From carsonhh at gmail.com  Mon Mar 31 08:37:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 31 Mar 2014 08:37:15 -0600
Subject: [maker-devel] Missing UTRs in GFF
In-Reply-To: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
References: <CAKpVPBLQ9i9qKv3e=fpD+pU9YFTyUXUFQUiMh0j0N9aDgvSRcQ@mail.gmail.com>
Message-ID: <CF5ED8D3.B31A%carsonhh@gmail.com>

Not something I've seen before, but there was a patch for another issue,
caused by the use of avoid_est_fusion=1, that may be related.  Try the
current stable release 2.31, and let me know if it still happens.

You can also upload the contig folder from one of the regions in question
here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

Then I could verify the bug, and see if it is something that happens in the
current release.

--Carson


From:  Benjamin Rubin <brubin at fieldmuseum.org>
Date:  Saturday, March 29, 2014 at 10:24 AM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Missing UTRs in GFF

I have annotated a eukaryotic genome with MAKER 2.30. I recently realized
that there are a few genes in the GFF file produced by gff3_merge with
inconsistencies in the annotated CDS and UTRs. For most of my genes, the
UTRs have their own lines in the GFF file. However, for the problematic
genes, the UTRs are not specified in the GFF file and all exons are
annotated as CDS. The UTRs do appear in the gene header and the protein
sequences are the correct length (do not include the UTR). I have attached
an example from the GFF file.

Is this a known problem, or have I done something wrong? Is there an easy
way to fix the GFF file?

Thanks for your help,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
benrubin.org <http://benrubin.org>

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
