From darasappan at gmail.com  Mon Feb  3 10:31:16 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 3 Feb 2014 10:31:16 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
Message-ID: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used  
trinotate to annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks  
like this (most important parts on top):

     6653 gene
    46675 exon
   280534 protein_match
    59934 CDS
      969 contig
   105388 expressed_sequence_match
    12584 five_prime_UTR
    78565 match
  1401369 match_part
    10180 mRNA
    11545 three_prime_UTR
2. From cufflinks assembly: I started with 133380 entries (out of  
which there are 29,000 transcripts).  I used the protein sequences  
from trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks  
like this:
       29 gene
       75 exon
   573659 protein_match
       67 CDS
     1099 contig
   269298 expressed_sequence_match
       23 five_prime_UTR
   173844 match
  2221846 match_part
       29 mRNA
       23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than
expected, so I went the cufflinks route. I don't understand why even
fewer genes are found when using the cufflinks transcripts.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.   
I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
      957 contig
    46555 expressed_sequence_match
    43651 match
   553633 match_part
   113738 protein_match

As I mentioned in another email, cegma results indicated that the  
genome was more than 90% complete. Any suggestions would be helpful.
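
The feature tallies above look like counts of the type column (column 3) of MAKER's GFF3 output. A minimal sketch of how such a tally could be reproduced, assuming a standard tab-separated, nine-column GFF3 file with an optional ##FASTA section at the end. The function name is hypothetical and is not part of MAKER:

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally column 3 (feature type) of GFF3 lines.

    Assumes tab-separated GFF3 with nine columns; comment lines are
    skipped and parsing stops at the ##FASTA directive.
    """
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # only sequence data follows this directive
        if not line or line.startswith("#"):
            continue  # blank line, comment, or other directive
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts


# Small in-memory example; for a real file, pass an open file handle instead.
example = [
    "##gff-version 3\n",
    "c1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "c1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n",
    "c1\tmaker\texon\t1\t50\t.\t+\t.\tID=e1;Parent=m1\n",
    "c1\tmaker\texon\t60\t100\t.\t+\t.\tID=e2;Parent=m1\n",
    "##FASTA\n",
    ">c1\n",
    "ACGT\n",
]
for feature_type, n in sorted(count_feature_types(example).items()):
    print(f"{n:>9} {feature_type}")
```

Comparing the gene, mRNA and exon rows between runs is then a matter of running the same tally on each run's GFF3.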

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

> Hi Dhivya,
>
> I think there are a few numbers that could be helpful to understand
> what's happening here.
>
> How many transcripts did Trinity assemble the RNA-seq data into?
> Also, you had 29,000 transcripts from cufflinks, but fewer from  
> MAKER when you gave it the cufflinks data. How many transcripts did  
> MAKER identify with the cufflinks data? Did you still get more than  
> the 10,000 transcripts that you found with just the Trinity data?
>
> A key part of MAKER's approach to genome annotation that might be
> affecting its performance is that it only annotates a gene where
> there is both evidence (like your RNA-seq data) and an ab-initio
> prediction. If a prediction is unsupported by the evidence, then
> MAKER won't annotate a gene, and if evidence aligns where there's no
> prediction, MAKER won't annotate a gene either. What ab-initio
> predictors are you using, and have they been trained for your
> specific genome?
>
> You can force MAKER to automatically promote evidence alignments to  
> a gene model by setting the est2genome option to 1, but that will  
> usually give you many false positives.
>
> Try rerunning it with either the Trinity data or the Cufflinks data  
> and with est2genome set to 1, and let us know how that affects the  
> MAKER results.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of  
> dhivya arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> I am trying to annotate a 200 Mb plant genome for which I have a very
> good assembly.
>
> I tried to de novo assemble RNA-seq data using trinity and ran maker
> using my genome assembly and the trinity results.  I did not get as
> many transcripts as expected, around 10,000 transcripts.
>
> So, I decided to try a different approach.  I did a genome assisted
> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
> genome assembly and the cufflinks result.  I get far fewer
> transcripts as a result.
>
> If cufflinks found 29000 transcripts by mapping to the genome, I'm
> confused as to why maker is not finding the same.
>
> Any suggestions would be appreciated.
>
> Thanks
> Dhivya
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
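
The rule described in the quoted message (a gene is annotated only where evidence and an ab-initio prediction coincide, unless est2genome promotes evidence directly) can be illustrated with a toy interval model. This is only a sketch under strong simplifying assumptions: features are plain (start, end) pairs on one contig, any overlap counts as support, and the function names are hypothetical. It is not MAKER's actual algorithm.

```python
def overlaps(a, b):
    """True if two closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def candidate_genes(predictions, evidence, est2genome=False):
    """Toy model of which loci end up as gene annotations.

    predictions: ab-initio predictions as (start, end) pairs
    evidence:    evidence alignments as (start, end) pairs
    est2genome:  if True, evidence with no overlapping prediction is
                 promoted to a gene model as well (assumed behaviour)
    """
    # A prediction is kept only if some evidence alignment overlaps it.
    genes = [p for p in predictions if any(overlaps(p, e) for e in evidence)]
    if est2genome:
        # Evidence with no prediction nearby is promoted directly.
        genes += [e for e in evidence
                  if not any(overlaps(e, p) for p in predictions)]
    return sorted(genes)


predictions = [(100, 500), (1000, 1500)]
evidence = [(450, 600), (3000, 3200)]
print(candidate_genes(predictions, evidence))
print(candidate_genes(predictions, evidence, est2genome=True))
```

In this toy model, a run with no usable predictions and est2genome off yields no genes at all, however much evidence aligns.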


From rebzi87 at gmail.com  Tue Feb  4 16:29:41 2014
From: rebzi87 at gmail.com (Rebecca Harris)
Date: Tue, 4 Feb 2014 14:29:41 -0800
Subject: [maker-devel] maker output
Message-ID: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>

Hi,

I'm running maker on a cluster and am having some problems with the run
ending prematurely. I would like to know if there is a straightforward way
to figure out whether maker has completed. I've tried: 1) counting the
number of run.log files in the datastore directly, and 2) counting the
instances of "FINISHED" in the master_datastore_index.log. These numbers
are inconsistent. I have 200,000 contigs in my fasta file - do I expect
200,000 run.log files? I've had to restart maker a few times - it appears
that maker is appending to the master_datastore_index.log, as I find
multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From darasappan at gmail.com  Tue Feb  4 16:43:19 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 4 Feb 2014 16:43:19 -0600
Subject: [maker-devel] Fwd:  maker annotation with cufflinks output
References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <EAFE0808-FDA7-49E5-8FD6-9AFD570DF20C@gmail.com>

Resending this since it didn't make it to the mailing list before.

>
> I was able to check on some of those questions.
>
> 1. From trinity assembly: I started with 102000 contigs. I used  
> trinotate to annotate proteins in this.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this (most important parts on top):
>
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
>
> 2. From cufflinks assembly: I started with 133380 entries (out of  
> which there are 29,000 transcripts).  I used the protein sequences  
> from trinity assembly.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
>
> The genes annotated using the trinity assembly is lower than  
> expected, so I went the cufflinks route. I dont understand why when  
> using the cufflinks transcripts, even less genes are being found.
>
> 3. Training SNAP:  I used the results of maker from 1 to train  
> SNAP.  I then used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
>
> As I mentioned in another email, cegma results indicated that the  
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>


From dence at genetics.utah.edu  Tue Feb  4 16:42:52 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 4 Feb 2014 22:42:52 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43E51@mxb2.hg.genetics.utah.edu>

Hi Rebecca,

If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status. If you do a count on those (with "grep -c" for example), that will tell you how many contigs have finished.

If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets a minimum length for a contig before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run.
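
To know how many FINISHED entries to expect once min_contig is set, it helps to count the contigs at or above that length. A minimal sketch, assuming a plain (uncompressed) FASTA file and that contigs shorter than min_contig are the ones skipped. The function names are hypothetical and not part of MAKER:

```python
def contig_lengths(fasta_lines):
    """Yield (name, length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""  # ID is the first word of the header
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def count_long_contigs(fasta_lines, min_contig=5000):
    """Count contigs whose length is at least min_contig."""
    return sum(1 for _, n in contig_lengths(fasta_lines) if n >= min_contig)


# Small in-memory example; for a real genome, pass an open file handle.
fasta = [">a some description\n", "ACGT\n", "ACGT\n", ">b\n", "AC\n"]
print(list(contig_lengths(fasta)))
print(count_long_contigs(fasta, min_contig=5))
```

With 200,000 contigs, the count at 5000 bp may be much smaller than the total number of FASTA entries.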

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From mikael.durling at slu.se  Tue Feb  4 16:49:46 2014
From: mikael.durling at slu.se (=?iso-8859-1?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Tue, 4 Feb 2014 22:49:46 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <D36EEC49-FC5A-4DB8-BF08-795103F1B485@slu.se>

> 4 feb 2014 kl. 23:32 skrev "Rebecca Harris" <rebzi87 at gmail.com>:
> 
> Hi,
> 
> I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log.

This is usually what I do to check if maker has finished all scaffolds. There should be one FINISHED statement for each entry in the fasta file. (It might be one for every scaffold longer than the given minimum length.)

> These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. 

Run maker -dsindex to rebuild the file if you like. The number of FINISHED should not change though.

Mikael

> 
> Thanks!
> 
> Cheers,
> Rebecca


From carsonhh at gmail.com  Tue Feb  4 16:50:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Feb 2014 15:50:10 -0700
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <CF16BBC3.9807%carsonhh@gmail.com>

Clusters are notoriously flaky, so maker is restartable (hence the need for
the log file).  Also, since multiple nodes may write simultaneously to the
log, they can munge its contents.  You can rerun maker with the -dsindex
flag to regenerate the master_datastore_index.log without processing
anything else. You can even delete it before rebuilding it if you want to
ensure all entries are unique (run on a single CPU when you do this).

Then count the number of FINISHED entries in the log.
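
Because restarts append to the log, counting distinct contig names is more robust than counting FINISHED lines. A minimal sketch, assuming each line of master_datastore_index.log is tab-separated with the sequence name in the first column and the status in the last column. That layout is an assumption to check against your own log, and the function name is hypothetical:

```python
def finished_contigs(index_lines):
    """Return the set of sequence names with at least one FINISHED entry.

    Assumes tab-separated lines: name first, status last.
    """
    done = set()
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[-1].strip() == "FINISHED":
            done.add(fields[0])
    return done


# Small in-memory example with a contig that finished twice after a restart.
log = [
    "contig1\tdatastore/contig1/\tSTARTED\n",
    "contig1\tdatastore/contig1/\tFINISHED\n",
    "contig2\tdatastore/contig2/\tSTARTED\n",
    "contig1\tdatastore/contig1/\tFINISHED\n",
]
print(len(finished_contigs(log)))
```

Comparing this number with the number of contigs passing min_contig shows whether the run is complete.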

Thanks,
Carson




From carsonhh at gmail.com  Wed Feb  5 12:38:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 11:38:50 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <CF17D1FC.987A%carsonhh@gmail.com>

Do you have any features of type snap in your results from step 3?  We've
had a couple of recent posts where, after training, snap was giving no
results, and as a result maker couldn't give any genes.  One cause of
something like that may be your step 2.  Make sure the ZFF you used to
train with wasn't empty.  The maker2zff script uses filters to only put the
best genes in the ZFF file, and if all your genes fail the filtering then you
are training with an empty ZFF.
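
A quick check for an empty training set could look like the following. This is a minimal sketch, assuming the usual ZFF layout: a ">" header line per sequence, followed by whitespace-separated feature lines whose last column is the gene (group) name. The function name is hypothetical and is not part of SNAP or MAKER:

```python
def zff_summary(zff_lines):
    """Return (number of sequences, number of distinct gene models).

    Assumes '>' header lines per sequence and feature lines whose last
    whitespace-separated column names the gene model.
    """
    sequences = 0
    genes = set()
    current = None
    for line in zff_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences += 1
            current = line[1:]
            continue
        fields = line.split()
        # Key by (sequence, group) in case group names repeat across sequences.
        genes.add((current, fields[-1]))
    return sequences, len(genes)


# Small in-memory example; for a real file, pass an open file handle.
zff = [
    ">contig1\n",
    "Einit\t201\t325\tgene-1\n",
    "Eterm\t400\t500\tgene-1\n",
    "Esngl\t900\t1200\tgene-2\n",
    ">contig2\n",
    "Esngl\t10\t300\tgene-3\n",
]
n_seqs, n_genes = zff_summary(zff)
print(n_seqs, n_genes)
```

If the gene count comes back as zero, the HMM was trained on nothing and SNAP is unlikely to produce predictions.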

Also you should use proteins from a related species as your protein file.  I
see that your protein matches are varying wildly from run to run.  So is your
contig count.  Were the subset of contigs you have results for long enough
to contain genes?

--Carson



From dence at genetics.utah.edu  Wed Feb  5 13:28:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 19:28:48 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>

Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com  Wed Feb  5 14:13:57 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 14:13:57 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the
trinity assembly (using trinotate).  I'll try using protein sequences
from related species (though there aren't sequences from closely
related organisms).  Could you tell me a little about why protein data
from annotating my rnaseq data would not work best here?

Thanks
Dhivya

On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

> Hi Dhivya, Are the protein matches in your results coming from your  
> annotations of the transcriptome? You should really use amino-acid  
> sequences from related organisms and some kind of omnibus source  
> like SwissProt.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, February 05, 2014 11:38 AM
> To: dhivya arasappan; Daniel Ence
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Do you have any features of type snap in your results from step 3?
> We've had a couple of recent posts where, after training, snap was
> giving no results, and as a result maker couldn't give any genes.
> One cause of something like that may be your step 2.  Make sure the
> ZFF you used to train with wasn't empty.  The maker2zff script uses
> filters to only put the best genes in the ZFF file, and if all your
> genes fail the filtering then you are training with an empty ZFF.
>
> Also, you should use proteins from a related species as your protein
> file.  I see that your protein matches are varying wildly from run to
> run.  So is your contig count.  Was the subset of contigs you have
> results for long enough to contain genes?
>
> --Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Monday, February 3, 2014 at 9:31 AM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hi Daniel,
>
> I was able to check on some of those questions.
>
> 1. From trinity assembly: I started with 102000 contigs. I used  
> trinotate to annotate proteins in this.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this (most important parts on top):
>
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
>
> 2. From cufflinks assembly: I started with 133380 entries (out of  
> which there are 29,000 transcripts).  I used the protein sequences  
> from trinity assembly.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
>
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why, when
> using the cufflinks transcripts, even fewer genes are being found.
>
> 3. Training SNAP:  I used the results of maker from 1 to train  
> SNAP.  I then used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
>
> As I mentioned in another email, cegma results indicated that the  
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>
>
>
>
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> I think there are a few numbers that could be helpful to understand
>> what's happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into?
>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>> MAKER when you gave it the cufflinks data. How many transcripts did
>> MAKER identify with the cufflinks data? Did you still get more than
>> the 10,000 transcripts that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be
>> affecting its performance is that it only annotates a gene where
>> there is both evidence (like your RNA-seq data) and an ab-initio
>> prediction. If a prediction is unsupported by the evidence, then
>> MAKER won't annotate a gene, and if evidence aligns where there's no
>> prediction, MAKER won't annotate a gene either. What ab-initio
>> predictors are you using, and have they been trained for this
>> specific genome?
>>
>> You can force MAKER to automatically promote evidence alignments to  
>> a gene model by setting the est2genome option to 1, but that will  
>> usually give you many false positives.
>>
>> Try rerunning it with either the Trinity data or the Cufflinks data  
>> and with est2genome set to 1, and let us know how that affects the  
>> MAKER results.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>> of dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome-assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>> genome assembly and the cufflinks result.  I get far fewer
>> transcripts as a result.
>>
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________ maker-devel mailing  
> list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Wed Feb  5 14:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>,
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43FB4@mxb2.hg.genetics.utah.edu>

Hi Dhivya,

In genome annotation, you generally want to use as many sources of evidence as is reasonable, but those sources should be distinct.  It will confuse downstream annotation efforts if your protein evidence is actually derived from the RNA-seq data.

Using the trinotate results for protein evidence here restricts you first to the proteins coded by the transcripts in the RNA-seq data, which may be incomplete, and secondly to the proteins that trinotate could annotate from among the transcripts.

The problem that Carson mentioned with the SNAP HMM file is a real possibility also.
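
For anyone wanting to rule that out on their own run, here is a minimal Python sketch that checks whether a ZFF training file actually contains gene models. It assumes the usual ZFF layout (a '>' header line per sequence, then one feature per line with the model name in the last column) and that maker2zff wrote the annotations to genome.ann. The function name count_zff_models and the script name are illustrative only, not part of MAKER or SNAP.

```python
import sys
from collections import defaultdict


def count_zff_models(lines):
    """Return {sequence_id: number of distinct gene models} for ZFF text.

    Assumes '>' header lines name each sequence and every feature line
    carries its model (group) name in the last whitespace-separated column.
    """
    models = defaultdict(set)
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            models[current]  # record the sequence even if it has no features
            continue
        if current is not None:
            models[current].add(line.split()[-1])
    return {seq: len(groups) for seq, groups in models.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_zff.py genome.ann
    with open(sys.argv[1]) as handle:
        counts = count_zff_models(handle)
    total = sum(counts.values())
    print(f"{len(counts)} sequences, {total} gene models")
    if total == 0:
        print("ZFF contains no gene models; SNAP would be trained on nothing")
```

If the total is zero, every gene failed the maker2zff filters and the resulting HMM was trained on an empty set, which would be consistent with a rerun that reports no gene, exon, or CDS features.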

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: dhivya arasappan [darasappan at gmail.com]
Sent: Wednesday, February 05, 2014 1:13 PM
To: Daniel Ence
Cc: Carson Holt; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the Trinity assembly (using Trinotate).  I'll try using protein sequences from related species (though there aren't sequences from closely related organisms).  Could you tell me a little about why protein data from annotating my RNA-seq data would not work best here?

Thanks
Dhivya

On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Carson Holt [carsonhh at gmail.com]
Sent: Wednesday, February 05, 2014 11:38 AM
To: dhivya arasappan; Daniel Ence
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output

Do you have any features of type snap in your results from step 3?  We've had a couple of recent posts where, after training, snap was giving no results, and as a result maker couldn't give any genes.  One cause of something like that may be your step 2.  Make sure the ZFF you used to train with wasn't empty.  The maker2zff script uses filters to only put the best genes in the ZFF file, and if all your genes fail the filtering then you are training with an empty ZFF.

Also, you should use proteins from a related species as your protein file.  I see that your protein matches are varying wildly from run to run.  So is your contig count.  Was the subset of contigs you have results for long enough to contain genes?

--Carson

From: dhivya arasappan <darasappan at gmail.com>
Date: Monday, February 3, 2014 at 9:31 AM
To: Daniel Ence <dence at genetics.utah.edu>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] maker annotation with cufflinks output

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top):

    6653 gene
   46675 exon
 280534 protein_match
59934 CDS
    969 contig
 105388 expressed_sequence_match
  12584 five_prime_UTR
  78565 match
1401369 match_part
  10180 mRNA
  11545 three_prime_UTR

2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts).  I used the protein sequences from trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks like this:
     29 gene
     75 exon
 573659 protein_match
67 CDS
   1099 contig
 269298 expressed_sequence_match
     23 five_prime_UTR
 173844 match
2221846 match_part
     29 mRNA
     23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I don't understand why, when using the cufflinks transcripts, even fewer genes are being found.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
957 contig
  46555 expressed_sequence_match
  43651 match
 553633 match_part
 113738 protein_match

As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

Hi Dhivya,

I think there are a few numbers that could be helpful to understand what's happening here.

How many transcripts did Trinity assemble the RNA-seq data into? Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data?

A key part of MAKER's approach to genome annotation that might be affecting its performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab-initio prediction. If a prediction is unsupported by the evidence, then MAKER won't annotate a gene, and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab-initio predictors are you using, and have they been trained for this specific genome?

You can force MAKER to automatically promote evidence alignments to a gene model by setting the est2genome option to 1, but that will usually give you many false positives.

Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya arasappan [darasappan at gmail.com]
Sent: Thursday, January 30, 2014 11:18 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] maker annotation with cufflinks output

Hello,

I am trying to annotate a 200 Mb plant genome for which I have a very
good assembly.

I tried to de novo assemble RNA-seq data using trinity and ran maker
using my genome assembly and the trinity results.  I did not get as
many transcripts as expected, around 10,000 transcripts.

So, I decided to try a different approach.  I did a genome-assisted
assembly of the RNA-seq data using tophat/cufflinks. This pipeline
generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
genome assembly and the cufflinks result.  I get far fewer
transcripts as a result.

If cufflinks found 29000 transcripts by mapping to the genome, I'm
confused as to why maker is not finding the same.

Any suggestions would be appreciated.

Thanks
Dhivya


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Wed Feb  5 14:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <CF17E9B9.9892%carsonhh@gmail.com>

Protein data doesn't have to come from a very closely related species.  This
is because genes maintain homology at the amino acid level across even very
large evolutionary distances.  Having a more closely related species just
ensures that genome contents are similar (fewer losses/gains relative to each
other).  Also, use the entire proteome of at least one related species (just
using a database like Swiss-Prot is not sufficient).

Using translated mRNA-seq data will not give you any new information that
was not already available from the untranslated sequence.  Plus, it will
introduce the complicating artifacts that mRNA-seq generates into the
protein part of the pipeline (gene merging, incorrect assembly, and false
calls caused by background transcription).  A big gotcha with mRNA-seq is
that all of your genome gets transcribed at a low level, not just the genes,
so you will always have contamination that does not represent real gene
models.  Also, in the end you really only expect to capture about 50% of the
genes with mRNA-seq (maybe 70% if you are fortunate, and most of those will
be partial).  So using the proteins from another species is important to
improve sensitivity and to fix many of the issues that arise from the noisy
nature of mRNA-seq.  In fact, if you were forced to use only one (either
protein evidence or mRNA-seq), you would actually get better annotations from
the protein evidence in most cases.  You get better annotations when you use
both, but if using only one of them, the proteins from another species are
better, and noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Wednesday, February 5, 2014 at 1:13 PM
To:  Daniel Ence <dence at genetics.utah.edu>
Cc:  Carson Holt <carsonhh at gmail.com>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the Trinity
assembly (using Trinotate).  I'll try using protein sequences from related
species (though there aren't sequences from closely related organisms).  Could
you tell me a little about why protein data from annotating my RNA-seq data
would not work best here?

Thanks
Dhivya
 
On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

> Hi Dhivya, Are the protein matches in your results coming from your
> annotations of the transcriptome? You should really use amino-acid sequences
> from related organisms and some kind of omnibus source like SwissProt.
> 
> Thanks,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, February 05, 2014 11:38 AM
> To: dhivya arasappan; Daniel Ence
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] maker annotation with cufflinks output
> 
> Do you have any features of type snap in your results from step 3?  We've had
> a couple of recent posts where, after training, snap was giving no results,
> and as a result maker couldn't give any genes.  One cause of something like
> that may be your step 2.  Make sure the ZFF you used to train with wasn't
> empty.  The maker2zff script uses filters to only put the best genes in the
> ZFF file, and if all your genes fail the filtering then you are training with
> an empty ZFF.
>
> Also, you should use proteins from a related species as your protein file.  I
> see that your protein matches are varying wildly from run to run.  So is your
> contig count.  Was the subset of contigs you have results for long enough to
> contain genes?
>
> --Carson
> 
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Monday, February 3, 2014 at 9:31 AM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
> 
> Hi Daniel,
> 
> I was able to check on some of those questions.
> 
> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
> annotate proteins in this.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this
> (most important parts on top):
> 
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
> 
> 2. From cufflinks assembly: I started with 133380 entries (out of which there
> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
> 
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why, when using
> the cufflinks transcripts, even fewer genes are being found.
> 
> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
> used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
> 
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
> 
> As I mentioned in another email, cegma results indicated that the genome was
> more than 90% complete. Any suggestions would be helpful.
> 
> Thank you
> Dhivya
> 
> 
> 
> 
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
> 
>> Hi Dhivya, 
>> 
>> I think there are a few numbers that could be helpful to understand what's
>> happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>> the cufflinks data. How many transcripts did MAKER identify with the
>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>> found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be affecting
>> its performance is that it only annotates a gene where there is both
>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>> gene either. What ab-initio predictors are you using, and have they been
>> trained for this specific genome?
>> 
>> You can force MAKER to automatically promote evidence alignments to a gene
>> model by setting the est2genome option to 1, but that will usually give you
>> many false positives.
>> 
>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>> est2genome set to 1, and let us know how that affects the MAKER results.
>> 
>> Thanks,
>> Daniel
>> 
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>> arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>> 
>> Hello,
>> 
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome-assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>> genome assembly and the cufflinks result.  I get far fewer
>> transcripts as a result.
>> 
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>> 
>> Any suggestions would be appreciated.
>> 
>> Thanks
>> Dhivya
>> 
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From darasappan at gmail.com  Wed Feb  5 23:16:43 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 23:16:43 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17E9B9.9892%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
	<CF17E9B9.9892%carsonhh@gmail.com>
Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com>

Thank you both for those explanations. I'll get back to you after I  
try rerunning maker.
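
For comparing reruns, the per-feature tallies quoted in this thread (gene, exon, CDS, protein_match and so on) can be reproduced with a short script. This is a minimal sketch, assuming a standard tab-separated GFF3 file with the feature type in column 3 and an optional ##FASTA section at the end. The name tally_gff3_types and the script name are illustrative only, not part of MAKER.

```python
import sys
from collections import Counter


def tally_gff3_types(lines):
    """Count GFF3 features by type (column 3).

    Comment and directive lines are skipped, and counting stops at the
    ##FASTA directive so embedded sequence is never parsed as features.
    """
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tally_gff3.py merged_output.gff
    with open(sys.argv[1]) as handle:
        counts = tally_gff3_types(handle)
    for feature_type, n in counts.most_common():
        print(f"{n:>9} {feature_type}")
```

A run that lists match and protein_match features but no gene, mRNA, exon, or CDS lines means evidence aligned but nothing was promoted to a gene model.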

Dhivya

On Feb 5, 2014, at 2:38 PM, Carson Holt wrote:

> Protein data doesn't have to come from a very closely related
> species.  This is because genes maintain homology at the amino acid
> level across even very large evolutionary distances.  Having a more
> closely related species just ensures that genome contents are similar
> (fewer losses/gains relative to each other).  Also, use the entire
> proteome of at least one related species (just using a database like
> Swiss-Prot is not sufficient).
>
> Using translated mRNA-seq data will not give you any new information
> that was not already available from the untranslated sequence.  Plus,
> it will introduce the complicating artifacts that mRNA-seq generates
> into the protein part of the pipeline (gene merging, incorrect
> assembly, and false calls caused by background transcription).  A
> big gotcha with mRNA-seq is that all of your genome gets transcribed
> at a low level, not just the genes, so you will always have
> contamination that does not represent real gene models.  Also, in the
> end you really only expect to capture about 50% of the genes with
> mRNA-seq (maybe 70% if you are fortunate, and most of those will be
> partial).  So using the proteins from another species is important
> to improve sensitivity and to fix many of the issues that arise from
> the noisy nature of mRNA-seq.  In fact, if you were forced to use
> only one (either protein evidence or mRNA-seq), you would actually
> get better annotations from the protein evidence in most cases.  You
> get better annotations when you use both, but if using only one of
> them, the proteins from another species are better, and noisy
> mRNA-seq will be the primary source of annotation error.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Wednesday, February 5, 2014 at 1:13 PM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: Carson Holt <carsonhh at gmail.com>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello Daniel and Carson,
>
> Thanks for your replies.
>
> Yes, I used the protein sequences resulting from annotation of the
> Trinity assembly (using Trinotate).  I'll try using protein
> sequences from related species (though there aren't sequences from
> closely related organisms).  Could you tell me a little about why
> protein data from annotating my RNA-seq data would not work best here?
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:
>
>> Hi Dhivya, Are the protein matches in your results coming from your  
>> annotations of the transcriptome? You should really use amino-acid  
>> sequences from related organisms and some kind of omnibus source  
>> like SwissProt.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> From: Carson Holt [carsonhh at gmail.com]
>> Sent: Wednesday, February 05, 2014 11:38 AM
>> To: dhivya arasappan; Daniel Ence
>> Cc: maker-devel at yandell-lab.org
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Do you have any features of type snap in your results from step 3?
>> We've had a couple of recent posts where, after training, snap was
>> giving no results, and as a result maker couldn't give any genes.
>> One cause of something like that may be your step 2.  Make sure the
>> ZFF you used to train with wasn't empty.  The maker2zff script uses
>> filters to only put the best genes in the ZFF file, and if all your
>> genes fail the filtering then you are training with an empty ZFF.
>>
>> Also, you should use proteins from a related species as your protein
>> file.  I see that your protein matches are varying wildly from run
>> to run.  So is your contig count.  Was the subset of contigs you
>> have results for long enough to contain genes?
>>
>> --Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence <dence at genetics.utah.edu>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From trinity assembly: I started with 102000 contigs. I used  
>> trinotate to annotate proteins in this.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this (most important parts on top):
>>
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>>
>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>> which there are 29,000 transcripts).  I used the protein sequences  
>> from trinity assembly.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>>
>> The number of genes annotated using the trinity assembly is lower than
>> expected, so I went the cufflinks route. I don't understand why, when
>> using the cufflinks transcripts, even fewer genes are being found.
>>
>> 3. Training SNAP:  I used the results of maker from 1 to train  
>> SNAP.  I then used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ 
>> maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>>
>> As I mentioned in another email, cegma results indicated that the  
>> genome was more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there are a few numbers that could be helpful to understand
>>> what's happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>> Also, you had 29,000 transcripts from cufflinks, but fewer from  
>>> MAKER when you gave it the cufflinks data. How many transcripts  
>>> did MAKER identify with the cufflinks data? Did you still get more  
>>> than the 10,000 transcripts that you found with just the Trinity  
>>> data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be
>>> affecting its performance is that it only annotates a gene where
>>> there is both evidence (like your RNA-seq data) and an ab-initio
>>> prediction. If a prediction is unsupported by the evidence, then
>>> MAKER won't annotate a gene, and if evidence aligns where there's
>>> no prediction, MAKER won't annotate a gene either. What ab-initio
>>> predictors are you using, and have they been trained for your
>>> specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments  
>>> to a gene model by setting the est2genome option to 1, but that  
>>> will usually give you many false positives.
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>> data and with est2genome set to 1, and let us know how that  
>>> affects the MAKER results.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>>> of dhivya arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>> very good assembly.
>>>
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach.  I did a genome assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using
>>> my genome assembly and the cufflinks result.  I get far fewer
>>> transcripts as a result.
>>>
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>>
>>> Any suggestions would be appreciated.
>>>
>>> Thanks
>>> Dhivya
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>


From mikael.durling at slu.se  Thu Feb  6 05:02:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Thu, 6 Feb 2014 11:02:37 +0000
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CEF5A8E0.88B4%carsonhh@gmail.com>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
Message-ID: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>

Hi Carson,

It's nice to see all these new features in maker.

I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed though, with this error message:

ERROR: You found a tRNA with an intron! This should not happen
--> rank=12, hostname=my-mgrid6
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scf_013

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf_013

I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me:

scf_013         1       189339  189410  Thr     AGT     0       0       82.83
scf_013         2       510381  510462  Ser     AGA     0       0       67.09
scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
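
[Editorial sketch] Assuming the usual nine-column tRNAscan-SE tabular layout (sequence name, tRNA number, begin, end, type, anticodon, intron begin, intron end, score), rows with non-zero intron bounds can be picked out as below. In the output above that is rows 3, 4 and 5, which may be what triggers MAKER's "tRNA with an intron" error. The function names are illustrative and not part of MAKER or tRNAscan-SE.

```python
# Sample rows copied from the message above.
SAMPLE = """\
scf_013         1       189339  189410  Thr     AGT     0       0       82.83
scf_013         2       510381  510462  Ser     AGA     0       0       67.09
scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
"""


def parse_trnascan(lines):
    """Parse nine-column tRNAscan-SE rows into dicts, skipping header lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # not a data row (blank line or column header)
        try:
            rows.append({
                "seq": fields[0],
                "num": int(fields[1]),
                "begin": int(fields[2]),
                "end": int(fields[3]),
                "type": fields[4],
                "anticodon": fields[5],
                "intron_begin": int(fields[6]),
                "intron_end": int(fields[7]),
                "score": float(fields[8]),
            })
        except ValueError:
            continue  # e.g. the dashed separator line under the header
    return rows


def rows_with_introns(rows):
    """Return rows whose intron bounds are not both zero."""
    return [r for r in rows if r["intron_begin"] or r["intron_end"]]


if __name__ == "__main__":
    for row in rows_with_introns(parse_trnascan(SAMPLE.splitlines())):
        print(row["seq"], row["num"], row["type"],
              row["intron_begin"], row["intron_end"])
```

Running the same functions over the full .trnascan file for a failed contig would show how many predictions carry introns.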


Hope this can be of some help while debugging. I'll leave trnascan off for now.

thanks,

Mikael


On 10 Jan 2014, at 22:03, Carson Holt <carsonhh at gmail.com> wrote:

> Hi Mikael,
> 
> The options are part of the new MAKER-P integration
> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
> Additional documentation/tutorials will be forthcoming - probably in
> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
> or alternatively with the annual GMOD summer school. The tRNA option is
> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
> 
> Thanks,
> Carson
> 
> 
> 
> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
> wrote:
> 
>> Hi Carson and other maker developers,
>> 
>> I was reading the source code of the latest maker release and noted
>> several references to ncRNAs, snoscan and trnascan. Can these be
>> incorporated into the normal annotation workflow? If so, are there any
>> instructions available for that?
>> 
>> best regards,
>> Mikael Durling
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 


From darasappan at gmail.com  Thu Feb  6 08:52:12 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 08:52:12 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com>

Hello,

It does appear that my genome.ann file from the maker2zff script has data
in it. However, the SNAP steps after that have created empty files.
The following are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors  
like this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds  
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds  
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds  
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
exon-20:out_of_bounds exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all
showing up as errors. I realize this may be an issue with snap, but
are you familiar with anything like this? A snippet of my genome.ann
file is attached (since it's too big for the list) for reference.
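
[Editorial sketch] One plausible reading of the out_of_bounds errors is that each exon's coordinates fall outside the sequence fathom finds for it in genome.dna, which would be true of every exon if that file holds no sequence. The check below reproduces that idea outside of SNAP. It assumes the short ZFF layout (a ">name" header per sequence, then "label begin end group" lines); the function names are hypothetical and this is not fathom's actual implementation.

```python
def fasta_lengths(lines):
    """Return {sequence name: length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds(zff_lines, lengths):
    """List ZFF features whose coordinates fall outside their sequence."""
    bad = []
    seq = None
    for line in zff_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq = parts[0] if parts else ""
            continue
        fields = line.split()
        label, begin, end = fields[0], int(fields[1]), int(fields[2])
        # Minus-strand features are written with begin > end.
        low, high = min(begin, end), max(begin, end)
        # A sequence missing from the FASTA counts as length 0.
        if low < 1 or high > lengths.get(seq, 0):
            bad.append((seq, label, begin, end))
    return bad
```

With an empty genome.dna, fasta_lengths returns an empty dict and every feature in genome.ann is reported, which matches the pattern of errors shown above.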

Thanks
Dhivya


On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:

> Do you have any features of type snap in your results from step 3?
> We've had a couple of recent posts where, after training, snap was
> giving no results, and as a result maker couldn't give any genes.
> One cause of something like that may be your step 2.  Make sure the
> ZFF you used to train with wasn't empty.  The maker2zff script uses
> filters to only put the best genes in the ZFF file, and if all your
> genes fail the filtering then you are training with an empty ZFF.
>
> Also, you should use proteins from a related species as your protein
> file.  I see that your protein matches are varying wildly from run to
> run, and so is your contig count.  Was the subset of contigs you have
> results for long enough to contain genes?
>
> --Carson
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.ann
Type: application/octet-stream
Size: 15761 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.dna
Type: application/octet-stream
Size: 3075 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment-0001.obj>

From carsonhh at gmail.com  Thu Feb  6 10:01:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:01:04 -0700
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
	<CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
Message-ID: <CF18FE86.9903%carsonhh@gmail.com>

I'm making a new release this weekend, but if you have access to the devel
version, you can test now.  All changes have been committed to the
subversion repository.

Thanks,
Carson


On 2/6/14, 4:02 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Hi Carson,
>
>It's nice to see all these new features in maker.
>
>I gave the trnascan option a try by enabling it in the config file for
>one of my fungal genomes. It failed though, with this error message:
>
>ERROR: You found a tRNA with an intron! This should not happen
>--> rank=12, hostname=my-mgrid6
>ERROR: Failed while gathering ab-init output files
>ERROR: Chunk failed at level:1, tier_type:2
>FAILED CONTIG:scf_013
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf_013
>
>I checked the trnascan output
>(scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig,
>and the output seems valid to me:
>
>scf_013         1       189339  189410  Thr     AGT     0       0       82.83
>scf_013         2       510381  510462  Ser     AGA     0       0       67.09
>scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
>scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
>scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
>
>
>Hope this can be of some help while debugging. I'll leave trnascan off
>for now.
>
>thanks,
>
>Mikael
>
>
>


From carsonhh at gmail.com  Thu Feb  6 10:05:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:05:05 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
Message-ID: <CF19004A.9913%carsonhh@gmail.com>

Your genome.dna file has no sequence?  Did you by any chance strip the fasta
sequence from the GFF3 you are using as input to maker2zff?  There should be
fasta sequence at the end of that file.  Also, can I see the GFF3 file you
are using as input to maker2zff?
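
[Editorial sketch] A quick way to confirm whether a GFF3 file still carries its embedded sequence is to look for the ##FASTA directive and count what follows it. The helper below is a minimal illustration under that assumption; the function name is hypothetical and the script is not part of MAKER.

```python
def gff3_fasta_summary(lines):
    """Count feature lines, embedded FASTA records and bases in a GFF3 file."""
    in_fasta = False
    features = 0
    sequences = 0
    bases = 0
    for line in lines:
        line = line.strip()
        if not in_fasta:
            if line == "##FASTA":
                in_fasta = True  # everything after this is sequence
            elif line.startswith(">"):
                # Older files may start the FASTA part without the directive.
                in_fasta = True
                sequences += 1
            elif line and not line.startswith("#"):
                features += 1
        elif line.startswith(">"):
            sequences += 1
        else:
            bases += len(line)
    return {"features": features, "sequences": sequences, "bases": bases}
```

For example, calling it on an open file handle for the GFF3 passed to maker2zff and seeing "sequences" and "bases" both at 0 would indicate the FASTA section was stripped.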

Thanks,
Carson

From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 7:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hello,

It does appear that my genome.ann file from the maker2zff script has data in it.
However, the SNAP steps after that have created empty files.  The following
are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors like
this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all showing up
as errors. I realize this may be an issue with snap, but are you familiar
with anything like this? My genome.ann file is attached for reference.

Thanks
Dhivya




From carsonhh at gmail.com  Thu Feb  6 11:04:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 10:04:25 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID: <CF190E83.9927%carsonhh@gmail.com>

Could you give me the file without using 'head' to trim it? It's cutting it
off before it reaches the part I'm interested in.

--Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 10:01 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Oh yes, I did. I took just the non-sequence entries in the gff file and used
that as my input.  I will rerun snap with the gff file containing the
sequences as well.

I'm attaching a snippet of the gff file that I used as input to maker2zff.

Thanks for your help
Dhivya


On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
> sequence from the GFF3 you are using as input to maker2zff?  There should be
> fasta sequence at the end of that file.  Also can I see the GFF3 file you are
> using as input to maker2zff.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 7:47 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Hello,
> 
> It does appear that my genome.ann file from the maker2zff script has data in it.
> However, the SNAP steps after that have created empty files.  The following
> are all empty:
> 
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
> 
> When I tried to get gene stats or validate genome.ann, I get errors like this
> for all of them:
> 
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
> exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
> exon-21:out_of_bounds
> 
> I'm not sure why the annotations I'm seeing in genome.ann are all showing up as
> errors. I realize this may be an issue with snap, but are you familiar with
> anything like this? My genome.ann file is attached for reference.
> 
> Thanks
> Dhivya
> 
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
> 
>> Do you have any features of type snap in your results from step 3?  We've had
>> a couple of recent posts where, after training, snap was giving no results, and
>> as a result maker couldn't give any genes.  One cause of something like that
>> may be your step 2.  Make sure the ZFF you used to train with wasn't empty.
>> The maker2zff script uses filters to only put the best genes in the ZFF file,
>> and if all your genes fail the filtering then you are training with an empty
>> ZFF.
>> 
>> Also, you should use proteins from a related species as your protein file.  I
>> see that your protein matches are varying wildly from run to run, and so is your
>> contig count.  Was the subset of contigs you have results for long enough to
>> contain genes?
>> 
>> --Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, February 3, 2014 at 9:31 AM
>> To:  Daniel Ence <dence at genetics.utah.edu>
>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Hi Daniel,
>> 
>> I was able to check on some of those questions.
>> 
>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
>> annotate proteins in this.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like this
>> (most important parts on top):
>> 
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>> 
>> 2. From cufflinks assembly: I started with 133380 entries (out of which there
>> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like
>> this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>> 
>> The number of genes annotated using the trinity assembly is lower than
>> expected, so I went the cufflinks route. I don't understand why, when using
>> the cufflinks transcripts, even fewer genes are being found.
>> 
>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
>> used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/sna
>> p/RHA.hmm
>> est2genome=0
>> 
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>> 
>> As I mentioned in another email, cegma results indicated that the genome was
>> more than 90% complete. Any suggestions would be helpful.
>> 
>> Thank you
>> Dhivya
>> 
>> 
>> 
>> 
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>> 
>>> Hi Dhivya, 
>>> 
>>> I think there are a few numbers that could be helpful to understand what's
>>> happening here.
>>> 
>>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>>> the cufflinks data. How many transcripts did MAKER identify with the
>>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>>> found with just the Trinity data?
>>> 
>>> A key part of MAKER's approach to genome annotation that might be affecting
>>> its performance is that it only annotates a gene where there is both
>>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>>> gene either. What ab-initio predictors are you using, and have they been
>>> trained for your specific genome?
>>> 
>>> You can force MAKER to automatically promote evidence alignments to a gene
>>> model by setting the est2genome option to 1, but that will usually give you
>>> many false positives.
>>> 
>>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>>> est2genome set to 1, and let us know how that affects the MAKER results.
>>> 
>>> Thanks,
>>> Daniel
>>> 
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>>> arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> I am trying to annotate a 200 Mb plant genome for which I have a very
>>> good assembly.
>>> 
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, only around 10,000 transcripts.
>>> 
>>> So, I decided to try a different approach.  I did a genome-assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>> genome assembly and the cufflinks result.  I get far fewer
>>> transcripts as a result.
>>> 
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>> 
>>> Any suggestions would be appreciated.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> 
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
> 



From darasappan at gmail.com  Thu Feb  6 11:01:44 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 11:01:44 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19004A.9913%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>

Oh yes, I did. I took just the non-sequence entries in the GFF file and
used that as my input.  I will rerun snap with the GFF file containing
the sequences as well.

I'm attaching a snippet of the gff file that I used as input to  
maker2zff.
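
For anyone hitting the same out_of_bounds errors, a quick check of the GFF3 before running maker2zff might look like the sketch below. It only counts feature lines and the sequence found after the "##FASTA" directive (the marker the GFF3 format uses for embedded sequence). The function name fasta_section_summary and the sample data are made up for illustration; this is not part of MAKER or SNAP.

```python
import io

def fasta_section_summary(handle):
    """Return (feature_lines, fasta_records, bases) for a GFF3 text stream.

    Hypothetical helper for illustration only.
    """
    in_fasta = False
    features = records = bases = 0
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        # The FASTA part starts at the ##FASTA directive (or at a bare '>' header).
        if not in_fasta and (line == "##FASTA" or line.startswith(">")):
            in_fasta = True
            if line == "##FASTA":
                continue
        if in_fasta:
            if line.startswith(">"):
                records += 1
            else:
                bases += len(line)
        elif not line.startswith("#"):
            features += 1
    return features, records, bases

# Tiny made-up example: one gene feature plus a 9 bp contig.
sample = (
    "##gff-version 3\n"
    "ctg1\tmaker\tgene\t1\t9\t.\t+\t.\tID=g1\n"
    "##FASTA\n"
    ">ctg1\n"
    "ACGTACGTA\n"
)
print(fasta_section_summary(io.StringIO(sample)))
```

If the second and third numbers come back as zero, the file has annotations but no sequence, which would match an empty genome.dna. For a real file, pass an open file handle instead of the StringIO object.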

Thanks for your help
Dhivya


On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence?  Did you by any chance strip  
> the fasta sequence from the GFF3 you are using as input to  
> maker2zff?  There should be fasta sequence at the end of that file.   
> Also can I see the GFF3 file you are using as input to maker2zff?
>
> Thanks,
> Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 7:47 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org 
> " <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> It does appear that my genome.ann file from the maker2zff script has data
> in it. However, the SNAP steps after that have created empty files.
> The following are all empty:
>
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>
> When I tried to get gene stats or validate genome.ann, I get errors  
> like this for all of them:
>
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
> exon-2:out_of_bounds exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
> exon-20:out_of_bounds exon-21:out_of_bounds
>
> I'm not sure why the annotations I'm seeing in genome.ann are all
> showing up as errors. I realize this may be an issue with snap, but
> are you familiar with anything like this? My genome.ann file is
> attached for reference.
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>
>> Do you have any features of type snap in your results from step 3?
>> We've had a couple of recent posts where, after training, snap was
>> giving no results, and as a result maker couldn't give any genes.
>> One cause of something like that may be your step 2.  Make sure the
>> ZFF you used to train with wasn't empty.  The maker2zff script uses
>> filters to only put the best genes in the ZFF file, and if all your
>> genes fail the filtering then you are training with an empty ZFF.
>>
>> Also you should use proteins from a related species as your protein
>> file.  I see that your protein matches are varying wildly from run
>> to run. So is your contig count.  Was the subset of contigs you
>> have results for long enough to contain genes?
>>
>> --Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence <dence at genetics.utah.edu>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From trinity assembly: I started with 102000 contigs. I used  
>> trinotate to annotate proteins in this.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this (most important parts on top):
>>
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>>
>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>> which there are 29,000 transcripts).  I used the protein sequences  
>> from trinity assembly.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>>
>> The number of genes annotated using the trinity assembly is lower than
>> expected, so I went the cufflinks route. I don't understand why, when
>> using the cufflinks transcripts, even fewer genes are being found.
>>
>> 3. Training SNAP:  I used the results of maker from 1 to train  
>> SNAP.  I then used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ 
>> maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>>
>> As I mentioned in another email, cegma results indicated that the  
>> genome was more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there are a few numbers that could be helpful to understand
>>> what's happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>>> MAKER when you gave it the cufflinks data. How many transcripts
>>> did MAKER identify with the cufflinks data? Did you still get more
>>> than the 10,000 transcripts that you found with just the Trinity
>>> data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be
>>> affecting its performance is that it only annotates a gene where
>>> there is both evidence (like your RNA-seq data) and an ab-initio
>>> prediction. If a prediction is unsupported by the evidence, then
>>> MAKER won't annotate a gene, and if evidence aligns where there's
>>> no prediction, MAKER won't annotate a gene either. What ab-initio
>>> predictors are you using, and have they been trained for your
>>> specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments  
>>> to a gene model by setting the est2genome option to 1, but that  
>>> will usually give you many false positives.
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>> data and with est2genome set to 1, and let us know how that  
>>> affects the MAKER results.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>>> of dhivya arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> I am trying to annotate a 200 Mb plant genome for which I have a very
>>> good assembly.
>>>
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, only around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach.  I did a genome-assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>> genome assembly and the cufflinks result.  I get far fewer
>>> transcripts as a result.
>>>
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>>
>>> Any suggestions would be appreciated.
>>>
>>> Thanks
>>> Dhivya
>>>
>>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.cat.formatted.gff
Type: application/octet-stream
Size: 19905 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a662c5a7/attachment.obj>

From sjackman at gmail.com  Thu Feb  6 18:22:57 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Feb 2014 16:22:57 -0800
Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation
Message-ID: <CADX6M3r=29brfAzzjPr22mAGW28VUb7np5MJz5bEjsAL-o2r-w@mail.gmail.com>

Hi MAKER developers,

I'd like to add MAKER to Homebrew <http://brew.sh> to make the installation
of MAKER and its dependencies as straightforward as "brew install maker".
Homebrew is a system for installing software, originally developed for Mac
OS, and now also available for Linux through
Linuxbrew <https://github.com/Homebrew/linuxbrew>.
Homebrew/science <https://github.com/Homebrew/homebrew-science> is a
collection of scientific software, which includes a lot of bioinformatics
software.

I've created a prototype for the MAKER installation script
<https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb> (called
a formula, in Homebrew parlance). Is there a static URL for the
source code of MAKER? The current formula won't work out of the box,
because part of the URL
<https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb#L7> depends
on the user's unique ID:
http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz.

Would you be interested in adding MAKER to Homebrew? I know MAKER must be
licensed for commercial use. It is possible for Homebrew to display a
notice of the MAKER license when it's installed.

MAKER is not available for commercial use without a license. Those wishing
to license MAKER for commercial use should contact Beth Drees at the
University of Utah TCO to discuss your needs.

Cheers,
Shaun

From bioinformatics.umd at gmail.com  Fri Feb  7 07:29:27 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 7 Feb 2014 08:29:27 -0500
Subject: [maker-devel] NCBI feature table
Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>

Hello Maker Developers,

I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.

Cheers
Ian


From mike.thon at gmail.com  Fri Feb  7 08:14:06 2014
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 7 Feb 2014 15:14:06 +0100
Subject: [maker-devel] NCBI feature table
In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>

Hi Ian -

We've been struggling with this too, and I started developing a script to convert the maker gff into ncbi's .tbl format.  However, we found that some of the gene models required manual editing, so what we do is import the gff into a commercial application called Geneious where we do the edits.  From there we export the data in GenBank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess.  The only advice I can offer is to say that you'll probably need some manual editing in your workflow, if not Apollo, then some other app.  In that case you'll need to convert the output of that app into .tbl format.
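
To give a rough idea of what the .tbl layout looks like, here is a minimal sketch (not the script mentioned above) that turns gene/mRNA/CDS rows of a GFF3 into a bare-bones feature table. It assumes records are grouped by sequence, each CDS has a single mRNA parent, and there are no partial features. The locus_tag and product values are placeholders; a real submission needs a registered locus_tag prefix, protein_id and transcript_id qualifiers, and real product names. The function names are made up for illustration.

```python
def parse_attrs(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding)."""
    return dict(kv.split("=", 1) for kv in column9.split(";") if "=" in kv)

def gff3_to_tbl(gff_lines):
    """Build a simplified feature table string from GFF3 lines.

    Hypothetical helper: handles gene and CDS features only.
    """
    genes = {}      # gene ID -> (seqid, start, end, strand)
    mrna_gene = {}  # mRNA ID -> parent gene ID
    cds = {}        # mRNA ID -> list of (start, end)
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows, no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        seqid, ftype, strand = cols[0], cols[2], cols[6]
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attrs(cols[8])
        if ftype == "gene":
            genes[attrs["ID"]] = (seqid, start, end, strand)
        elif ftype == "mRNA":
            mrna_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "CDS":
            cds.setdefault(attrs["Parent"], []).append((start, end))

    out, current_seq = [], None
    for gene_id, (seqid, start, end, strand) in genes.items():
        if seqid != current_seq:
            out.append(f">Feature {seqid}")
            current_seq = seqid
        minus = strand == "-"
        # Minus-strand features are written with start greater than end.
        a, b = (end, start) if minus else (start, end)
        out.append(f"{a}\t{b}\tgene")
        out.append(f"\t\t\tlocus_tag\t{gene_id}")  # placeholder tag
        for mrna_id, parent in mrna_gene.items():
            if parent != gene_id or mrna_id not in cds:
                continue
            # List exons 5' to 3': ascending on plus, descending on minus.
            segments = sorted(cds[mrna_id], reverse=minus)
            for i, (s, e) in enumerate(segments):
                a, b = (e, s) if minus else (s, e)
                out.append(f"{a}\t{b}\tCDS" if i == 0 else f"{a}\t{b}")
            out.append("\t\t\tproduct\thypothetical protein")  # placeholder
    return "\n".join(out) + "\n"

# Made-up two-exon gene on the minus strand.
example = [
    "ctg1\tmaker\tgene\t100\t900\t.\t-\t.\tID=g1\n",
    "ctg1\tmaker\tmRNA\t100\t900\t.\t-\t.\tID=m1;Parent=g1\n",
    "ctg1\tmaker\tCDS\t100\t300\t.\t-\t0\tID=c1;Parent=m1\n",
    "ctg1\tmaker\tCDS\t700\t900\t.\t-\t0\tID=c1;Parent=m1\n",
]
print(gff3_to_tbl(example), end="")
```

The output of something like this would still need to go through NCBI's own validation, and any models flagged there would need the kind of manual editing described above.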

> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics <bioinformatics.umd at gmail.com> wrote:
> 
> Hello Maker Developers,
> 
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.
> 
> Cheers
> Ian
> 
> 


From cexzurjimenezjr at gmail.com  Thu Feb  6 23:27:13 2014
From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.)
Date: Fri, 7 Feb 2014 13:27:13 +0800
Subject: [maker-devel] Testing MAKER After Installation
Message-ID: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>

Hello,

I have finished installing MAKER; the installer reported "PERL Dependencies:
INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED,
MAKER: INSTALLED", so everything seemed fine. I'm using MAKER 2.10 and
I followed the installation instructions in its "README" and "INSTALL"
files as well as the 2012 GMOD MAKER Tutorial. After editing the three
configuration files and running "maker", I saw the following error in my
terminal. I have searched Google and tried the solutions offered there,
but the error still appears. Below is the output I got:


Can't locate package GDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased


Maker is now finished!!!


Can you tell me what the error means and which part of the installation I
got wrong? Your help would be very much appreciated. Thank you.

Attached are the configuration files I used for MAKER.


Sincerely,

CJ
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1501 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1319 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4540 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0002.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 12:11:44 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:11:44 +0000
Subject: [maker-devel] Maker installation
In-Reply-To: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
References: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
Message-ID: <CF1A6E45.99DF%carson.holt@genetics.utah.edu>

Hi Tracy,

The older apollo is pretty much deprecated.  There are still people who like to use it though (myself among them).  You can download and install it manually from here --> http://sourceforge.net/projects/gmod/files/Apollo/

If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this --> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz

You can also use Web-Apollo for your data if you want, and that is what I would recommend.

On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that.  The GUI is really only for demonstration purposes or very small datasets.  It is not for production (that is why it is off by default).

Thanks,
Carson

From: Tracy Smith <tmsmith23 at wisc.edu>
Date: Friday, February 7, 2014 at 10:48 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: <maker-devel at yandell-lab.org>
Subject: Maker installation

Hi,

I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

--
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carson.holt at genetics.utah.edu  Fri Feb  7 12:28:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:28:29 +0000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
	<7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
Message-ID: <CF1A7331.9A09%carson.holt@genetics.utah.edu>

Yes.  The non-web version of apollo can open GFF3 and then save to table
format --> http://sourceforge.net/projects/gmod/files/Apollo/

I've also attached a script made by a lab member that can convert MAKER
derived GFF3 gene entries into raw table format, and I've CC'd the script's
author (Michael Campbell) in case you have any questions.

Thanks,
Carson


On 2/7/14, 7:14 AM, "Michael Thon" <mike.thon at gmail.com> wrote:

>Hi Ian -
>
>We've been struggling with this too and I started developing a script to
>convert the maker gff into ncbi's .tbl format.  However we found that
>some of the gene models required manual editing so what we do is import
>the gff into a commercial application called Geneious where we do the
>edits.  From there we export the data in GenBank format and then convert
>it to .tbl format with a script. Our submission just passed the automated
>checks and we're waiting for the manual review. Probably none of my code
>will help you, and in any case it's kind of a mess.  The only advice I can
>offer is to say that you'll probably need some manual editing in your
>workflow, if not Apollo, then some other app.  In that case you'll need
>to convert the output of that app into .tbl format.
>
>> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics
>><bioinformatics.umd at gmail.com> wrote:
>> 
>> Hello Maker Developers,
>> 
>> I have used this software with great success and I continue to look to
>>it going forward. However, as I'm getting ready to submit my annotations
>>to NCBI with the genomes, I haven't found a straightforward method of
>>turning the MAKER-produced GFF files into an NCBI feature table. What is
>>the process for creating this table? It seems that the format NCBI is
>>looking for is unique, and I haven't uncovered any scripts or tools to
>>assist in the creation of this table from my annotation files. If anyone
>>has any insight on this issue it would be greatly appreciated.
>> 
>> Cheers
>> Ian
>> 
>> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff32table
Type: application/octet-stream
Size: 7511 bytes
Desc: gff32table
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140207/2e51f964/attachment.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 12:31:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:31:17 +0000
Subject: [maker-devel] Testing MAKER After Installation
In-Reply-To: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
References: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
Message-ID: <CF1A7417.9A11%carson.holt@genetics.utah.edu>

That can happen on some systems with that very old version of MAKER.  Use MAKER 2.28 or 2.30 instead --> http://www.yandell-lab.org/software/maker.html
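
Independently of the upgrade, it can be worth confirming that the binaries named in the log are actually present and executable. The sketch below is only a generic check, not a MAKER diagnostic: the two paths are copied from the log in the original post, the helper name check_executables is made up, and a passing result does not prove the installed BLAST and RepeatMasker versions work together.

```python
import os

def check_executables(paths):
    """Map each path to 'ok', 'missing', or 'not executable'.

    Hypothetical helper for illustration only.
    """
    status = {}
    for path in paths:
        if not os.path.isfile(path):
            status[path] = "missing"
        elif not os.access(path, os.X_OK):
            status[path] = "not executable"
        else:
            status[path] = "ok"
    return status

# Paths as they appear in the posted log; adjust for your own install.
paths = [
    "/usr/local/blast/bin/makeblastdb",
    "/usr/local/maker/exe/RepeatMasker/RepeatMasker",
]
for path, state in check_executables(paths).items():
    print(f"{state:15s} {path}")
```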

Thanks,
Carson


From: "Cexzur Jimenez Jr." <cexzurjimenezjr at gmail.com>
Date: Thursday, February 6, 2014 at 10:27 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Testing MAKER After Installation

Hello,

I have finished installing MAKER; the installer reports "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED,
MAKER: INSTALLED" and everything seems fine. I'm using MAKER 2.10 and I have followed the installation instructions both in its "README" and "INSTALL" files and in the 2012 GMOD MAKER Tutorial. After editing the three configuration files and running "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there, but the error still appears. Below is the error I got:


Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased


Maker is now finished!!!


Can you tell me what the error is and which part of the installation I got wrong? Your help will be very much appreciated. Thank you.

Attached are the configuration files I used for MAKER.


Sincerely,

CJ


From bhall7 at hawaii.edu  Fri Feb  7 18:31:36 2014
From: bhall7 at hawaii.edu (Brian Hall)
Date: Fri, 07 Feb 2014 14:31:36 -1000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
References: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
Message-ID: <52F57AE8.5090002@hawaii.edu>

Hi Ian,

My colleagues are also working on preparing a genome for submission to 
the NCBI. The software we are developing for this task is still a work 
in progress, but you are welcome to give it a try:

https://github.com/tedsta/GAG

It's a console-based application and it requires Python 2.6. Its 
strength is in filtering and modifying large segments of the genome at 
once -- where Apollo is good for removing a few erroneous exons, we are 
dealing with lists of dozens or more. This program seeks to make such 
changes as painless as possible.

My advice is to try the simplest gff3-to-tbl script you can find and 
then run tbl2asn. If it works out okay, great! If you get a massive 
error report, get in touch and we'll help you out if we can :)
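
As a rough illustration of what a very simple gff3-to-tbl conversion involves, here is a minimal Python sketch. It is not the gff32table attachment and not GAG; the function name gff3_genes_to_tbl is hypothetical, and the choices to emit only "gene" features and to reuse the GFF3 ID attribute as the locus_tag are assumptions made for the example. It relies on the tab-delimited feature table layout, in which each sequence starts with a ">Feature" line and minus-strand features list their coordinates in reverse order.

```python
from collections import OrderedDict


def gff3_genes_to_tbl(gff3_text):
    """Convert GFF3 'gene' lines into a minimal feature table string."""
    by_seq = OrderedDict()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # Parse the attribute column (key=value pairs separated by ';').
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # On the minus strand the table lists the coordinates in reverse order.
        left, right = (end, start) if strand == "-" else (start, end)
        by_seq.setdefault(seqid, []).append((left, right, attrs.get("ID", "")))

    out = []
    for seqid, genes in by_seq.items():
        out.append(">Feature %s" % seqid)
        for left, right, gene_id in genes:
            out.append("%d\t%d\tgene" % (left, right))
            # Assumption: the GFF3 ID is reused as the locus_tag qualifier.
            out.append("\t\t\tlocus_tag\t%s" % gene_id)
    return "\n".join(out) + "\n"
```

A real submission also needs mRNA and CDS features with product names and protein IDs, so this only shows the shape of the conversion before the result is handed to tbl2asn.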

--Brian

On 02/07/2014 05:16 AM, maker-devel-request at yandell-lab.org wrote:
> Date: Fri, 7 Feb 2014 08:29:27 -0500
> From: UMD Bioinformatics <bioinformatics.umd at gmail.com>
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] NCBI feature table
> Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8 at gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> Hello Maker Developers,
>
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.
>
> Cheers
> Ian
>


From tmsmith23 at wisc.edu  Fri Feb  7 11:48:13 2014
From: tmsmith23 at wisc.edu (Tracy Smith)
Date: Fri, 7 Feb 2014 11:48:13 -0600
Subject: [maker-devel] Maker installation
Message-ID: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>

Hi,

I am trying to install Maker and am running into the same problem noted on
this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the
source -->
http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip
"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location
where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

-- 
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carsonhh at gmail.com  Mon Feb 10 09:34:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 08:34:58 -0700
Subject: [maker-devel] MAKER presentation at PAG
In-Reply-To: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
References: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
Message-ID: <CF1E3E65.9B13%carsonhh@gmail.com>

* maker_map_ids - Builds shorter IDs/Names for MAKER genes and transcripts
following the NCBI suggested naming format.
* map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER
fasta files.
* map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3
files; old IDs/Names are mapped to the Alias attribute.
* maker_functional_fasta - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER produced transcript and protein fasta
files.
* maker_functional_gff - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER produced GFF3 files in the Note
attribute.
* ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs
and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file.
This is metadata that shows up when you click on an annotation in
JBrowse/GBrowse.
* iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3
features representing domains. Interesting tier for GBrowse. These are
visible feature tracks that can be seen in JBrowse/GBrowse.
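
To make the idea behind the ID-mapping scripts concrete, here is a small Python sketch. It is not MAKER code: the function names, the "ABC_" prefix, and the zero-padded numbering scheme are all assumptions chosen for illustration. It builds a map from long MAKER-style IDs to short ones and applies that map to FASTA headers.

```python
def build_id_map(old_ids, prefix="ABC_", width=6):
    """Assign short sequential IDs (hypothetical scheme) to long IDs, in order."""
    id_map = {}
    for old in old_ids:
        if old not in id_map:
            # e.g. prefix "ABC_" and width 6 give ABC_000001, ABC_000002, ...
            id_map[old] = "%s%0*d" % (prefix, width, len(id_map) + 1)
    return id_map


def rename_fasta_headers(fasta_text, id_map):
    """Replace the first word of each FASTA header when it is in id_map."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if parts and parts[0] in id_map:
                rest = " " + parts[1] if len(parts) > 1 else ""
                line = ">" + id_map[parts[0]] + rest
        out.append(line)
    return "\n".join(out) + "\n"
```

Keeping the map as a separate object mirrors the workflow described in the list: one step generates the map, and later steps apply the same map to each FASTA or GFF3 file so the names stay consistent.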
Thanks,
Carson

From:  Kevin Dorn <dorn at umn.edu>
Date:  Sunday, February 9, 2014 at 9:23 PM
To:  <carson.holt at utah.edu>
Subject:  MAKER presentation at PAG

Hi Carson, 

I saw your MAKER presentation at PAG this year and have a quick question.
I've used MAKER to annotate the plant genome we're working on, and am mostly
done. I had to step out for a second during your talk, and when I came back,
you were talking about how you can transfer meaningful annotations (getting
rid of the 'ugly MAKER names' for genes). Is there an accessory script to do
this? 

Thanks, 
Kevin Dorn 



From amitha at ccmb.res.in  Mon Feb 10 01:04:37 2014
From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR)
Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST)
Subject: [maker-devel] Falied to create new account
In-Reply-To: <bea52988-c660-488d-aae4-196364348cea@node1>
Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hi,

I am interested in using the MAKER online version, for which I tried to create a profile using the email id 'amitha at ccmb.res.in', but unfortunately I could not log in successfully.
I am also pasting a link of the error here, http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.

The error mentioned is:
Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
 at /var/www/cgi-bin/mwas/maker.cgi line 21.

Kindly help me through the registration asap.

Thanks
Amitha.


From listona at science.oregonstate.edu  Sat Feb  8 20:08:42 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Sat, 08 Feb 2014 18:08:42 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>

I am following the tutorial for training SNAP, and it works fine.  
However, the tutorial instructions have MAKER repeat the repeat  
masking. To avoid this, I concatenated my gff files from the first  
round of annotation and used maker_gff=round1.gff and rm_pass=1  but  
at the end of the process, the repeat annotations were not there. Any  
suggestions?  Thanks, Aaron


From caigh02 at gmail.com  Sun Feb  9 21:26:57 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Sun, 9 Feb 2014 21:26:57 -0600
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
Message-ID: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>

I sent the following message to Carson but forgot to send to the
maker-devel list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or InterProScan. I dug out some old messages in the
maker-devel google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the Uniprot/Swiss_prot database, or can I use something
else? For example, may I add a few high-quality genome annotations of
related species to the swiss_prot database? Or may I use Uniref90 or nr
database instead of swiss_prot?

3. Do you have a script to integrate blast2go results to the maker
gff/fasta?

Thanks.

Guohong

Rutgers University

From carsonhh at gmail.com  Mon Feb 10 11:25:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:25:06 -0700
Subject: [maker-devel] Falied to create new account
In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
Message-ID: <CF1E5936.9B37%carsonhh@gmail.com>

The smtp server that sends e-mails out is just down.  So when you said you
forgot your login, it couldn't e-mail you.  I switched to a different
server for the time being.

--Carson


On 2/10/14, 12:04 AM, "AMITHA SAMPATH KUMAR" <amitha at ccmb.res.in> wrote:

>Hi,
>
>I am interested in using the MAKER online version, for which I tried to
>create a profile using the email id 'amitha at ccmb.res.in', but
>unfortunately I could not log in successfully.
>I am also pasting a link of the error here,
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.
>
>The error mentioned is:
>Error executing run mode 'forgot_login': Can't call method "MailMsg"
>without a package or object reference at
>/var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
> at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>Kindly help me through the registration asap.
>
>Thanks
>Amitha.
>


From carsonhh at gmail.com  Mon Feb 10 11:26:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:26:06 -0700
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
	<CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
Message-ID: <CF1E59B4.9B3B%carsonhh@gmail.com>

1. Yes. It should take NCBI BLAST+ results.
2. It has to be UniProt/Swissprot, or you can modify the comments of another
database to look like UniProt/Swissprot.
3. ipr_update_gff, can also take BLAST2GO results as an undocumented feature
(or at least it could last time I tested it - which was quite a long time
ago).
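
For point 2, a hedged Python sketch of what "looking like UniProt/Swissprot" could mean for a FASTA header is given below. The layout used here (">sp|ACCESSION|ENTRY_NAME description OS=... GN=... PE=... SV=...") is the general UniProt FASTA header style; which of these fields maker_functional_gff/fasta actually reads is not stated in this thread, so treat that as an assumption. The function name and the example values are hypothetical.

```python
def uniprot_style_header(accession, entry_name, description, organism,
                         gene=None, existence=4, version=1):
    """Build a SwissProt-like FASTA header line (without a trailing newline)."""
    fields = [">sp|%s|%s %s" % (accession, entry_name, description),
              "OS=%s" % organism]
    if gene:
        fields.append("GN=%s" % gene)
    # PE is the protein existence level; 4 means "predicted" in UniProt.
    fields.append("PE=%d" % existence)
    fields.append("SV=%d" % version)
    return " ".join(fields)


if __name__ == "__main__":
    # Hypothetical entry from a related species' annotation.
    print(uniprot_style_header("X00001", "ABC1_RELSP", "Hypothetical kinase",
                               "Related species", gene="abc1"))
```

Note that labelling non-SwissProt entries with the "sp" tag only imitates the format; it does not make the annotations curated, so any functions transferred this way deserve extra scrutiny.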

Thanks,
Carson

From:  Guohong Cai <caigh02 at gmail.com>
Date:  Sunday, February 9, 2014 at 8:26 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Fwd: Functional annotation of MAKER gene models

I sent the following message to Carson but forgot to send to the maker-devel
list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or InterProScan. I dug out some old messages in the
maker-devel google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the Uniprot/Swiss_prot database, or can I use something else?
For example, may I add a few high-quality genome annotations of related
species to the swiss_prot database? Or may I use Uniref90 or nr database
instead of swiss_prot?

3. Do you have a script to integrate blast2go results to the maker
gff/fasta?  

Thanks.

Guohong

Rutgers University 


From barry.utah at gmail.com  Mon Feb 10 13:21:31 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:21:31 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>

Hi Aaron,

If you re-run MAKER and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any work with repeat masking; it should reuse the work it has already done.  Is this not what you are seeing?

Barry


On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:

> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the repeat annotations were not there. Any suggestions?  Thanks, Aaron
> 
> 

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From listona at science.oregonstate.edu  Mon Feb 10 13:46:06 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Mon, 10 Feb 2014 11:46:06 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>

Hi Barry:   I changed the name of the genome file, so that I could see the
results at each step. However, it sounds like if I had kept the same name,
MAKER would use the info from the previous run.  Is that correct?  Aaron

 
From: Barry Moore [mailto:barry.utah at gmail.com] 
Sent: Monday, February 10, 2014 11:22 AM
To: Aaron Liston
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-using repeat masking in SNAP training

 
Hi Aaron,

 
If you re-run MAKER and don't change the details about the repeat library
(i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any work
with repeat masking; it should reuse the work it has already done.  Is this
not what you are seeing?

 
Barry

 
On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:


I am following the tutorial for training SNAP, and it works fine. However,
the tutorial instructions have MAKER repeat the repeat masking. To avoid
this, I concatenated my gff files from the first round of annotation and
used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the
repeat annotations were not there. Any suggestions?  Thanks, Aaron



 
Barry Moore

Research Scientist

Dept. of Human Genetics

University of Utah

Salt Lake City, UT 84112

--------------------------------------------

(801) 585-3543

 

From barry.utah at gmail.com  Mon Feb 10 13:56:26 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:56:26 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
	<02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com>

Yep.  If you want to keep the results from each step just copy the GFF3 file from your first run to a new name and then redo your run.
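
A minimal sketch of that copy step in Python, for anyone scripting their training rounds, might look like the following. The function name snapshot_gff, the file name genome.all.gff, and the "round1" label are hypothetical; the only point is that the merged GFF3 is copied under a new name before the same MAKER run is repeated.

```python
import os
import shutil


def snapshot_gff(gff_path, round_label):
    """Copy gff_path to '<stem>.<round_label><ext>' beside it; return the new path."""
    stem, ext = os.path.splitext(gff_path)
    dest = "%s.%s%s" % (stem, round_label, ext)
    shutil.copy2(gff_path, dest)  # copy2 also preserves timestamps
    return dest


if __name__ == "__main__":
    # Hypothetical file name; substitute the merged GFF3 from your own run.
    if os.path.exists("genome.all.gff"):
        print(snapshot_gff("genome.all.gff", "round1"))
```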

B

On Feb 10, 2014, at 12:46 PM, Aaron Liston wrote:

> Hi Barry:   I changed the name of the genome file, so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would use the info from the previous run.  Is that correct?  Aaron
>  
> From: Barry Moore [mailto:barry.utah at gmail.com] 
> Sent: Monday, February 10, 2014 11:22 AM
> To: Aaron Liston
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Re-using repeat masking in SNAP training
>  
> Hi Aaron,
>  
> If you re-run MAKER and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any work with repeat masking; it should reuse the work it has already done.  Is this not what you are seeing?
>  
> Barry
>  
>  
> On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:
> 
> 
> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the repeat annotations were not there. Any suggestions?  Thanks, Aaron
> 
> 
>  
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>  
>  
>  
>  

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From dence at genetics.utah.edu  Tue Feb 11 12:37:36 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 18:37:36 +0000
Subject: [maker-devel] Falied to create new account
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
	<CF1E5936.9B37%carsonhh@gmail.com>
	<E8EDFB90D92694478065C37017B3A3A6A8908910@SKREGIXES2.AGR.GC.CA>
	<CF1FA919.9BBB%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D445A3@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D445B3@mxb2.hg.genetics.utah.edu>

Hossein, 

Ok. So since this error came up on a local install, I'm going to need some more information to understand what went wrong. Is it the same contig that always causes this error? If it is, then is this the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, is it always the same error? 

If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong. 
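
One way to build such a reduced dataset is to pull the failing contig out of the genome FASTA and run MAKER on that alone. The Python sketch below is only an illustration: the function name extract_contig is hypothetical, and it assumes the contig ID is the first word of the FASTA header.

```python
def extract_contig(fasta_text, contig_id):
    """Return the FASTA record whose header's first word equals contig_id."""
    keep = False
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            # Start keeping lines at the matching header, stop at the next one.
            keep = bool(words) and words[0] == contig_id
        if keep:
            out.append(line)
    return "\n".join(out) + "\n" if out else ""
```

For the failure reported in this thread, the ID to pass would be PbPT3Sc00006; an empty return value means the ID did not match any header.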

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 11:20 AM
To: Daniel Ence
Subject: Re: [maker-devel] Falied to create new account

Hi Daniel

I am running it through the local server at my work.


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>Did you encounter this error while you were running MAKER on your local
>machine or through the MAKER web annotation service?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Carson Holt [carsonhh at gmail.com]
>Sent: Tuesday, February 11, 2014 10:18 AM
>To: Daniel Ence
>Cc: Mark Yandell
>Subject: FW: [maker-devel] Falied to create new account
>
>Hey Daniel could you download his dataset, and see if you can replicate
>the error.  Also check if this was an MWAS job or a local maker run (his
>dataset will already be there for MWAS, you just need the job ID).
>
>Thanks,
>Carson
>
>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Hi Carson
>>
>>
>>I encountered this error while running maker
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00006
>>
>>
>>
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>>
>>
>
>


From darasappan at gmail.com  Tue Feb 11 12:48:23 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 11 Feb 2014 12:48:23 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19187C.994D%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>

With your suggested changes (using a protein file not derived from the  
RNA-seq data and fixing the gff file for training SNAP), I was able to  
increase the number of genes from 6000+ to 18116.
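
For anyone hitting the same SNAP training problem, a quick check that a merged GFF3 still carries its sequence section can save a round of debugging. The Python sketch below is an illustration only (the function name gff3_has_fasta is hypothetical); it looks for the "##FASTA" directive followed by at least one header and one sequence line.

```python
def gff3_has_fasta(gff3_text):
    """True if a '##FASTA' directive is followed by a header and sequence."""
    in_fasta = False
    header_seen = False
    for line in gff3_text.splitlines():
        line = line.strip()
        if line == "##FASTA":
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            header_seen = True
        elif in_fasta and header_seen and line:
            return True  # first non-empty sequence line after a header
    return False
```

If this returns False for the file given to maker2zff, the resulting genome.dna will have no sequence to match the annotations against, which is consistent with the out_of_bounds errors quoted further down.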

I'm now trying to evaluate the quality of the annotation.  I have a  
question about the usage for mpi_evaluator.

In the maker tutorial,  the usage is given as:

  mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>
What files are being referred to in the input parameters: eval_opts,  
eval_bopts and eval_exe?

Thanks
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join
> the GFF3s without stripping out the fasta sequence at the end when
> training SNAP.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 10:29 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Sorry I was just trying to make it small enough to be approved by  
> the mailing list.
>
> Here is the whole file:
>
>
>  cat.formatted.gff.tgz
>
>
>
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com>  
> wrote:
>> Could you give me the file without using 'head' to trim it? It's
>> cutting it before it reaches the part I'm interested in.
>>
>> --Carson
>>
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Thursday, February 6, 2014 at 10:01 AM
>>
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>> <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Oh yes I did- I took just the non sequence entries in the gff file  
>> and used that as my input.  I will rerun snap with the gff file  
>> containing the sequences as well.
>>
>> I'm attaching a snippet of the gff file that I used as input to  
>> maker2zff.
>>
>> Thanks for your help
>> Dhivya
>>
>>
>>
>>
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>>
>>> Your genome.dna file has no sequence?  Did you by any chance strip  
>>> the fasta sequence from the GFF3 you are using as input to  
>>> maker2zff?  There should be fasta sequence at the end of that  
>>> file.  Also can I see the GFF3 file you are using as input to  
>>> maker2zff.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: Thursday, February 6, 2014 at 7:47 AM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> It does appear that my genome.ann file from the maker2zff script has
>>> data in it. However, the SNAP steps after that have created empty
>>> files.  The following are all empty:
>>>
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>>
>>> When I tried to get gene stats or validate genome.ann, I get  
>>> errors like this for all of them:
>>>
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
>>> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
>>> exon-2:out_of_bounds exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
>>> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
>>> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
>>> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
>>> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
>>> exon-20:out_of_bounds exon-21:out_of_bounds
>>>
>>> I'm not sure why the annotations I'm seeing in genome.ann are all
>>> showing up as errors. I realize this may be an issue with snap,
>>> but are you familiar with anything like this? My genome.ann file
>>> is attached for reference.
>>>
>>> Thanks
>>> Dhivya
>>>
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>>
>>>> Do you have any features of type snap in your results from step
>>>> 3?  We've had a couple of recent posts where, after training, SNAP
>>>> was giving no results, and as a result MAKER couldn't give any
>>>> genes.  One cause of something like that may be your step 2.
>>>> Make sure the ZFF you used to train with wasn't empty.  The
>>>> maker2zff script uses filters to only put the best genes in the
>>>> ZFF file, and if all your genes fail the filtering then you are
>>>> training with an empty ZFF.
>>>>
>>>> Also, you should use proteins from a related species as your
>>>> protein file.  I see that your protein matches are varying wildly
>>>> from run to run. So is your contig count.  Was the subset of
>>>> contigs you have results for long enough to contain genes?
>>>>
>>>> --Carson
>>>>
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: Monday, February 3, 2014 at 9:31 AM
>>>> To: Daniel Ence <dence at genetics.utah.edu>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>>
>>>> Hi Daniel,
>>>>
>>>> I was able to check on some of those questions.
>>>>
>>>> 1. From trinity assembly: I started with 102000 contigs. I used  
>>>> trinotate to annotate proteins in this.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this (most important parts on top):
>>>>
>>>>     6653 gene
>>>>    46675 exon
>>>>  280534 protein_match
>>>> 59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>>
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>>>> which there are 29,000 transcripts).  I used the protein  
>>>> sequences from trinity assembly.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>> 67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>>
>>>> The number of genes annotated using the trinity assembly is lower
>>>> than expected, so I went the cufflinks route. I don't understand
>>>> why, when using the cufflinks transcripts, even fewer genes are
>>>> being found.
>>>>
>>>> 3. Training SNAP:  I used the results of maker from 1 to train  
>>>> SNAP.  I then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ 
>>>> maker_mpi_withAlltrinity/snap/RHA.hmm
>>>> est2genome=0
>>>>
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>> 957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>>
>>>> As I mentioned in another email, cegma results indicated that the  
>>>> genome was more than 90% complete. Any suggestions would be  
>>>> helpful.
>>>>
>>>> Thank you
>>>> Dhivya
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>>
>>>>> Hi Dhivya,
>>>>>
>>>>> I think there are a few numbers that could be helpful to understand
>>>>> what's happening here.
>>>>>
>>>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>>>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>>>>> MAKER when you gave it the cufflinks data. How many transcripts
>>>>> did MAKER identify with the cufflinks data? Did you still get
>>>>> more than the 10,000 transcripts that you found with just the
>>>>> Trinity data?
>>>>>
>>>>> A key part of MAKER's approach to genome annotation that might
>>>>> be affecting its performance is that it only annotates a gene
>>>>> where there is both evidence (like your RNA-seq data) and an
>>>>> ab-initio prediction. If a prediction is unsupported by the
>>>>> evidence, then MAKER won't annotate a gene, and if evidence
>>>>> aligns where there's no prediction, MAKER won't annotate a gene
>>>>> either. What ab-initio predictors are you using, and have they
>>>>> been trained on your specific genome?
>>>>>
>>>>> You can force MAKER to automatically promote evidence alignments  
>>>>> to a gene model by setting the est2genome option to 1, but that  
>>>>> will usually give you many false positives.
>>>>>
>>>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>>>> data and with est2genome set to 1, and let us know how that  
>>>>> affects the MAKER results.
>>>>>
>>>>> Thanks,
>>>>> Daniel
>>>>>
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on  
>>>>> behalf of dhivya arasappan [darasappan at gmail.com]
>>>>> Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>>>> very good assembly.
>>>>>
>>>>> I tried to de novo assemble RNA-seq data using trinity and ran
>>>>> maker using my genome assembly and the trinity results.  I did not
>>>>> get as many transcripts as expected, around 10,000 transcripts.
>>>>>
>>>>> So, I decided to try a different approach.  I did a genome-assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker
>>>>> using my genome assembly and the cufflinks result.  I get a much
>>>>> smaller number of transcripts as a result.
>>>>>
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>>
>>>>> Any suggestions would be appreciated.
>>>>>
>>>>> Thanks
>>>>> Dhivya
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>
>>
>


From carsonhh at gmail.com  Tue Feb 11 12:55:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Feb 2014 11:55:38 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
	<0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
Message-ID: <CF1FBEEF.9BF5%carsonhh@gmail.com>

I wouldn't use mpi_evaluator.  It is buggy and has virtually no
documentation.  The AED values are the best way to identify which genes are
higher and lower quality.  You can also run interproscan to identify protein
domain content as an independent evaluation. Look at this paper here:
http://www.biomedcentral.com/1471-2105/12/491

Figure 4 has a nice example of how AED, domain content, and gene orthology
correlate to show the quality of different subsets of genes in seven ant
genomes.

If you choose to try mpi_evaluator, it uses the -CTL option to generate empty
files that you then fill in.
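
As a rough illustration of the AED-based evaluation described above, here is a
minimal Python sketch. It assumes the MAKER GFF3 stores the score as an
"_AED=" attribute in column 9 of each mRNA feature; the function names and the
0.5 cutoff are hypothetical choices for the example, not part of MAKER.

```python
import re
from typing import Iterable, List

# Assumption: MAKER writes "_AED=<score>" in the attributes column of mRNA rows.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff3_lines: Iterable[str]) -> List[float]:
    """Collect the AED score of every mRNA feature in a GFF3 stream."""
    values = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # annotations end where the embedded sequence begins
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_PATTERN.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def fraction_at_or_below(values: List[float], cutoff: float) -> float:
    """Fraction of transcripts whose AED is at or below the cutoff."""
    if not values:
        return 0.0
    return sum(v <= cutoff for v in values) / len(values)
```

Lower AED means better agreement with the evidence, so something like
fraction_at_or_below(aed_values(open("genome.all.gff")), 0.5) gives a quick
summary of how much of the gene set is well supported (the file name here is
only a placeholder).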

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, February 11, 2014 at 11:48 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

With your suggested changes (using a protein file not derived from the
RNA-seq data and fixing the gff file for training SNAP), I was able to
increase the number of genes from 6000+ to 18116.
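
For reference, the GFF3 fix mentioned above comes down to keeping the trailing
FASTA section in the file given to maker2zff. Below is a minimal Python sketch
of a sanity check, assuming the GFF3 follows the usual convention of a
"##FASTA" directive followed by ">" records; the function name is hypothetical
and not part of MAKER or SNAP.

```python
from typing import Iterable


def has_fasta_section(gff3_lines: Iterable[str]) -> bool:
    """Return True if a ##FASTA directive is followed by at least one record."""
    in_fasta = False
    for line in gff3_lines:
        if in_fasta:
            if line.startswith(">"):
                return True  # found a sequence header after the directive
        elif line.startswith("##FASTA"):
            in_fasta = True
    return False
```

Running this on the merged GFF3 before training would flag a file whose
sequence was stripped, which is consistent with the empty genome.dna and the
out_of_bounds errors reported earlier in the thread.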

I'm now trying to evaluate the quality of the annotation.  I have a question
about the usage for mpi_evaluator.

In the maker tutorial,  the usage is given as:

 mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>

What files are being referred to in the input parameters: eval_opts,
eval_bopts and eval_exe?

Thanks 
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join the GFF3
> files without stripping out the fasta sequence at the end when training SNAP.
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 10:29 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Sorry I was just trying to make it small enough to be approved by the mailing
> list.
> 
> Here is the whole file:
> 
> 
>  cat.formatted.gff.tgz
> <https://docs.google.com/file/d/0B3fACsJDXQi6VEE1VG5tWEh5M1U/edit?usp=drive_we
> b> 
> 
> 
> 
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Could you give me the file without using 'head' to trim it? It's cutting it
>> off before it reaches the part I'm interested in.
>> 
>> --Carson
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, February 6, 2014 at 10:01 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>> <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Oh yes I did- I took just the non sequence entries in the gff file and used
>> that as my input.  I will rerun snap with the gff file containing the
>> sequences as well.
>> 
>> I'm attaching a snippet of the gff file that I used as input to maker2zff.
>> 
>> Thanks for your help
>> Dhivya
>> 
>> 
>> 
>> 
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>> 
>>> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
>>> sequence from the GFF3 you are using as input to maker2zff?  There should be
>>> fasta sequence at the end of that file.  Also, can I see the GFF3 file you
>>> are using as input to maker2zff?
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Thursday, February 6, 2014 at 7:47 AM
>>> To:  Carson Holt <carsonhh at gmail.com>
>>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> It does appear that my genome.ann file from the maker2zff script has data
>>> in it. However, the SNAP steps after that have created empty files.  The
>>> following are all empty:
>>> 
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>> 
>>> When I tried to get gene stats or validate genome.ann, I get errors like
>>> this for all of them:
>>> 
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
>>> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
>>> exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
>>> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
>>> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
>>> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
>>> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
>>> exon-21:out_of_bounds
>>> 
>>> I'm not sure why the annotations I'm seeing in genome.ann are all showing up
>>> as errors. I realize this may be an issue with SNAP, but are you familiar
>>> with anything like this? My genome.ann file is attached for reference.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>> 
>>>> Do you have any features of type snap in your results from step 3?  We've
>>>> had a couple of recent posts where, after training, SNAP was giving no
>>>> results, and as a result MAKER couldn't give any genes.  One cause of
>>>> something like that may be your step 2.  Make sure the ZFF you used to
>>>> train with wasn't empty.  The maker2zff script uses filters to only put the
>>>> best genes in the ZFF file, and if all your genes fail the filtering then
>>>> you are training with an empty ZFF.
>>>> 
>>>> Also, you should use proteins from a related species as your protein file.
>>>> I see that your protein matches are varying wildly from run to run. So is
>>>> your contig count.  Was the subset of contigs you have results for long
>>>> enough to contain genes?
>>>> 
>>>> --Carson
>>>> 
>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>> Date:  Monday, February 3, 2014 at 9:31 AM
>>>> To:  Daniel Ence <dence at genetics.utah.edu>
>>>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>>> 
>>>> Hi Daniel,
>>>> 
>>>> I was able to check on some of those questions.
>>>> 
>>>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate
>>>> to annotate proteins in this.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this (most important parts on top):
>>>> 
>>>>     6653 gene
>>>>    46675 exon
>>>>  280534 protein_match
>>>> 59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>> 
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of which
>>>> there are 29,000 transcripts).  I used the protein sequences from trinity
>>>> assembly.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>> 67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>> 
>>>> The number of genes annotated using the trinity assembly is lower than
>>>> expected, so I went the cufflinks route. I don't understand why, when using
>>>> the cufflinks transcripts, even fewer genes are being found.
>>>> 
>>>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I
>>>> then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/s
>>>> nap/RHA.hmm
>>>> est2genome=0
>>>> 
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>> 957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>> 
>>>> As I mentioned in another email, cegma results indicated that the genome
>>>> was more than 90% complete. Any suggestions would be helpful.
>>>> 
>>>> Thank you
>>>> Dhivya
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>> 
>>>>> Hi Dhivya, 
>>>>> 
>>>>> I think there are a few numbers that could be helpful to understand what's
>>>>> happening here.
>>>>> 
>>>>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>>>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave
>>>>> it the cufflinks data. How many transcripts did MAKER identify with the
>>>>> cufflinks data? Did you still get more than the 10,000 transcripts that
>>>>> you found with just the Trinity data?
>>>>> 
>>>>> A key part of MAKER's approach to genome annotation that might be
>>>>> affecting its performance is that it only annotates a gene where there is
>>>>> both evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>>>> prediction is unsupported by the evidence, then MAKER won't annotate a
>>>>> gene, and if evidence aligns where there's no prediction, MAKER won't
>>>>> annotate a gene either. What ab-initio predictors are you using, and have
>>>>> they been trained on your specific genome?
>>>>> 
>>>>> You can force MAKER to automatically promote evidence alignments to a gene
>>>>> model by setting the est2genome option to 1, but that will usually give
>>>>> you many false positives.
>>>>> 
>>>>> Try rerunning it with either the Trinity data or the Cufflinks data and
>>>>> with est2genome set to 1, and let us know how that affects the MAKER
>>>>> results. 
>>>>> 
>>>>> Thanks,
>>>>> Daniel
>>>>> 
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>>  From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>>>> dhivya arasappan [darasappan at gmail.com]
>>>>>  Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am trying to annotate a 200 Mb plant genome for which I have a very
>>>>> good assembly.
>>>>> 
>>>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>>>> using my genome assembly and the trinity results.  I did not get as
>>>>> many transcripts as expected, around 10,000 transcripts.
>>>>> 
>>>>> So, I decided to try a different approach.  I did a genome-assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>>>> genome assembly and the cufflinks result.  I get a much smaller number of
>>>>> transcripts as a result.
>>>>> 
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>> 
>>>>> Any suggestions would be appreciated.
>>>>> 
>>>>> Thanks
>>>>> Dhivya
>>>>> 
>>>>> 
>>> 
>> 
> 



From carson.holt at genetics.utah.edu  Tue Feb 11 14:52:05 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 20:52:05 +0000
Subject: [maker-devel] New MAKER release
Message-ID: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>

Hello all,

MAKER has been updated to 2.31.

There are no major new features over 2.30.  It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support.  I also was able to remove the seg faults that sometimes happened on exit under OpenMPI.

Thanks,
Carson


From carson.holt at genetics.utah.edu  Tue Feb 11 15:19:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 21:19:17 +0000
Subject: [maker-devel] New MAKER release
In-Reply-To: <CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
References: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>
	<CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
Message-ID: <CF1FDDCC.9C1B%carson.holt@genetics.utah.edu>

URLs can be manually edited in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL.

Thanks,
Carson

From: Joanna Kelley <jokelley at stanford.edu>
Date: Tuesday, February 11, 2014 at 2:00 PM
To: Carson Holt <carson.holt at genetics.utah.edu>
Subject: Re: [maker-devel] New MAKER release

Hi Carson,

The RepBase step is failing; it seems to be looking for the wrong version. Where do I change the code to fix that?

Thanks,
Joanna

 Downloading RepBase...
--2014-02-11 12:59:38--  http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz
Resolving www.girinst.org... 66.201.49.247
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-02-11 12:59:38 ERROR 404: Not Found.


ERROR: Failed installing RepBase, now cleaning installation path...
You may need to install RepBase manually.


On Tue, Feb 11, 2014 at 12:52 PM, Carson Holt <carson.holt at genetics.utah.edu> wrote:
Hello all,

MAKER has been updated to 2.31.

There are no major new features over 2.30.  It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support.  I also was able to remove the seg faults that sometimes happened on exit under OpenMPI.

Thanks,
Carson




--
Please update your address book, my new email address is joanna.l.kelley at wsu.edu


From dence at genetics.utah.edu  Tue Feb 11 16:59:57 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 22:59:57 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

I think what would help most right now is if you ran MAKER on only one of the failing contigs and sent me the entire error output along with the MAKER control files that you are using. It looks like the error is coming from the gff3 files that you are using as input. 
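
One way to set up such a single-contig test is to pull the failing record out
of the genome FASTA first. The following is a minimal Python sketch, assuming
a plain multi-FASTA file whose contig ID is the first word of each header; the
function name is hypothetical and not a MAKER utility.

```python
from typing import Iterable, List


def extract_contig(fasta_lines: Iterable[str], contig_id: str) -> List[str]:
    """Return the header and sequence lines of the one record matching contig_id."""
    keep = False
    record = []
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            # Compare only the first word of the header, ignoring any description.
            keep = bool(words) and words[0] == contig_id
        if keep:
            record.append(line)
    return record
```

Writing the returned lines to a new file (for example with
open("one_contig.fasta", "w").writelines(...), where the file name is just a
placeholder) gives a small genome input that can be run with the same control
files.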

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 3:51 PM
To: Daniel Ence
Subject: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I re-started maker and it is still running. But in the error output file
that has been generated so far, it seems that smaller contigs are affected.
There are contigs of 2-4 kb with this error, but I also noticed a contig of
30 kb in length with this error.

I was wondering if I need to change the settings in the maker_opt file:

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)


If I understand correctly, max_dna_len divides contigs of over 100 kb into
smaller chunks. However, for the min_contig option, it is not clear to me
why, if the default contig length is 10 kb or less, I get the error message
for 30 kb long contigs. Should I change this to 0?

Here is an example of the error message for one of the contigs


#--------- command -------------#
Widget::exonerate::est2genome:
/usr/local/exonerate-2.2.0-x86_64/bin/exonerate
-q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
-t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta
-Q dna -T dna --model est2genome
--minintron 20 --showcigar --percent 20 >
/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
#-------------------------------#
cleaning blastn...
cleaning tblastx...
cleaning blastx...
ERROR: Failed on
PbPT3Sc00001_S_0.8_1-mRNA-1
Check your input GFF3 file for errors!
(from GFFDB)

FATAL ERROR
ERROR: Failed while processing the chunk divide!!

ERROR: Chunk failed at level 17
!!
FAILED CONTIG:PbPT3Sc00001


--Next Contig--


Regards


HB


On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hossein,
>
>Ok. So since this error came up on a local install, I'm going to need
>some more information to understand what went wrong. Is it the same
>contig that always causes this error? If it is, then is this the only
>error or warning that MAKER encounters while running on this contig? Or,
>if multiple contigs fail, then is it always the same error?
>
>If you can narrow it down to the smallest possible dataset that
>consistently gives the same error, then we can begin to understand what's
>wrong.
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 11:20 AM
>To: Daniel Ence
>Subject: Re: [maker-devel] Falied to create new account
>
>Hi Daniel
>
>I am running it through the local server at my work
>
>
>
>
>
>
>M. Hossein Borhan, Ph.D.
>Research Scientist/ Chercheur Scientifique
>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>107 Science Place, Saskatoon, SK.,S7N 0X2
>Telephone/Téléphone: (306) 385-9441
>Facsimile/Télécopieur: (306) 385-9482
>Hossein.borhan at agr.gc.ca
>
>
>
>
>
>
>
>
>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>Did you encounter this error while you were running MAKER on your local
>>machine or through the MAKER web annotation service?
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Carson Holt [carsonhh at gmail.com]
>>Sent: Tuesday, February 11, 2014 10:18 AM
>>To: Daniel Ence
>>Cc: Mark Yandell
>>Subject: FW: [maker-devel] Falied to create new account
>>
>>Hey Daniel could you download his dataset, and see if you can replicate
>>the error.  Also check if this was an MWAS job or a local maker run (his
>>dataset will already be there for MWAS, you just need the job ID).
>>
>>Thanks,
>>Carson
>>
>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>>
>>>Hi Carson
>>>
>>>
>>>I encountered this error while running maker
>>>
>>>FATAL ERROR
>>>ERROR: Failed while processing the chunk divide!!
>>>
>>>ERROR: Chunk failed at level 17
>>>!!
>>>FAILED CONTIG:PbPT3Sc00006
>>>
>>>
>>>
>>>
>>>
>>>HB


From marc.hoeppner at imbim.uu.se  Wed Feb 12 02:34:12 2014
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Wed, 12 Feb 2014 09:34:12 +0100
Subject: [maker-devel] Annotations from protein alignments
Message-ID: <52FB3204.60606@imbim.uu.se>

Dear list,

I have an annotation project with both protein data (it's a bird, so
I've been using both vertebrates in general and chicken in particular)
and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq
data. The chicken augustus model seems to do a decent job in seeding
gene loci, but it's not quite perfect. I want to use protein alignments
to create a high-confidence set of exons and subsequently a set of gene
loci (to train e.g. snap), but when I test with protein2genome=1 I
never get any annotations. This is also true for the test data set that
is delivered together with Maker (hsap_). Is there anything I should know
about the use of proteins to generate annotations? I left all settings in
the config file at their defaults (except protein2genome=1). I've tried
this with both Maker 2.30 and 2.31.

All the best,

Marc

-- 
-----------
Marc P. Hoeppner, PhD
Group leader
BILS Genome annotation platform

Department of Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoepner at imbim.uu.se


From carsonhh at gmail.com  Wed Feb 12 09:42:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 12 Feb 2014 08:42:36 -0700
Subject: [maker-devel] Annotations from protein alignments
In-Reply-To: <52FB3204.60606@imbim.uu.se>
References: <52FB3204.60606@imbim.uu.se>
Message-ID: <CF20E42A.9C8C%carsonhh@gmail.com>

I updated the 2.31 tarball.  Go ahead and download it again.
protein2genome was turned off for eukaryotes and was only working for
prokaryotic genomes.

--Carson


On 2/12/14, 1:34 AM, "Marc P. Hoeppner" <marc.hoeppner at imbim.uu.se> wrote:

>Dear list,
>
>I have an annotation project with both protein data (it's a bird, so
>I've been using both vertebrates in general and chicken in particular)
>and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq
>data. The chicken augustus model seems to do a decent job in seeding
>gene loci, but it's not quite perfect. I want to use protein alignments
>to create a high-confidence set of exons and subsequently a set of gene
>loci (to train e.g. snap), but when I test with protein2genome=1 I
>never get any annotations. This is also true for the test data set that
>is delivered together with Maker (hsap_). Is there anything I should know
>about the use of proteins to generate annotations? I left all settings in
>the config file at their defaults (except protein2genome=1). I've tried
>this with both Maker 2.30 and 2.31.
>
>All the best,
>
>Marc
>
>-- 
>-----------
>Marc P. Hoeppner, PhD
>Group leader
>BILS Genome annotation platform
>
>Department of Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoepner at imbim.uu.se
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Wed Feb 12 12:59:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 18:59:11 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

After looking at the gff3 and your control files, I had an idea. The control file has a section called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines.

We sometimes encounter problems with the MAKER passthrough. Could you try dividing the gff3 file by feature source and passing each part through the "est_gff" etc. options instead of using the MAKER passthrough? That will tell us whether the problem is with the gff3 file itself or with how MAKER is processing it.
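
One way to do the split is sketched below in Python. This is only an illustration, not part of MAKER: the function name is made up, and it assumes the second GFF3 column (source) is what distinguishes EST alignments, protein alignments, repeats, predictions and gene models in your file.

```python
import collections


def split_gff3_by_source(lines):
    """Group GFF3 feature lines by the value of column 2 (source).

    Comment lines and anything after an embedded ##FASTA section are skipped.
    """
    by_source = collections.defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        by_source[cols[1]].append(line)
    return dict(by_source)


# Hypothetical example lines; real input would come from open("your.gff").
demo = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
    "ctg1\test2genome\texpressed_sequence_match\t1\t50\t10\t+\t.\tID=e1",
    "##FASTA",
    ">ctg1",
    "ACGT",
]
groups = split_gff3_by_source(demo)
for source, feats in sorted(groups.items()):
    print(source, len(feats))
```

Each group can then be written to its own file and listed under the matching "est_gff", "protein_gff", etc. line. Which source belongs under which option depends on what is in your file.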

Another thing to check is that the contig names in the gff3 file match the contig names in the fasta file that you're annotating.
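
A quick way to check the names is sketched below in Python. It is a rough illustration with made-up function names and example data, and it assumes the sequence ID is the first whitespace-delimited word of each fasta header. The comparison is case sensitive, so a name like Sc00009 in one file and SC00009 in the other would be reported.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') in a fasta file."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_seqids(lines):
    """Return the set of values in column 1 of the GFF3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(gff_lines, fasta_lines):
    """Names used in the GFF3 that have no exact match in the fasta."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(fasta_lines))


# Hypothetical example data; real input would come from the two files.
demo_fasta = [">PbPT3Sc00001 length=30000", "ACGTACGT"]
demo_gff = [
    "##gff-version 3",
    "PbPT3Sc00001\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
    "PbPT3SC00009\tmaker\tgene\t1\t100\t.\t+\t.\tID=g2",
]
print(unmatched_seqids(demo_gff, demo_fasta))
```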

Thanks,
Daniel


Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 8:49 AM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I have generated the files that you requested. I chose Sc00009 from my
genome, which is 30 kb and was one of the scaffolds coming up with the error.
In addition to the ctl files and the error output file, I have also attached
the part of the gff file related to SC00009 that is indicated in the error
message.


Thanks for helping with this


Regards


HB


On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>I think that what would be the most help right now is if you ran MAKER on
>only one of those contigs that are failing and send me the entire error
>output along with the maker control files that you are using. It looks
>like the error is coming from the gff3 files that you are using as input.
>
>Thanks,
>Daniel
>
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 3:51 PM
>To: Daniel Ence
>Subject: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>I restarted maker and it is still running. In the error output file that
>has been generated so far, it seems that smaller contigs are affected. There
>are contigs of 2-4 kb with this error, but I also noticed a 30 kb contig
>with this error.
>
>I was wondering if I need to change the settings in the maker_opt file:
>
>#-----MAKER Behavior Options
>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
>
>If I understand correctly, max_dna_len divides contigs of over 100 kb into
>smaller chunks. However, the min_contig option is not clear to me: if the
>default contig length is 10 kb or less, why do I get an error message for
>30 kb contigs? Should I change this to 0?
>
>Here is an example of the error message for one of the contigs
>
>
>#--------- command -------------#
>Widget::exonerate::est2genome:
>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate
>  -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
>  -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta
>  -Q dna -T dna --model est2genome
>  --minintron 20 --showcigar --percent 20
>  > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>#-------------------------------#
>cleaning blastn...
>cleaning tblastx...
>cleaning blastx...
>ERROR: Failed on
>PbPT3Sc00001_S_0.8_1-mRNA-1
>Check your input GFF3 file for errors!
>(from GFFDB)
>
>FATAL ERROR
>ERROR: Failed while processing the chunk divide!!
>
>ERROR: Chunk failed at level 17
>!!
>FAILED CONTIG:PbPT3Sc00001
>
>
>
>
>--Next Contig--
>
>
>
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>
>
>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hossein,
>>
>>Ok. So since this error came up on a local install, I'm going to need
>>some more information to understand what went wrong. Is it the same
>>contig that always causes this error? If it is, then is this the only
>>error or warning that MAKER encounters while running on this contig? Or,
>>if multiple contigs fail, then is it always the same error?
>>
>>If you can narrow it down to the smallest possible dataset that
>>consistently gives the same error, then we can begin to understand what's
>>wrong.
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Tuesday, February 11, 2014 11:20 AM
>>To: Daniel Ence
>>Subject: Re: [maker-devel] Falied to create new account
>>
>>Hi Daniel
>>
>>I am running it through the local server at my work
>>
>>
>>
>>
>>
>>
>>M. Hossein Borhan, Ph.D.
>>Research Scientist/ Chercheur Scientifique
>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>Telephone/Téléphone: (306) 385-9441
>>Facsimile/Télécopieur: (306) 385-9482
>>Hossein.borhan at agr.gc.ca
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hi Hossein,
>>>
>>>Did you encounter this error while you were running MAKER on your local
>>>machine or through the MAKER web annotation service?
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Carson Holt [carsonhh at gmail.com]
>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>To: Daniel Ence
>>>Cc: Mark Yandell
>>>Subject: FW: [maker-devel] Falied to create new account
>>>
>>>Hey Daniel could you download his dataset, and see if you can replicate
>>>the error.  Also check if this was an MWAS job or a local maker run (his
>>>dataset will already be there for MWAS, you just need the job ID).
>>>
>>>Thanks,
>>>Carson
>>>
>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>wrote:
>>>
>>>>Hi Carson
>>>>
>>>>
>>>>I encountered this error while running maker
>>>>
>>>>FATAL ERROR
>>>>ERROR: Failed while processing the chunk divide!!
>>>>
>>>>ERROR: Chunk failed at level 17
>>>>!!
>>>>FAILED CONTIG:PbPT3Sc00006
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>HB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>
>>>
>>>
>>
>


From dence at genetics.utah.edu  Wed Feb 12 13:15:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 19:15:59 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

One more question. How did you make the gff3 that you're passing through here? 

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330



From dence at genetics.utah.edu  Wed Feb 12 14:42:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 20:42:03 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908DE5@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4498A@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44A3B@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

The problems with passing through MAKER-derived gff3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download on our website. Try installing that version and rerunning with the same control files you started with, and let me know if that fixes the problem.

Thanks,
Daniel

 
Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 12:55 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Hi Daniel

I am using MAKER 2.10.
I also checked the naming of the scaffold in the genome file and the gff
file for the failed example. The naming is the same.

Thanks

Hossein


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-12 1:30 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>And which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 12:25 PM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Hi Daniel
>
>The gff file was generated by the first run of maker.
>
>
>
>HB
>
>
>
>
>
>
>
>On 14-02-12 1:15 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>One more question. How did you make the gff3 that you're passing through
>>here?
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>


From masa at bioinfo.hr  Thu Feb 13 04:17:11 2014
From: masa at bioinfo.hr (Masa Roller)
Date: Thu, 13 Feb 2014 11:17:11 +0100
Subject: [maker-devel] SNAP scores and AED scores
Message-ID: <52FC9BA7.6060505@bioinfo.hr>

Dear all,

I ran snap2 based gene prediction through maker.

In the resulting gff file, in the source "snap_masked" I can find the 
score in the score column of every snap prediction that did not get 
promoted to a maker gene. This would be the score of how well the 
prediction matches the HMM?

It seems to me that those snap models that are given gene status no 
longer appear as snap_masked source but only as source "maker". Maker 
then removes the score column, instead giving AED and eAED scores (which 
are more about how the model corresponds to the evidence). When viewing 
the maker transcripts and SNAP predictions in a browser, they do not 
match (mostly, maker predictions are longer).

I am interested in the scores of the individual gene predictions that
underlie the maker gene models. Where could I find that information?

Many thanks!


From carsonhh at gmail.com  Thu Feb 13 14:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:11:22 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <52FC9BA7.6060505@bioinfo.hr>
References: <52FC9BA7.6060505@bioinfo.hr>
Message-ID: <CF227374.9D6F%carsonhh@gmail.com>

No.  SNAP genes do not disappear. All SNAP ab initio calls are always
kept as reference features marked snap_masked (for the repeat-masked genome)
and snap (for the unmasked genome).  MAKER then runs SNAP a second time,
feeding it hints based on EST and protein alignment evidence.  These
hint-based models then compete against the ab initio SNAP models and are
promoted to genes if their AED scores are better.  Final models can also
get UTRs added based on EST evidence.  That is why you can get models from
MAKER that do not match the original SNAP ab initio calls.

So in summary, all SNAP ab initio models will be in snap_masked.  The
MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio
models processed to add UTRs.
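
To pull the ab initio scores out of the gff, something along these lines should work. It is a rough Python sketch with a made-up function name and made-up example lines, and it assumes the snap_masked predictions are written as "match" features with the score in column 6 (their exons as "match_part"); please check that against your own file.

```python
def snap_scores(gff_lines, source="snap_masked", feature_type="match"):
    """Yield (seqid, start, end, strand, score, ID) for scored features
    whose source and type match the given values."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        if cols[1] != source or cols[2] != feature_type:
            continue
        if cols[5] == ".":
            continue  # no score recorded for this feature
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield (cols[0], int(cols[3]), int(cols[4]), cols[6],
               float(cols[5]), attrs.get("ID", ""))


# Hypothetical example lines; real input would come from the MAKER gff.
demo = [
    "ctg1\tsnap_masked\tmatch\t100\t900\t35.5\t+\t.\tID=ctg1:hit:1",
    "ctg1\tsnap_masked\tmatch_part\t100\t400\t12.1\t+\t.\tID=ctg1:hsp:1;Parent=ctg1:hit:1",
    "ctg1\tmaker\tmRNA\t80\t950\t.\t+\t.\tID=m1;_AED=0.12",
]
rows = list(snap_scores(demo))
for row in rows:
    print(*row, sep="\t")
```

Passing source="snap" would list the calls made on the unmasked genome instead.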

Thanks,
Carson


On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:

>Dear all,
>
>I ran snap2 based gene prediction through maker.
>
>In the resulting gff file, in the source "snap_masked" I can find the
>score in the score column of every snap prediction that did not get
>promoted to a maker gene. This would be the score of how well the
>prediction matches the HMM?
>
>It seems to me that those snap models that are given gene status no
>longer appear as snap_masked source but only as source "maker". Maker
>then removes the score column, instead giving AED and eAED scores (which
>are more about how the model corresponds to the evidence). When viewing
>the maker transcripts and SNAP predictions in a browser, they do not
>match (mostly, maker predictions are longer).
>
>I am interested in the scores of the individual gene predictions that
>underlie the maker gene models. Where could I find that information?
>
>Many thanks!
>


From carsonhh at gmail.com  Thu Feb 13 14:23:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:23:07 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr>
 <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <CF227602.9D7E%carsonhh@gmail.com>

On a side note: because the MAKER models involve either modifying the ab
initio SNAP model or manipulating the underlying scoring scheme using
hints, the SNAP score on those is virtually meaningless.  However, Ian Korf
has developed a tool that can take any gene structure and reverse-generate
a score (i.e. what the score of this gene would have been if SNAP had
called it that way in the first place).

I believe the tool is called fathom and is part of the SNAP package.  It
is not well documented, so you might have to contact Ian Korf directly
about that.  You can use the maker2zff tool to generate the input to fathom.

Thanks,
Carson


On 2/13/14, 1:11 PM, "Carson Holt" <carsonhh at gmail.com> wrote:

>No.  Snap genes do not disappear. All SNAP ab initio calls will always be
>kept as reference features marked snap_masked (for repeat masked genome)
>and snap (for unmasked genome).  MAKER then runs SNAP another time where
>it feeds hints to SNAP based on EST and protein alignment evidence.  These
>hint based models can then compete against the ab initio SNAP models to be
>promoted to genes if their AED scores are better.  Final models can also
>get UTR added based on EST evidence.  That is why you can get models from
>MAKER that do not match the original SNAP ab initio calls.
>
>So in summary, all SNAP ab initio models will be in snap_masked.  The
>MAKER models will consist of hint based SNAP rerun plus SNAP ab initio
>models processed to add UTR.
>
>Thanks,
>Carson
>
>
>
>On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:
>
>>Dear all,
>>
>>I ran snap2 based gene prediction through maker.
>>
>>In the resulting gff file, in the source "snap_masked" I can find the
>>score in the score column of every snap prediction that did not get
>>promoted to a maker gene. This would be the score of how well the
>>prediction matches the HMM?
>>
>>It seems to me that those snap models that are given gene status no
>>longer appear as snap_masked source but only as source "maker". Maker
>>then removes the score column, instead giving AED and eAED scores (which
>>are more about how the model corresponds to the evidence). When viewing
>>the maker transcripts and SNAP predictions in a browser, they do not
>>match (mostly, maker predictions are longer).
>>
>>I am interested in the score of individual gene predictions that
>>underlie maker gene models. Where could I find that information?
>>
>>Many thanks!
>>
>
>


From barry.utah at gmail.com  Thu Feb 13 14:27:17 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Thu, 13 Feb 2014 13:27:17 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr> <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu>

Hi Masa,

Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore.  This is a directory structure created by MAKER to store contig-specific data.  There is a datastore directory (and a corresponding index file) in the MAKER output directory.  The index file provides the path to each contig's directory, and inside that contig-specific directory there is a directory called theVoid.  This contains all of the output of each program that MAKER runs.
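
To make the lookup concrete, here is a small Python sketch that reads a datastore index file and returns the expected theVoid directory for each contig.  It assumes the index is tab-separated with the contig name in the first column and a path relative to the index file in the second, and that the directory is named "theVoid.<contig>"; the file name and example rows below are hypothetical, and contig names containing special characters may be escaped on disk, which this sketch does not handle.

```python
import os
import tempfile

def void_dirs_from_index(index_path):
    """Map contig name -> expected theVoid directory, based on the index file."""
    base_dir = os.path.dirname(os.path.abspath(index_path))
    void_dirs = {}
    with open(index_path) as handle:
        for line in handle:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 2:
                continue  # skip blank or malformed lines
            contig, relative_dir = cols[0], cols[1]
            contig_dir = os.path.normpath(os.path.join(base_dir, relative_dir))
            # A contig may be listed more than once (e.g. one row per status);
            # the path is the same each time, so overwriting is harmless.
            void_dirs[contig] = os.path.join(contig_dir, "theVoid." + contig)
    return void_dirs

# Demonstration with a throwaway index file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "genome_master_datastore_index.log")
    with open(index_path, "w") as out:
        out.write("PbPT3Sc00001\tgenome_datastore/35/17/PbPT3Sc00001/\tSTARTED\n")
        out.write("PbPT3Sc00001\tgenome_datastore/35/17/PbPT3Sc00001/\tFINISHED\n")
    demo = void_dirs_from_index(index_path)
    expected_tail = os.path.join(
        "genome_datastore", "35", "17", "PbPT3Sc00001", "theVoid.PbPT3Sc00001"
    )
    print(demo)
```

The raw SNAP output for a contig would then be among the files inside the returned directory.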

B

On Feb 13, 2014, at 1:11 PM, Carson Holt wrote:

> No.  Snap genes do not disappear. All SNAP ab initio calls will always be
> kept as reference features marked snap_masked (for repeat masked genome)
> and snap (for unmasked genome).  MAKER then runs SNAP another time where
> it feeds hints to SNAP based on EST and protein alignment evidence.  These
> hint based models can then compete against the ab initio SNAP models to be
> promoted to genes if their AED scores are better.  Final models can also
> get UTR added based on EST evidence.  That is why you can get models from
> MAKER that do not match the original SNAP ab initio calls.
> 
> So in summary, all SNAP ab initio models will be in snap_masked.  The
> MAKER models will consist of hint based SNAP rerun plus SNAP ab initio
> models processed to add UTR.
> 
> Thanks,
> Carson
> 
> 
> 
> On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:
> 
>> Dear all,
>> 
>> I ran snap2 based gene prediction through maker.
>> 
>> In the resulting gff file, in the source "snap_masked" I can find the
>> score in the score column of every snap prediction that did not get
>> promoted to a maker gene. This would be the score of how well the
>> prediction matches the HMM?
>> 
>> It seems to me that those snap models that are given gene status no
>> longer appear as snap_masked source but only as source "maker". Maker
>> then removes the score column, instead giving AED and eAED scores (which
>> are more about how the model corresponds to the evidence). When viewing
>> the maker transcripts and SNAP predictions in a browser, they do not
>> match (mostly, maker predictions are longer).
>> 
>> I am interested in the score of individual gene predictions that
>> underlie maker gene models. Where could I find that information?
>> 
>> Many thanks!
>> 

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From mptrsen at uni-bonn.de  Thu Feb 13 21:00:24 2014
From: mptrsen at uni-bonn.de (Malte Petersen)
Date: Fri, 14 Feb 2014 04:00:24 +0100
Subject: [maker-devel] BLAST options error / should Maker check for file
	format?
Message-ID: <52FD86C8.6040007@uni-bonn.de>

Dear MAKER devs,

I was running Maker version 2.30p-beta on an insect genome, and it
didn't produce any output. I got these error messages:


Widget::formater:
/path/to/makeblastdb -dbtype nucl -in
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
#-------------------------------#
BLAST options error: File
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
is empty
ERROR: /path/to/makeblastdb failed in Widget::formater
--> rank=NA, hostname=Jeanne-GBR
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:scf7180005143343

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf7180005143343


I figured out that this error is due to a non-Fasta file format being
fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
I got the pipeline running now with the correct file, I think that it
should be complaining (a lot earlier) if any of the input files are of
the wrong format.  More people might run into this problem and have no
idea where to look for a solution.

What do you think?

Best,
Malte


From carsonhh at gmail.com  Thu Feb 13 21:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 20:11:22 -0700
Subject: [maker-devel] BLAST options error / should Maker check for file
 format?
In-Reply-To: <52FD86C8.6040007@uni-bonn.de>
References: <52FD86C8.6040007@uni-bonn.de>
Message-ID: <CF22D59B.9DEB%carsonhh@gmail.com>

Hi Malte,

Actually there already is a check.  I'm very surprised your file made it
that far.  Normally it fails right away.

Example -->

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1
appears to be empty.


Another test file -->


STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The nucleotide sequence file
'/Users/cholt/Developer/maker/trunk/data/test2'
appears to contain protein sequence or unrecognized characters. Note
the following nucleotides may be valid but are unsupported [RYKMSWBDHV]
Please check/fix the file before continuing, or set -fix_nucleotides on
the command line to fix this automatically.
Invalid Character: 'M'


You seem to have found just the right formula of improper input to get
past the filters on your run :-)
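
For anyone who wants to check evidence files before launching a run, the two failure cases shown above (an empty file, and a nucleotide file with unexpected characters) can be approximated with a small stand-alone script.  This is only a sketch of the idea, not MAKER's actual validation code; the function name, the messages, and the decision to reject ambiguity codes are assumptions made for the example.

```python
import io

# Characters accepted in this sketch. The ambiguity codes [RYKMSWBDHV]
# are deliberately rejected, mirroring the message quoted above.
NUCLEOTIDES = set("ACGTUNacgtun")

def check_nucleotide_fasta(handle):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    seen_header = False
    seen_sequence = False
    for number, raw in enumerate(handle, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            seen_header = True
            continue
        if not seen_header:
            problems.append(f"line {number}: text before the first '>' header")
            return problems
        seen_sequence = True
        unexpected = sorted(set(line) - NUCLEOTIDES)
        if unexpected:
            problems.append(
                f"line {number}: unexpected characters {''.join(unexpected)}"
            )
            return problems
    if not seen_sequence:
        problems.append("no sequence records found (file is empty or not FASTA)")
    return problems

empty_result = check_nucleotide_fasta(io.StringIO(""))
good_result = check_nucleotide_fasta(io.StringIO(">seq1\nACGTNNACGT\n"))
protein_result = check_nucleotide_fasta(io.StringIO(">seq1\nACGTM\n"))
metadata_result = check_nucleotide_fasta(io.StringIO("sample\tlane\tlibrary\n"))
print(empty_result, good_result, protein_result, metadata_result)
```

A metadata file like the one that slipped through here would be caught by the "text before the first header" case.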


Thanks,
Carson


On 2/13/14, 8:00 PM, "Malte Petersen" <mptrsen at uni-bonn.de> wrote:

>Dear MAKER devs,
>
>I was running Maker version 2.30p-beta on an insect genome, and it
>didn't produce any output. I got these error messages:
>
>
>Widget::formater:
>/path/to/makeblastdb -dbtype nucl -in
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
>#-------------------------------#
>BLAST options error: File
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
>is empty
>ERROR: /path/to/makeblastdb failed in Widget::formater
>--> rank=NA, hostname=Jeanne-GBR
>ERROR: Failed while doing blastn of ESTs
>ERROR: Chunk failed at level:0, tier_type:3
>FAILED CONTIG:scf7180005143343
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf7180005143343
>
>
>I figured out that this error is due to a non-Fasta file format being
>fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
>I got the pipeline running now with the correct file, I think that it
>should be complaining (a lot earlier) if any of the input files are of
>the wrong format.  More people might run into this problem and have no
>idea where to look for a solution.
>
>What do you think?
>
>Best,
>Malte
>
>


From dence at genetics.utah.edu  Fri Feb 14 13:09:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Feb 2014 19:09:08 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D452AD@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

So, this is what is going on. The problem is with the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the name of the mRNA became the same as the name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features.

The solution is to make the Parent attributes of the exon and CDS features point to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK with just adding "-mRNA-1" to the end of the parent name on all the exon and CDS lines. This will work unless there are some genes with alternative splice forms, because then some of the mRNA names will end with something like "-mRNA-2".

I've attached a script that should do this for you. 

Run it with this command

"perl fix_gff3_script.pl <your_gff3> > <fixed_gff3>"

And then run MAKER with the fixed gff3 file in place of the old gff3 file. 
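
For readers without the attachment, the re-parenting described above could be sketched in Python roughly as follows.  This is not the attached Perl script; it is an independent illustration that assumes each gene has exactly one transcript whose ID is the gene ID plus "-mRNA-1", and the function name and example features are made up.

```python
def reparent_to_mrna(lines, suffix="-mRNA-1"):
    """Point exon/CDS Parent attributes at the mRNA instead of the gene."""
    rows = [line.rstrip("\n") for line in lines]

    # First pass: remember which IDs belong to genes and which to mRNAs.
    gene_ids, mrna_ids = set(), set()
    for row in rows:
        cols = row.split("\t")
        if len(cols) != 9 or cols[2] not in ("gene", "mRNA"):
            continue
        for attr in cols[8].split(";"):
            if attr.startswith("ID="):
                (gene_ids if cols[2] == "gene" else mrna_ids).add(attr[3:])

    # Second pass: rewrite Parent only when it names a gene and the
    # corresponding "<gene><suffix>" mRNA really exists in the file.
    fixed = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] in ("exon", "CDS"):
            attrs = cols[8].split(";")
            for i, attr in enumerate(attrs):
                if attr.startswith("Parent="):
                    parent = attr[len("Parent="):]
                    if parent in gene_ids and parent + suffix in mrna_ids:
                        attrs[i] = "Parent=" + parent + suffix
            cols[8] = ";".join(attrs)
            row = "\t".join(cols)
        fixed.append(row)
    return fixed

# Hypothetical features showing the broken exon/CDS parentage.
EXAMPLE = [
    "ctg1\tmaker\tgene\t1\t500\t.\t+\t.\tID=geneA;Name=geneA",
    "ctg1\tmaker\tmRNA\t1\t500\t.\t+\t.\tID=geneA-mRNA-1;Parent=geneA",
    "ctg1\tmaker\texon\t1\t200\t.\t+\t.\tID=geneA:exon:1;Parent=geneA",
    "ctg1\tmaker\tCDS\t10\t200\t.\t+\t0\tID=geneA:cds;Parent=geneA",
]
fixed = reparent_to_mrna(EXAMPLE)
print("\n".join(fixed))
```

Genes with several splice forms would need a real lookup of which mRNA each exon belongs to, which this sketch does not attempt.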

Let me know if that works, 

Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Thursday, February 13, 2014 3:27 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I downloaded maker 2.31 and ran the same scaffold. Again it gave error on
the gff file. I then removed the word mRNA-1 from my gff file and ran it
again. It seems to have worked this time. Attached are std error files for
first try std-err (the one that failed) and 2nd one named std-err-wo-mRNA
(that apparently worked).  Since the gff file is used as evidence only, I
thought it should not matter if I removed the mRNA-1 naming from the gff file.


Cheers

HB


On 14-02-12 12:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>So, after looking at the gff3 and your control files, I had an idea.
>There's the part of the control file called "Re-annotation Using MAKER
>Derived GFF3", but you can also passthrough features from a gff3 using
>the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines.
>
>Sometimes we encounter problems with the MAKER passthrough. Could you try
>dividing the gff3 file into the different feature sources and passing it
>through the "est_gff" etc options and not with the MAKER passthrough?
>That will tell us if the problem is with the gff3 file or with how MAKER
>is processing it.
>
>Another thing to check is that the contig names in the gff3
>file match the contig names in the fasta file that you're annotating.
>
>Thanks,
>Daniel
>
>
>
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 8:49 AM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>
>I have generated the files that you requested. I choose Sc00009 from my
>genome which is 30 kb and was one of the scaffolds coming up with error.
>In addition to Ctl files and error output file I also attached a part of
>the gff file related to SC00009 that is indicated in the error message.
>
>
>Thanks for helping with this
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>
>
>
>
>On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>I think that what would be the most help right now is if you ran MAKER on
>>only one of those contigs that are failing and send me the entire error
>>output along with the maker control files that you are using. It looks
>>like the error is coming from the gff3 files that you are using as input.
>>
>>Thanks,
>>Daniel
>>
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Tuesday, February 11, 2014 3:51 PM
>>To: Daniel Ence
>>Subject: ERROR: Failed while processing the chunk divide!!
>>
>>Dear Daniel
>>
>>I re-started maker and it is still running. But in the error file that
>>has been generated so far, it seems that smaller contigs are affected.
>>There are contigs of 2-4 kb with this error, but I also noticed a contig
>>of 30 kb length having this error.
>>
>>I was wondering if I need to change the setting in the maker_opt file
>>
>>#-----MAKER Behavior Options
>>max_dna_len=100000 #length for dividing up contigs into chunks
>>(increases/decreases  memory usage)
>>min_contig=1 #skip genome contigs below this length (under 10kb are often
>>useless)
>>
>>
>>If I understand correctly, max_dna_len divides contigs of over 100 kb into
>>smaller chunks. However, it is not clear to me for the min_contig option:
>>if the default contig length is 10 kb or less, then why do I have an error
>>message for 30 kb long contigs? Should I change this to 0?
>>
>>Here is an example of the error message for one of the contigs
>>
>>
>>#--------- command -------------#
>>Widget::exonerate::est2genome:
>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate  -q
>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
>>-t
>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta
>>-Q dna -T dna --model est2genome
>>--minintron 20 --showcigar --percent 20 >
>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>>#-------------------------------#
>>cleaning blastn...
>>cleaning tblastx...
>>cleaning blastx...
>>ERROR: Failed on
>>PbPT3Sc00001_S_0.8_1-mRNA-1
>>Check your input GFF3 file for errors!
>>(from GFFDB)
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk
>>divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00001
>>
>>
>>
>>
>>--Next Contig--
>>
>>
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hossein,
>>>
>>>Ok. So since this error came up on a local install, I'm going to need
>>>some more information to understand what went wrong. Is it the same
>>>contig that always causes this error? If it is, then is this the only
>>>error or warning that MAKER encounters while running on this contig? Or,
>>>if multiple contigs fail, then is it always the same error?
>>>
>>>If you can narrow it down to the smallest possible dataset that
>>>consistently gives the same error, then we can begin to understand
>>>what's wrong.
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>Sent: Tuesday, February 11, 2014 11:20 AM
>>>To: Daniel Ence
>>>Subject: Re: [maker-devel] Falied to create new account
>>>
>>>Hi Daniel
>>>
>>>I am running it through the local server at my work
>>>
>>>
>>>
>>>
>>>
>>>
>>>M. Hossein Borhan, Ph.D.
>>>Research Scientist/ Chercheur Scientifique
>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>>Telephone/Téléphone: (306) 385-9441
>>>Facsimile/Télécopieur: (306) 385-9482
>>>Hossein.borhan at agr.gc.ca
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hi Hossein,
>>>>
>>>>Did you encounter this error while you were running MAKER on your local
>>>>machine or through the MAKER web annotation service?
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>Daniel Ence
>>>>Graduate Student
>>>>Eccles Institute of Human Genetics
>>>>University of Utah
>>>>15 North 2030 East, Room 2100
>>>>Salt Lake City, UT 84112-5330
>>>>________________________________________
>>>>From: Carson Holt [carsonhh at gmail.com]
>>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>>To: Daniel Ence
>>>>Cc: Mark Yandell
>>>>Subject: FW: [maker-devel] Falied to create new account
>>>>
>>>>Hey Daniel could you download his dataset, and see if you can replicate
>>>>the error.  Also check if this was an MWAS job or a local maker run
>>>>(his
>>>>dataset will already be there for MWAS, you just need the job ID).
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Hi Carson
>>>>>
>>>>>
>>>>>I encountered this error while running maker
>>>>>
>>>>>FATAL ERROR
>>>>>ERROR: Failed while processing the chunk divide!!
>>>>>
>>>>>ERROR: Chunk failed at level 17
>>>>>!!
>>>>>FAILED CONTIG:PbPT3Sc00006
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>HB
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_gff3_script.pl
Type: application/octet-stream
Size: 349 bytes
Desc: fix_gff3_script.pl
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140214/364c961e/attachment.obj>

From claudio.valero at wur.nl  Mon Feb 17 03:23:21 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Mon, 17 Feb 2014 09:23:21 +0000
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000-10000 genes. Is something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio


-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.log
Type: application/octet-stream
Size: 4776 bytes
Desc: maker_opts.log
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOBA.pdf
Type: application/pdf
Size: 210262 bytes
Desc: SOBA.pdf
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment.pdf>

From carson.holt at genetics.utah.edu  Mon Feb 17 13:22:13 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Feb 2014 19:22:13 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
References: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
Message-ID: <CF27AB29.9F59%carson.holt@genetics.utah.edu>

You also need to look at the contigs in a browser like Apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see if genes are being dropped because they are only supported by single exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungal genes are so close together that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts.  The result is really long UTR on some of your gene models, which forces other models to be excluded.  If this is the case, rerun something like Trinity with the jaccard clip option set to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option.  At least 2 proteomes are recommended though (these are not proteins from the same species but rather complete proteomes from related species).  Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data.  Also, are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the comprehensiveness of EST/mRNA-seq databases can vary so widely, and they may capture as little as 30% of the genes).

Another thing that comes into play is single exon evidence.  In anything but fungi, single exon evidence is mostly caused by spurious alignments.  But fungi have so many single exon genes that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single exon evidence to keep to something like 250 bp.
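
The settings mentioned above can be checked mechanically before a run.  The Python sketch below parses control file lines of the form "key=value #comment" and reports which of the fungal-genome suggestions are not yet applied.  The option names single_exon, correct_est_fusion and protein come from this thread; single_length is assumed here to be the option that sets the minimum single exon evidence length, and the file contents and paths are hypothetical.

```python
def read_ctl(lines):
    """Parse MAKER control file lines (key=value #comment) into a dict."""
    options = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options

def fungal_warnings(options):
    """List the suggestions from this thread that the options do not follow."""
    warnings = []
    if options.get("single_exon") != "1":
        warnings.append("single_exon is not 1: single exon evidence is ignored")
    if options.get("correct_est_fusion") != "1":
        warnings.append("correct_est_fusion is not 1: fused transcripts keep long UTRs")
    if not options.get("protein"):
        warnings.append("protein is empty: no related-species proteome supplied")
    # Assumption: single_length is the minimum single exon evidence length.
    if options.get("single_exon") == "1" and not options.get("single_length"):
        warnings.append("single_length is not set: consider something like 250")
    return warnings

# Hypothetical control file content for the demonstration.
EXAMPLE_CTL = [
    "#-----EST Evidence",
    "est=transcripts.fasta #assembled mRNA-seq",
    "protein= #protein sequence file in fasta format",
    "single_exon=0 #consider single exon EST evidence",
    "correct_est_fusion=1 #limits use of ESTs in annotation",
]
options = read_ctl(EXAMPLE_CTL)
warnings = fungal_warnings(options)
for message in warnings:
    print("WARNING:", message)
```

With the example above, the sketch flags single_exon and the empty protein option, and leaves correct_est_fusion alone.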

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000-10000 genes. Is something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio



From carsonhh at gmail.com  Mon Feb 17 13:26:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Feb 2014 12:26:05 -0700
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <CF27AFF8.9F83%carsonhh@gmail.com>

From your control file, it looks like not setting single_exon=1, and only
using UniProt rather than supplying complete proteomes of a related species,
are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

--Carson


From:  Carson Holt <carson.holt at genetics.utah.edu>
Date:  Monday, February 17, 2014 at 12:22 PM
To:  "Valero Jimenez, Claudio" <claudio.valero at wur.nl>,
"'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Maker not predicting many genes

You also need to look at the contigs in a browser like apollo.  That will
allow you to see both the predictions and the evidence in context.  You can
then see if genes are being dropped because they are only being supported by
single exon evidence, they have no evidence support whatsoever, or if they
are being excluded because of UTR overlap.  That last one is a common
problem for fungi when using assembled mRNA-seq reads.  Fungi genes are so
close that they often overlap in the UTR.  As a result, mRNA-seq assemblers
falsely assemble neighboring genes into single transcripts.  The result is
really long UTR on some of your gene models that force other models to be
excluded.  If this is the case, rerun something like trinity with the
jaccard clip option set to avoid transcript fusion.  Then set
correct_est_fusion=1 in the MAKER control files to get those long false
UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided minimum 1
proteome from a related species to the protein= option.  At least 2
proteomes are recommended though (these are not proteins from the same
species but rather complete proteomes from related species).  Also
comprehensive databases like UniProt/Swiss-prot are not sufficient on their
own, but can supplement the other proteome data.  Also are you providing EST
data?  Note that EST/mRNA-seq data without a proteome from a related species
is also not sufficient (because both quality and how comprehensive
EST/mRNA-seq databases are can vary so widely, and may only capture as
little as 30% of the genes).

Another thing that comes into play is single exon evidence.  In anything
but fungi, single exon evidence is mostly caused by spurious alignments.
But fungi have so many single exon genes, that this is not the case for
them.  Make sure single_exon=1 is set to allow that evidence to be kept, and
set the length of single exon evidence to keep to something like 250 bp.

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

Dear list,
 
I'm trying to annotate a fungal genome, and I'm surprised that Maker does
not predict many genes (3697). I have trained SNAP and followed all the
tutorials available. Ab initio predictors are able to predict between
8000-10000 genes. Is something wrong in my configuration file? I attach
the opts file and the SOBA summary of the annotation.
 
Regards,
 
Claudio
 
 


From claudio.valero at wur.nl  Wed Feb 19 02:20:04 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 08:20:04 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <CF27AFF8.9F83%carsonhh@gmail.com>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>

Hi Carson,

Thank you for your suggestions. I ran Maker again and it was able to predict many more genes. However, I have a different problem now. I try to run gff3_merge and get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

A similar thing happens when I try fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem before with these commands.


Regards,

Claudio

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: maandag 17 februari 2014 20:26
To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes

From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species, are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

--Carson


From: Carson Holt <carson.holt at genetics.utah.edu>
Date: Monday, February 17, 2014 at 12:22 PM
To: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>, "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Maker not predicting many genes

You also need to look at the contigs in a browser like apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see if genes are being dropped because they are only being supported by single exon evidence, they have no evidence support whatsoever, or if they are being excluded because of UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungi genes are so close that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely asseble neighboring genes into single transcripts.  The result is really long UTR on some of your gene models that force other models to be excluded.  If this is the case, rerun something like trinity with the jacquard clip option set  to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTR?s clipped off.

If it is a lack of evidence overlap, make sure you provided minimum 1 proteome from a related species to the protein= option.  At least 2 proteomes are recommended though (these are not proteins from the same species but rather complete proteomes from related species).  Also comprehensive databases like UniProt/Swiss-prot are not sufficient on their own, but can supplement the other proteome data.  Also are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not siufficient (because both quality and how comprehensive EST/mRNA-seq databsases are can vary so widely, and may only capture as little as 30% of the genes).

Another thing that comes into play is single exon evidence.  In anything but fungi, single exon evidence is mostly caused by spurious alignments.  But fungi have so many single exon genes that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single exon evidence to keep to something like 250 bp.
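
The control-file advice above can be checked mechanically. What follows is only a minimal sketch in Python, assuming the usual maker_opts.ctl layout of key=value lines with trailing # comments. The helper names and the table of suggested values are illustrative, and single_length is assumed to be the option that holds the minimum length for single exon evidence.

```python
# Sketch: flag maker_opts.ctl settings that differ from the advice in this thread.
# Assumes "key=value #comment" lines; all helper names here are hypothetical.

RECOMMENDED = {
    "single_exon": "1",         # keep single exon evidence (important for fungi)
    "single_length": "250",     # assumed option name for the minimum length
    "correct_est_fusion": "1",  # clip long false UTRs from fused transcripts
}


def parse_ctl(text):
    """Return {option: value} from control-file text, ignoring comments."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def check_ctl(text):
    """Return a list of human-readable warnings for the given control file."""
    opts = parse_ctl(text)
    warnings = [
        f"{key}={opts.get(key, '')!r}, suggested {want}"
        for key, want in RECOMMENDED.items()
        if opts.get(key) != want
    ]
    # protein= is assumed to take a comma-separated list of FASTA files
    proteomes = [p for p in opts.get("protein", "").split(",") if p.strip()]
    if len(proteomes) < 2:
        warnings.append(
            f"protein= lists {len(proteomes)} file(s); "
            "at least 2 related proteomes are suggested"
        )
    return warnings


example = """\
protein=uniprot_sprot.fasta #protein sequence file in fasta format
single_exon=0 #consider single exon EST evidence
single_length=250
correct_est_fusion=0
"""
for warning in check_ctl(example):
    print(warning)
```

The check only counts the files given to protein=; it cannot tell whether they are complete proteomes from related species, so that part still needs a human look.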

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that MAKER does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com  Wed Feb 19 09:34:33 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Feb 2014 08:34:33 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <CF2A1C44.A02B%carsonhh@gmail.com>

You provided a directory rather than a file to the -d option ('d' stands for
datastore log).
You must provide the location of the datastore index log file and not the
datastore directory.

Example: ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log
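
As a rough illustration of the rule above, the Python sketch below takes whatever path a user passes and returns the index log file. The function name resolve_index_log is hypothetical and not part of MAKER, and the *_master_datastore_index.log naming is assumed from the example path.

```python
import os
import tempfile


def resolve_index_log(path):
    """Return the datastore index log for `path`.

    A regular file is returned unchanged. For a directory, look inside for
    exactly one *_master_datastore_index.log (naming assumed from the example).
    """
    if os.path.isfile(path):
        return path
    if os.path.isdir(path):
        hits = sorted(
            name for name in os.listdir(path)
            if name.endswith("_master_datastore_index.log")
        )
        if len(hits) == 1:
            return os.path.join(path, hits[0])
        raise FileNotFoundError(
            f"expected one index log in {path}, found {len(hits)}"
        )
    raise FileNotFoundError(f"no such file or directory: {path}")


# Demo with a throwaway directory standing in for dpp_contig.maker.output
with tempfile.TemporaryDirectory() as output_dir:
    log_path = os.path.join(output_dir, "dpp_contig_master_datastore_index.log")
    open(log_path, "w").close()
    from_dir = resolve_index_log(output_dir)   # directory given by mistake
    from_file = resolve_index_log(log_path)    # correct usage
    same_result = (from_dir == log_path and from_file == log_path)

print(same_result)
```

The returned path is what would then be given to the -d option of gff3_merge or fasta_merge.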

Thanks,
Carson


From:  "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date:  Wednesday, February 19, 2014 at 1:20 AM
To:  Carson Holt <carsonhh at gmail.com>, Carson Holt
<carson.holt at genetics.utah.edu>, "'maker-devel at yandell-lab.org'"
<maker-devel at yandell-lab.org>
Subject:  RE: [maker-devel] Maker not predicting many genes

Hi Carson,
 
Thank you for your suggestions. I ran MAKER again and it was able to predict
many more genes. However, I have a different problem now. When I try to run
gff3_merge, I get the following error:
 
Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge
line 67.
 
A similar thing happens when I try fasta_merge:
 
Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge
line 52.
 
I never had this problem before with these commands.
 
 
Regards,
 
Claudio
 

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Monday, 17 February 2014 20:26
To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes
 

From your control file, it looks like not setting single_exon=1 and using
only UniProt rather than supplying complete proteomes of related species
are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

 
-Carson

 

From dence at genetics.utah.edu  Wed Feb 19 10:04:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 19 Feb 2014 16:04:08 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>

Hi Claudio,

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From claudio.valero at wur.nl  Wed Feb 19 10:33:36 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 16:33:36 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D695A@SCOMP0936.wurnet.nl>

Hi,

Thanks, I had a mistake in the command line!!!

Regards,

Claudio


From barry.utah at gmail.com  Wed Feb 19 12:03:47 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Wed, 19 Feb 2014 11:03:47 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and gives the user a suggested solution?

Thanks,

B


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From carson.holt at genetics.utah.edu  Wed Feb 19 12:06:52 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 19 Feb 2014 18:06:52 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
	<0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
Message-ID: <CF2A4058.A064%carson.holt@genetics.utah.edu>

You only need to swap a single character in the script.  Just change the  -e (exists) test to a -f (is file) test.
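
To illustrate why that one-character change matters, here is a small Python sketch of the same distinction. It does not reproduce the actual Perl code in gff3_merge or fasta_merge: os.path.exists stands in for Perl's -e and os.path.isfile for Perl's -f, and the file name is borrowed from the earlier example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as datastore_dir:
    log_file = os.path.join(datastore_dir, "dpp_contig_master_datastore_index.log")
    open(log_file, "w").close()

    # An "exists" test (like Perl -e) accepts both the directory and the file,
    # so a directory passed by mistake slips through.
    exists_dir = os.path.exists(datastore_dir)
    exists_file = os.path.exists(log_file)

    # An "is a regular file" test (like Perl -f) rejects the directory.
    isfile_dir = os.path.isfile(datastore_dir)
    isfile_file = os.path.isfile(log_file)

print(exists_dir, exists_file)   # True True
print(isfile_dir, isfile_file)   # False True
```

With the stricter test, a directory given where the index log is expected can be caught up front instead of surfacing later as an uninitialized $outfile warning.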

Thanks,
Carson


From gtaylor at bcgsc.ca  Fri Feb 21 12:48:42 2014
From: gtaylor at bcgsc.ca (Greg Taylor)
Date: Fri, 21 Feb 2014 10:48:42 -0800
Subject: [maker-devel] Maker jobs hanging
Message-ID: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>

Hello,
I'm having a problem with MAKER 2.28 jobs hanging. I am annotating a 3 Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled RNA-seq data. To do this I am using 480 processors on our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file. The standard error log for the whole job looks normal, as do the run.log and run.log.child.0 files for the daughter processes. The problem seems to be sequence dependent: re-running contigs that hang doesn't help, and the same contigs always hang. I'm still looking into this myself, but it seems most if not all of the jobs are stuck at the BLASTX stage. If you have any suggestions, your help would be greatly appreciated.

sincerely,
Greg Taylor


From dence at genetics.utah.edu  Fri Feb 21 12:54:17 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 18:54:17 +0000
Subject: [maker-devel] Maker jobs hanging
In-Reply-To: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
References: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66CD0@mxb2.hg.genetics.utah.edu>

Hi Greg, 

Since this is probably going to be a more complicated situation, would you upload your data and control file at this URL so that we can try to replicate the error on our machines?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166

Also, which version of MPI are you using? And you might want to try updating MAKER. I think version 2.31 was just updated a few weeks ago. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Fri Feb 21 12:56:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Feb 2014 11:56:50 -0700
Subject: [maker-devel] Maker jobs hanging
Message-ID: <CF2CEDC6.A15D%carsonhh@gmail.com>

Use 2.31.  It has been tested to work without issue on several thousand
cpus.  Also use OpenMPI for any jobs greater than 100 cpus. In addition,
OpenMPI can freeze on some systems without the following flag when using
perl based MPI programs --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with perl, and freezes
whenever perl based MPI jobs extend across nodes (they run fine within a
single node though).
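
For launching from a script, the command above can be assembled as an argument list so that no shell interprets the "^" character. This is a minimal Python sketch that uses only the flags quoted in this message; the helper name and its parameters are hypothetical, and the process count is just the one from the example.

```python
import shlex


def maker_mpi_command(n_procs, disable_openib=True):
    """Build the mpiexec argument list for a MAKER run (hypothetical helper)."""
    cmd = ["mpiexec"]
    if disable_openib:
        # Flag from the message above: exclude the openib transport.
        cmd += ["-mca", "btl", "^openib"]
    cmd += ["-n", str(n_procs), "maker"]
    return cmd


cmd = maker_mpi_command(200)
# Print a shell-safe rendering of the command for logging.
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run avoids quoting problems, since the arguments never go through a shell.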

--Carson




From dence at genetics.utah.edu  Fri Feb 21 16:04:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 22:04:34 +0000
Subject: [maker-devel] FW:  Maker jobs hanging
In-Reply-To: <CF2CEDC6.A15D%carsonhh@gmail.com>
References: <CF2CEDC6.A15D%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66D7E@mxb2.hg.genetics.utah.edu>

Hi Greg, 

You should be able to have the new MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Friday, February 21, 2014 11:56 AM
To: Greg Taylor; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Maker jobs hanging

Use 2.31.  It has been tested to work without issue on several thousand
cpus.  Also use OpenMPI for any jobs greater than 100 cpus. In addition,
OpenMPI can freeze on some systems without the following flag when using
perl based MPI programs --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with perl, and freezes
whenever perl based MPI jobs extend across nodes (they run fine within a
single node though).

--Carson




From dence at genetics.utah.edu  Fri Feb 21 20:38:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 02:38:59 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>

Hi Joe, 

Will you upload your control files and data at this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169

Also, what version of MAKER and blast are you using? And which file are you using for the known Arabidopsis gene?

I've copied this email to the maker-devel list, which is a really good resource for troubleshooting MAKER issues.

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Mark Yandell
Sent: Friday, February 21, 2014 7:32 PM
To: Daniel Ence
Subject: FW: I am a PhD candidate at NMSU and have a question about maker2

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 5:18 PM
To: Mark Yandell
Subject: I am a PhD candidate at NMSU and have a question about maker2

Dear Dr. Yandell,

I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me.

Thank you,
Joe

Sent from my iPad


From dence at genetics.utah.edu  Fri Feb 21 22:27:10 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 04:27:10 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>,
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>

Hi Joe, 

MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. 

Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
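
A quick sanity check on the control file can be done before sending it. The Python sketch below assumes the control file consists of key=value lines with "#" starting a comment; the function names and the sample content are hypothetical, and it only reports whether the "est" and "protein" settings are empty.

```python
def parse_ctl(text):
    """Parse control-file text (key=value, '#' comments) into a dict."""
    opts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def missing_evidence(opts, keys=("est", "protein")):
    """Return the evidence settings that are absent or empty."""
    return [key for key in keys if not opts.get(key)]


# Hypothetical control-file excerpt, for illustration only.
sample = """\
#-----Genome
genome=genome.fasta #genome sequence
est= #set of ESTs
protein=proteins.fasta #protein sequence file
"""
opts = parse_ctl(sample)
print(missing_evidence(opts))
```

If both settings come back as missing, MAKER has no evidence to blast, which would explain zero hits regardless of how blast itself is configured.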

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 7:44 PM
To: Daniel Ence
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2

Hi Daniel,

Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe Maker2 just calls BLAST from NCBI's page. When I search the cotton genome it returns zero hits. I then used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which is reporting 0 hits. I ran BLAST standalone and got hits for both my gene of interest and the control test gene.

Thanks,
Joe


From dence at genetics.utah.edu  Sat Feb 22 16:51:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 22:51:48 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>,
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>

Hi,

Will you send me the long file that you were trying to blast against?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 10:46 AM
To: Daniel Ence
Cc: Joe Song; Joseph Said
Subject: Re: I am a PhD candidate at NMSU and have a question about maker2

hi all,
Attached are the three configuration files and two input files, which are used to predict something between the genome and protein. For a simple test, we used one short sequence of about 60bp and its translated protein sequence as inputs, but got nothing returned. We also tested a long genome sequence as one input, but still got nothing. I am not sure what causes this result.
Thanks a lot for your help.

Hua


On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu> wrote:
Hi Daniel,

I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night.

Thanks,
Joe

Sent from my iPad



From dence at genetics.utah.edu  Sat Feb 22 17:21:51 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 23:21:51 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>,
	<CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66FAB@mxb2.hg.genetics.utah.edu>

Hi Hua, will you upload the genome file to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170
I am more concerned that MAKER didn't find the gene in the whole genome than in the 60bp substring. I think that MAKER needs more sequence than that to annotate a gene model.

Will you also upload the MAKER output and datastore from the MAKER run?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 4:00 PM
To: Daniel Ence
Cc: maker-devel at yandell-lab.org; Joseph Said; Joe Song
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2


The long file we used is a whole genome. It is quite a huge file, so I am not able to send it. Sorry. But in the simple test I told you about, the nucleotide sequence sent to you is considered to be the genome file, and the protein sequence is the other input. These two are what we want to blast against each other to see if Maker2 works well.
Thanks.



From mikael.durling at slu.se  Sun Feb 23 10:57:09 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 16:57:09 +0000
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se>

Dear list and maker developers,

I was browsing the results of a recent maker run, focusing on differences between this run with a recent maker (svn r1067) and a previous run with svn revision 1022 (as I recall). One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300). As you can see, there is no support for the join in the evidence. Do you have any clue to what might cause this?

Best regards,
Mikael Durling


From carsonhh at gmail.com  Sun Feb 23 14:00:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 23 Feb 2014 13:00:50 -0700
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <CF2FA087.A21D%carsonhh@gmail.com>

The image doesn't show all evidence sources, but the short answer is that
one of your evidence sources (est2genome, protein2genome, or blastx)
bridges the two regions, and when provided the bridged hint one of the
gene predictors thinks it makes sense to create a single model instead.
My guess is that it's blastx evidence.
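
To check a GFF3 file for such a bridge, one can look for evidence features whose coordinates overlap both gene regions. The Python sketch below assumes standard nine-column, tab-separated GFF3 and that all features are on the same contig (the seqid column is ignored). The function name, coordinates, and sample features are hypothetical.

```python
def bridging_features(gff_lines, region_a, region_b, sources=None):
    """Return (source, type, start, end) for features overlapping both regions.

    region_a and region_b are (start, end) tuples, 1-based inclusive.
    """
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 feature line
        source, ftype = cols[1], cols[2]
        start, end = int(cols[3]), int(cols[4])
        if sources is not None and source not in sources:
            continue
        overlaps_a = start <= region_a[1] and end >= region_a[0]
        overlaps_b = start <= region_b[1] and end >= region_b[0]
        if overlaps_a and overlaps_b:
            hits.append((source, ftype, start, end))
    return hits


# Hypothetical features on one contig, for illustration only.
sample = [
    "##gff-version 3\n",
    "ctg1\tblastx\tprotein_match\t1500\t3500\t.\t+\t.\tID=hit1\n",
    "ctg1\test2genome\texpressed_sequence_match\t1100\t1900\t.\t+\t.\tID=hit2\n",
    "ctg1\tprotein2genome\tprotein_match\t3100\t3900\t.\t+\t.\tID=hit3\n",
]
bridges = bridging_features(sample, (1000, 2000), (3000, 4000))
print(bridges)
```

Only parent features such as protein_match span a whole alignment; the individual match_part children would each fall inside one region, so it is the parent lines that reveal a bridge.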

--Carson




From mikael.durling at slu.se  Sun Feb 23 15:14:00 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 21:14:00 +0000
Subject: [maker-devel] Maker predicting fusion genes?
In-Reply-To: <CF2FA087.A21D%carsonhh@gmail.com>
References: <CF2FA087.A21D%carsonhh@gmail.com>
Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se>

OK, do you mean that the predictions from the ab initio predictors that end up in the gff3 output (snap_masked, augustus_masked, and genemark) are not the final hinted predictions? Otherwise, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence there to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301, where all evidence is plotted).

Mikael



-------------- next part --------------
A non-text attachment was scrubbed...
Name: region.gff3
Type: application/octet-stream
Size: 19612 bytes
Desc: region.gff3
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140223/240ecba4/attachment.obj>

From hedgyx at yahoo.com  Mon Feb 24 01:02:41 2014
From: hedgyx at yahoo.com (Megan)
Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

Maker folks,
I am re-annotating a single contig and I am having a few problems.

First, I am having trouble passing through a Maker-derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modENCODE validator, but Maker always fails on the first gene in the file, regardless of which gene comes first, so it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes.

Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV].  It suggests "set -fix_nucleotides on the command line to fix this automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does it do?  Does it remove the entire sequence or replace ambiguous bases with a randomly selected one?  Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.  

Also, just curious, has Maker never supported these characters but just never complained?  I used this EST data set with Maker 2.09.  I did note poor EST coverage, but thought it was an issue with the EST data itself.

I appreciate any suggestions.
Thanks,
Megan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part_passthru.gff
Type: application/octet-stream
Size: 4363 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140223/3950a0b4/attachment.obj>

From zh9118 at gmail.com  Sat Feb 22 17:00:28 2014
From: zh9118 at gmail.com (Hua Zhong)
Date: Sat, 22 Feb 2014 16:00:28 -0700
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
Message-ID: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>

The long file we used is a whole genome, which is too large for me to
send. Sorry. In the simple test I described, the nucleotide sequence I
sent you serves as the genome file, and the protein sequence is the other
input. These two are what we want to blast against each other to see
whether Maker2 works.
Thanks.
On Feb 22, 2014 3:51 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>  Hi,
>
>  Will you send me the long file that you were trying to blast against?
>
>  Thanks,
> Daniel
>
>  Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>   ------------------------------
> *From:* Hua Zhong [zh9118 at gmail.com]
> *Sent:* Saturday, February 22, 2014 10:46 AM
> *To:* Daniel Ence
> *Cc:* Joe Song; Joseph Said
> *Subject:* Re: I am a PhD candidate at NMSU and have a question about
> maker2
>
>   Hi all,
> Attached are the three configuration files and two input files, which are
> used to predict something between the genome and protein. For a simple
> test, we used one short sequence of about 60 bp and its translated protein
> sequence as inputs, but got nothing returned. We also tested a long
> genome sequence as one input, but still got nothing. I am not sure
> what is causing this result.
> Thanks a lot for your help.
>
>  Hua
>
>
>
>
> On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu> wrote:
>
>> Hi Daniel,
>>
>> I do not have the exact files with me right now, but my coauthors on the
>> paper I am working on have been copied on this email. Hua can send you
>> those files. Thank you for being very helpful especially on a Friday night.
>>
>> Thanks,
>> Joe
>>
>> Sent from my iPad
>>
>> > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" <dence at genetics.utah.edu>
>> wrote:
>> >
>> > Hi Joe,
>> >
>> > MAKER runs blast from your local system (or your server where MAKER is
>> > installed), and it blasts evidence that the user supplies in the "est" and
>> > "protein" settings. The est and protein settings are set in the
>> > maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file
>> > and the specific blast settings are in the "maker_bopts.ctl" file.
>> >
>> > Will you attach those files to your reply, so we can make sure that the
>> > settings are set up correctly?
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 7:44 PM
>> > To: Daniel Ence
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Hi Daniel,
>> >
>> > Thank you for getting back to me so quickly. I am using the cotton
>> > Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the
>> > GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I
>> > believe Maker2 just calls BLAST from NCBI's page. When I search the
>> > cotton genome it returns zero hits. I then ran a search with a known
>> > cotton gene as a test, and that also returned zero hits. I am not sure
>> > what the problem is, but it seems that the protocol that should return
>> > the results of NCBI's BLAST is returning 0 to Maker2, which then reports
>> > 0 hits. I ran a standalone BLAST and got hits for both my gene of
>> > interest and the control test gene.
>> >
>> > Thanks,
>> > Joe
>> > ________________________________________
>> > From: Daniel Ence <dence at genetics.utah.edu>
>> > Sent: Friday, February 21, 2014 7:38 PM
>> > To: Joseph Said
>> > Cc: maker-devel at yandell-lab.org
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Hi Joe,
>> >
>> > Will you upload your control files and data at this URL?
>> > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169
>> >
>> > Also, what version of MAKER and blast are you using? And which file are
>> you using for the known arabidopsis gene?
>> >
>> > I've copied this email to the maker-development list, which is a really
>> good resource for trouble-shooting MAKER issues.
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Mark Yandell
>> > Sent: Friday, February 21, 2014 7:32 PM
>> > To: Daniel Ence
>> > Subject: FW: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Mark Yandell
>> > Professor of Human Genetics
>> > H.A. & Edna Benning Presidential Endowed Chair
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ph:801-587-7707
>> >
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 5:18 PM
>> > To: Mark Yandell
>> > Subject: I am a PhD candidate at NMSU and have a question about maker2
>> >
>> > Dear Dr. Yandell,
>> >
>> > I am a molecular biologist at NMSU. I am trying to use maker2 with the
>> cotton genome, and search an Arabidopsis gene against it. I think there is
>> a problem with the blast component because zero results are returned. I
>> tried troubleshooting by searching a known gene and still returned zero
>> results. Is this a common problem maybe with the pipeline? I would
>> appreciate any ideas you might have to help me.
>> >
>> > Thank you,
>> > Joe
>> >
>> > Sent from my iPad
>>
>
>

From carsonhh at gmail.com  Mon Feb 24 12:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:18:18 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID: <CF30D6EC.A2CC%carsonhh@gmail.com>

-fix_nucleotides is a MAKER command-line flag (i.e. you run maker
-fix_nucleotides).  The warning is there so you are aware that there is an
issue with your fasta file that will cause things downstream to fail.
MAKER can fix the errors for you, but first it gives a warning designed to
make you look at the file and validate it.  Why would you want to do this?
For example, if you accidentally provided protein sequence to the EST
option, you wouldn't want MAKER to just proceed.  You want a warning
so you can check first.  If your file is in fact EST data, then set the
flag and those characters will be changed to N's in the fixed fasta
sequence.  Otherwise those characters will cause errors in downstream tools
like exonerate, and even some downstream GMOD tools, so they can't be
allowed to remain as is.
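
To illustrate the substitution described above, here is a minimal Python
sketch that replaces the unsupported IUPAC ambiguity codes [RYKMSWBDHV]
with N in the sequence lines of a fasta file.  It is not MAKER's own code:
the function name mask_ambiguous is made up for this example, and keeping
lower-case bases lower-case (n) is an assumption.

```python
import re

# IUPAC ambiguity codes that MAKER 2.31 reports as unsupported in EST input.
AMBIGUOUS = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def mask_ambiguous(fasta_text):
    """Return fasta text with ambiguity codes in sequence lines set to N/n.

    Header lines (starting with '>') are left untouched, so no sequence is
    dropped; only the individual ambiguous bases change.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            # Assumption: preserve case so soft-masking, if any, survives.
            out.append(
                AMBIGUOUS.sub(lambda m: "N" if m.group().isupper() else "n", line)
            )
    return "\n".join(out)


if __name__ == "__main__":
    print(mask_ambiguous(">est1 example\nACGTRYacgtk"))
```

Running this on a copy of the EST file and counting the changed bases is
one way to see how much of the data set would be affected before setting
the flag.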

For the GFF3 file, there is almost definitely a logic issue in the file
(the modENCODE validator won't check for those).  This can come from prior
manipulation of the GFF3 file.  For example, IDs for a gene that are the
same across two contigs (technically valid but a logic error).  The GFF3
error message will normally give the ID of the feature causing the issue.
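
A rough way to look for the kind of logic error mentioned above (the same
ID used on more than one contig) is sketched below in Python.  This is
only an illustration and not part of MAKER; the function name
ids_on_multiple_seqs is hypothetical, and it only inspects the ID
attribute in column 9.

```python
from collections import defaultdict


def ids_on_multiple_seqs(gff3_lines):
    """Map each feature ID seen on more than one contig to those contigs."""
    seqs_by_id = defaultdict(set)
    for line in gff3_lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # an embedded fasta section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID":
                seqs_by_id[value].add(cols[0])
    return {fid: sorted(seqs) for fid, seqs in seqs_by_id.items() if len(seqs) > 1}


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for fid, seqs in ids_on_multiple_seqs(handle).items():
            print(fid, ",".join(seqs), sep="\t")
```

An empty result only rules out this one class of problem; other logic
errors would need their own checks.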

I could also take a look for you.  You can upload the GFF3 file here ->
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account' then e-mail me back your guest ID, so I know
which files to review.

Thanks,
Carson


On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:

>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from
>Maker 2.09, with some modifications to gene names and functional
>information added).  The gff file passes the modencode validator but
>Maker always fails on the first gene in the file, regardless of which
>gene comes first.  So it appears to be a systematic error across the
>entire file.  The Maker error is "Check your input GFF3 file for errors!
>(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
>with model_pass=1 and pred_gff.  Attached is a gff with the first 2
>genes.  
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST
>fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
>suggests "set -fix_nucleotides on the command line to fix this
>automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
>it do?  Does it remove the entire sequence or replace ambiguous bases
>with a randomly selected one?  Half of my 20k ESTs contain these
>characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just
>never complained?  I used this EST data set with Maker 2.09.  I did note
>poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Mon Feb 24 12:31:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 24 Feb 2014 18:31:47 +0000
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30D6EC.A2CC%carsonhh@gmail.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>,
	<CF30D6EC.A2CC%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>

Hi Megan, 

One problem with the GFF3 that you attached is that the IDs for the CDS features are constructed incorrectly. All of the CDS features for a given mRNA or transcript should have the same ID. The CDS features in your GFF3 have IDs that use the exon name. 

You can fix it with this command-line perl:
perl -pe 'if (/\tCDS\t/ && /Parent=([^;\s]+)/) { my $parent = $1; s/ID=[^;\s]+/ID=$parent-cds/ }' part_passthru.gff > fixed.gff3

It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating. 

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Mon Feb 24 12:34:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:34:28 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
	<CF30D6EC.A2CC%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
Message-ID: <CF30DE6B.A2F6%carsonhh@gmail.com>

Actually that is not true.  CDS IDs can be the same or different.  MAKER
doesn't care either way.  Both are valid in GFF3.  Having the same ID just
allows them to be put together by some GMOD viewers without having to go
through a container feature.

--Carson

On 2/24/14, 11:31 AM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Megan, 
>
>One problem with the GFF3 that you attached is that the ID's for the CDS
>features are being made wrong. All of the CDS features for a given mRNA
>or transcript should have the same ID. The CDS features in your GFF3 have
>IDs that use the exon name.
>
>You can fix it with this command-line perl:
>cat part_passthru.gff | perl -ane 'if(/\tCDS\t/){ chomp;
>/Parent=([\S]+)/; my $parent=$1; s/ID=([^\;]+)/ID=$parent-cds/; print
>"$_\n"}else{print $_}' > fixed.gff3
>
>It just fixes the ID attributes in all of the CDS features. Try it on the
>test gff3 you sent and let me know if it works. I can't test it myself
>without the fasta file that you are annotating.
>
>Thanks,
>Daniel
>


From carsonhh at gmail.com  Mon Feb 24 14:59:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 13:59:12 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID: <CF30FEE0.A32D%carsonhh@gmail.com>

I found the issue.  You have stray non-printing characters at the end of
almost every line.  Because they fall within the Parent= tag, they
become part of the Parent ID when the file is read.

So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
ID.

"\cM" is Control-M, i.e. a carriage return.

I ran the attached script to remove these characters (perl purify
<gff3_file>), and then it works.  When you rerun MAKER after fixing the
file, make sure to first remove the
.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
to force the GFF3 database to be rebuilt.
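
For readers without the attachment, the cleanup can be approximated with
the Python sketch below.  It is not the purify script itself (whose
contents are not shown here), only an assumed equivalent: it strips
carriage returns and other control characters below 0x20 from each line
while keeping the tabs that separate GFF3 columns.  The function names
are made up for this example.

```python
def strip_control_chars(line):
    """Drop the line ending and any control characters except tab."""
    body = line.rstrip("\r\n")
    return "".join(ch for ch in body if ch == "\t" or ord(ch) >= 32)


def clean_gff3(src_path, dst_path):
    """Write a copy of src_path with control characters removed."""
    # newline="" keeps the original line endings visible on input;
    # newline="\n" writes plain Unix line endings on output.
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="\n") as dst:
        for line in src:
            dst.write(strip_control_chars(line) + "\n")


if __name__ == "__main__":
    import sys

    clean_gff3(sys.argv[1], sys.argv[2])
```

Writing to a new file instead of editing in place keeps the original
available for comparison.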

Thanks,
Carson


On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions.  I have looked at the gff file, but I do not
>see any obvious errors.  I have uploaded the files to your website.  The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error.  If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff.  The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information.  I checked with cat -te, but don't see any obvious
>formatting errors.
>
>Thanks!
>Megan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1965 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140224/a1582e7d/attachment.obj>

From carsonhh at gmail.com  Mon Feb 24 15:03:00 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 14:03:00 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
	<CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <CF310121.A33F%carsonhh@gmail.com>

One more thing.  You must give the file to pred_gff or model_gff.  It is
no longer strictly a MAKER file, as many of the source columns read "."
meaning it has been edited by Apollo or another editor.  So it is not
guaranteed to be recognized by genome_gff, because many of the source tags
have changed.

Thanks,
Carson




From rbharris at uw.edu  Tue Feb 25 15:49:57 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Tue, 25 Feb 2014 13:49:57 -0800
Subject: [maker-devel] error in snap training
Message-ID: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>

Hey -

I'm trying to train SNAP and am running into errors. I don't have any EST
evidence, just protein. My .gff file reports 10865 genes but when I run
maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n,
a ton of overlap_prev_exon errors get written to the screen, and then when I
get to the forge step I get an "impossible error5". Any help would be
greatly appreciated.

Thanks!
Rebecca

From carsonhh at gmail.com  Tue Feb 25 16:12:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 15:12:14 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>

Make sure you are using 2.31,  and then try the maker2zff filters individually.  If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP.  Use cegma2zff for the conversion.

--Carson

Sent from my iPhone

> On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
> 
> Hey - 
> 
> I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes but when I run maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen, and then when I get to the forge step I get an "impossible error5". Any help would be greatly appreciated.
> 
> Thanks!
> Rebecca


From sjackman at gmail.com  Tue Feb 25 18:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names
Message-ID: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>

Hi,

I'm annotating a genome using a closely related genome from Genbank, using
the .frn (RNA) and .faa (protein) files from Genbank as evidence to
annotate my genome. I've run Maker, and the annotation seems to have worked
well. Is it possible to map the names of the genes from the related species
to my annotation? I see the *map_forward* option, which applies to the
*model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From hedgyx at yahoo.com  Tue Feb 25 18:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST
	nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms.  Thanks for the help.
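
For anyone who hits the same problem without the attached Perl script, here is a minimal Python sketch of the same cleanup. It is not the script that was attached (that one is not reproduced in this thread); the function names and file paths are hypothetical, and it assumes the only offending characters are carriage returns (^M).

```python
def strip_carriage_returns(text: str) -> str:
    """Remove every carriage return (\\r, displayed as ^M) from GFF3 text."""
    return text.replace("\r", "")


def clean_gff3(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path with all carriage returns removed.

    Both paths are placeholders supplied by the caller.
    """
    # newline="" disables Python's own line-ending translation, so the only
    # change made to the file is the removal of the \r characters.
    with open(src_path, newline="") as src:
        cleaned = strip_carriage_returns(src.read())
    with open(dst_path, "w", newline="") as dst:
        dst.write(cleaned)


if __name__ == "__main__":
    # A line whose Parent value would otherwise be read as "HERA000031-RA\r".
    line = "chr1\tmaker\texon\t1\t10\t.\t+\t.\tID=e1;Parent=HERA000031-RA\r\n"
    print(repr(strip_carriage_returns(line)))
```

The MAKER GFF3 database file still has to be removed before the rerun so that it is rebuilt from the cleaned file.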

Megan
--------------------------------------------
On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:

 Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
 To: "Megan" <hedgyx at yahoo.com>, "Daniel Ence" <dence at genetics.utah.edu>
 Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
 Date: Monday, February 24, 2014, 12:59 PM
 
 I found the issue.  You have non-printing control characters at the end of
 almost every line.  Because they are happening within the Parent= tag, they
 then become part of the Parent ID when the file is read.
 
 So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent ID.
 
 "\cM" is a meta-return (Control-M, i.e. a carriage return).
 
 I ran the attached script to remove these characters (perl purify
 <gff3_file>), and then it works.  Make sure to remove the
 .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db
 file to force the GFF3 database to be rebuilt after fixing the file when
 you rerun MAKER.
 
 Thanks,
 Carson
 
 
 On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:
 
 >Hi Carson and Daniel,
 >
 >Thanks for your suggestions.  I have looked at the gff file, but I do not
 >see any obvious errors.  I have uploaded the files to your website.  The
 >reference fasta is there, the full gff, and a single gene gff that also
 >causes an error.  If I remove that gene from the full gff, then the error
 >is on the next gene in the file, so it appears to be a systematic problem
 >throughout the gff.  The gff was generated by Maker, but I may have
 >messed it up when I modified it to rename genes and add functional
 >information.  I checked with cat -te, but don't see any obvious
 >formatting errors.
 >
 >Thanks!
 >Megan
 >
 >
 
 
From carsonhh at gmail.com  Tue Feb 25 18:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
Message-ID: <CF32868D.A42A%carsonhh@gmail.com>

There is a way.  It's not a standard option and it's undocumented, but if
you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags
to your fasta headers, those can be used to guide the mapping and naming.
For example, gene_id=<some_gene> will ensure different isoforms that share
a common gene_id get clustered into the same gene.  Adding
maker_coor=chr1:1-10000 to the fasta header will force a particular sequence
to be mapped only against chr1 within the range of 1-10000 bp, and using just
maker_coor=chr1 will force it to be mapped only against chr1.

This is an undocumented way to remap genes onto new assemblies using blast
alignments of earlier transcript or protein annotations as a guide.
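
As a rough illustration of the header tagging described above, here is a minimal Python sketch. It is only a sketch under assumptions: the thread does not say where in the header the tags must go, so it assumes they are appended after the existing header text, separated by spaces. The function names and the ID-to-gene mapping are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional


def tag_header(header: str,
               gene_id: Optional[str] = None,
               maker_coor: Optional[str] = None) -> str:
    """Append gene_id= and/or maker_coor= tags to one FASTA header line.

    Assumption: tags are space-separated and follow the existing header text.
    """
    tags = []
    if gene_id is not None:
        tags.append(f"gene_id={gene_id}")
    if maker_coor is not None:
        tags.append(f"maker_coor={maker_coor}")
    return " ".join([header.rstrip()] + tags)


def tag_fasta(lines: Iterable[str], gene_of: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines, adding gene_id= to headers found in gene_of.

    gene_of maps a sequence ID (first word of the header) to a gene name;
    it is a hypothetical mapping that you would build yourself.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            if seq_id in gene_of:
                line = tag_header(line, gene_id=gene_of[seq_id])
        yield line


if __name__ == "__main__":
    fasta = [">tx1 isoform A\n", "ATGGCC\n", ">tx2 isoform B\n", "ATGGCA\n"]
    for out in tag_fasta(fasta, {"tx1": "geneX", "tx2": "geneX"}):
        print(out)
```

Giving both isoforms the same gene_id is what would cluster them into one gene, per the description above.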

--Carson



From carsonhh at gmail.com  Tue Feb 25 19:04:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 18:04:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <CF328AAA.A44D%carsonhh@gmail.com>

One more note.  When using this option, the score column of mRNA features
will represent how completely this gene matches the source EST/protein
(fraction coverage multiplied by % identity).  So a value of 100 means there
is a perfect match.  This way, if the same transcript maps to multiple
locations, you can identify which location is the closest match (this also
works for identifying likely orthologs vs. paralogs).
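
To make the use of that score concrete, here is a minimal Python sketch that keeps the highest-scoring mRNA location for each name in a GFF3 file. It assumes standard nine-column, tab-separated GFF3 and that mRNAs mapped from the same source sequence share a Name attribute; that second point is an assumption, not something confirmed in this thread. The function names are hypothetical.

```python
def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attributes column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def best_location_per_name(gff3_lines):
    """Return {Name: (score, seqid, start, end)} for the best-scoring mRNA.

    Lines that are comments, are not mRNA features, or have no score (".")
    are skipped.
    """
    best = {}
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA" or cols[5] == ".":
            continue
        name = parse_attributes(cols[8]).get("Name")
        if name is None:
            continue
        score = float(cols[5])
        if name not in best or score > best[name][0]:
            best[name] = (score, cols[0], int(cols[3]), int(cols[4]))
    return best


if __name__ == "__main__":
    lines = [
        "chr1\tmaker\tmRNA\t100\t900\t100\t+\t.\tID=m1;Name=tx1\n",
        "chr2\tmaker\tmRNA\t500\t1300\t87.5\t+\t.\tID=m2;Name=tx1\n",
    ]
    print(best_location_per_name(lines))
```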

--Carson



From weckalba at asu.edu  Tue Feb 25 19:36:21 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 17:36:21 -0800
Subject: [maker-devel] invalid gff3 format issues
Message-ID: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>

Hi all,

I am trying to update maker annotations with PASA and encountered errors
stemming from file format issues in the gff3 file.

I put a few lines from the gff3 to highlight the issue below.  Basically,
the problem is that there are non-unique IDs for a number of the
annotations.

Is there anything that can be done to right this problem?

Thanks,

Walter

Lines from GFF3 file, repeated IDs are highlighted:


chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11

From dence at genetics.utah.edu  Tue Feb 25 20:02:04 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 26 Feb 2014 02:02:04 +0000
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
Message-ID: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel



From weckalba at asu.edu  Tue Feb 25 20:11:12 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 18:11:12 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
Message-ID: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>

Hi Daniel, those have been uploaded and I'm using version 2.28.

Walter



From carsonhh at gmail.com  Tue Feb 25 22:10:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 21:10:27 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
Message-ID: <CF32B115.A46C%carsonhh@gmail.com>

Could you try version 2.31 (the current version)?  I believe this is
happening because you are passing in MAKER genes as pred_gff; the transcripts
thus ended up with the same Names and IDs as the genes being generated by
the MAKER run via SNAP etc.  This shouldn't happen with model_gff, and
shouldn't happen in 2.31 (IDs and names are generated slightly differently
in 2.30+).
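
To check whether an existing GFF3 file has this collision before handing it to PASA, a minimal Python sketch along these lines may help. It is only an illustration (the function name is hypothetical) and it looks at gene and mRNA features only, since GFF3 allows the same ID on several lines for discontinuous features such as CDS.

```python
from collections import defaultdict


def find_duplicate_ids(gff3_lines, feature_types=("gene", "mRNA")):
    """Return {ID: [line numbers]} for IDs used by more than one feature.

    Only features whose type is in feature_types are checked. Line numbers
    are 1-based positions in gff3_lines.
    """
    seen = defaultdict(list)
    for line_no, line in enumerate(gff3_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        for field in cols[8].split(";"):
            field = field.strip()
            if field.startswith("ID="):
                seen[field[3:]].append(line_no)
    return {fid: nums for fid, nums in seen.items() if len(nums) > 1}


if __name__ == "__main__":
    # Shortened versions of the lines posted in this thread.
    lines = [
        "chr1\tmaker\tgene\t9377440\t9432028\t.\t-\t.\tID=maker-chr1-pred_gff_maker-gene-4.16\n",
        "chr1\tmaker\tmRNA\t9377440\t9432028\t.\t-\t.\tID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16\n",
        "chr1\tmaker\tgene\t8894975\t9021577\t.\t+\t.\tID=maker-chr1-snap-gene-4.53\n",
        "chr1\tmaker\tmRNA\t8894975\t9021577\t.\t+\t.\tID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53\n",
    ]
    for feature_id, line_numbers in find_duplicate_ids(lines).items():
        print(feature_id, line_numbers)
```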

Thanks,
Carson


From marc.hoeppner at imbim.uu.se  Wed Feb 26 02:26:35 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 26 Feb 2014 08:26:35 +0000
Subject: [maker-devel] Functional annotation options
Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to go over to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems maker includes an MPI wrapper for InterProScan, but requests 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the ncbi blast tabular output. Since Wublast is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue. 

3) maker_functional
This just throws an error about a missing Job ID, so no clue what this would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation. 

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From mikael.durling at slu.se  Wed Feb 26 03:43:43 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:43:43 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>


On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:

> Dear List,
> 
> I have finished a gene build now, and I would like to go over to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:
> 
> 1) iprscan
> It seems maker includes an MPI wrapper for InterProScan, but requests 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else? But what?

I don't believe it works with interproscan5. What I usually do is to split the maker protein file into chunks, and then run these chunks as separate jobs on our cluster, then finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried the iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.
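
The split-and-merge step described above can be sketched in Python. This is only an illustration, not part of MAKER or InterProScan: the function names, the chunk file naming, and the chunk size of 500 are assumptions, and the merge assumes the per-chunk TSV files carry no header line so that plain concatenation is enough.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, per_chunk=500):
    """Write consecutive chunks of `per_chunk` records; return the chunk paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths, records = [], []

    def flush():
        # Hypothetical naming scheme: chunk_0000.fasta, chunk_0001.fasta, ...
        chunk_path = out_dir / f"chunk_{len(chunk_paths):04d}.fasta"
        with open(chunk_path, "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
        chunk_paths.append(chunk_path)
        records.clear()

    for record in read_fasta(path):
        records.append(record)
        if len(records) == per_chunk:
            flush()
    if records:
        flush()
    return chunk_paths


def merge_tsv(tsv_paths, merged_path):
    """Concatenate per-chunk TSV results (assumed header-free) into one file."""
    with open(merged_path, "w") as out:
        for tsv in tsv_paths:
            with open(tsv) as handle:
                for line in handle:
                    out.write(line if line.endswith("\n") else line + "\n")
```

Each chunk would then be submitted as its own cluster job, and the merged TSV passed to ipr_update_gff as described in the message.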


> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue.

It works fine with ncbiblast+ and the blastp command with -outfmt 6. 
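
As a rough sketch of what such a run could look like, the snippet below builds a blastp argument list that writes tabular output. The query, database, and output names are placeholders, and the e-value and thread count are arbitrary choices, none of them taken from this thread. The column list is the default field order BLAST+ emits for -outfmt 6.

```python
import shlex

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_command(query, db, out, evalue=1e-6, threads=1):
    """Return the argument list for a tabular blastp run."""
    return [
        "blastp",
        "-query", str(query),
        "-db", str(db),
        "-outfmt", "6",
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-out", str(out),
    ]


def parse_outfmt6(line):
    """Split one tabular result line into a dict keyed by column name."""
    return dict(zip(OUTFMT6_COLUMNS, line.rstrip("\n").split("\t")))


if __name__ == "__main__":
    # Placeholder file and database names, for illustration only.
    cmd = blastp_command("maker_proteins.fasta", "uniprot_sprot", "proteins.blastp")
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, and the output file is what would be handed to maker_functional_gff.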

cheers,
Mikael

PS. You're welcome to visit me at SLU if you would like to discuss experiences of genome annotations.


> 
> 3) maker_functional
> This just throws an error about a missing Job ID, so no clue what this would be used for.
> 
> I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation.
> 
> With kind regards,
> 
> Marc Hoeppner
> 
> Marc P. Hoeppner, PhD
> Team Leader
> BILS Genome Annotation Platform
> Department for Medical Biochemistry and Microbiology
> Uppsala University, Sweden
> marc.hoeppner at imbim.uu.se
> 
> 
> 
> 


From mikael.durling at slu.se  Wed Feb 26 03:55:56 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:55:56 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>


On 26 Feb 2014, at 10:43, Mikael Brandström Durling <mikael.durling at slu.se> wrote:


On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:

Dear List,

I have finished a gene build now, and I would like to go over to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems maker includes an MPI wrapper for InterProScan, but requests 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else? But what?

I don't believe it works with interproscan5. What I usually do is to split the maker protein file into chunks, and then run these chunks as separate jobs on our cluster, then finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried the iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

I should clarify this and say that mpi_iprscan doesn't seem to work with iprscan5. ipr_update_gff does, however.


Mikael


From mikael.durling at slu.se  Wed Feb 26 06:30:44 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 12:30:44 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene>  will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp  and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun


From carsonhh at gmail.com  Wed Feb 26 07:22:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 06:22:34 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes.  That should work as well, as an accidental feature.

--Carson 

Sent from my iPhone

> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:
> 
> Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
> 
> Thanks,
> Mikael
> 
>> On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:
>> 
>> There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.
>> 
>> There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene>  will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp  and just using maker_coor=chr1 will force it to only be mapped against chr1.
>> 
>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
>> 
>> --Carson
>> 
>> 
>> 
>> 
>> From: Shaun Jackman <sjackman at gmail.com>
>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>> Date: Tuesday, February 25, 2014 at 5:06 PM
>> To: <maker-devel at yandell-lab.org>
>> Subject: [maker-devel] Mapping gene names
>> 
>> Hi,
>> 
>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?
>> 
>> maker_opts.ctl
>> 
>> est=NC_123456.frn
>> protein=NC_123456.faa
>> est2genome=1
>> protein2genome=1
>> Thanks,
>> Shaun
>> 

From mikael.durling at slu.se  Wed Feb 26 07:37:29 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 13:37:29 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

Yes.  That should work as well, as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene>  will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp  and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun


From nextgen.usfs at gmail.com  Wed Feb 26 10:21:33 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:21:33 -0600
Subject: [maker-devel] change program locations in maker_exe
Message-ID: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>

Hello,
I was wondering if there is a way to make permanent changes to the maker_exe.ctl file, as it seems that on install maker didn't find the GeneMark or probuild locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information.  Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe.ctl file?  Thank you.

- Jon


From carsonhh at gmail.com  Wed Feb 26 09:38:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 08:38:47 -0700
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <CF33558F.A4C3%carsonhh@gmail.com>

maker_functional is a script that gets called by another script, not meant
to be called directly by the user.  So ignore that.

Just run iprscan directly; it already works pretty well.  The mpi_iprscan
and iprscan_wrap scripts just add some logging functionality by wrapping
the iprscan call.  In most cases there is no advantage over just running
iprscan directly.

--Carson


On 2/26/14, 2:43 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>
>On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:
>
>> Dear List,
>> 
>> I have finished a gene build now, and I would like to go over to
>>functional annotation. I understand that maker includes a few scripts to
>>facilitate such analyses. However, I have a few questions about this:
>> 
>> 1) iprscan
>> It seems maker includes an MPI wrapper for InterProScan, but requests
>>'iprscan' to be in $PATH. The latest versions of InterProScan I have
>>worked with are Java applications, and even though I put their location in
>>$PATH, mpi_iprscan seems to want something else? But what?
>
>I don't believe it works with interproscan5. What I usually do is to
>split the maker protein file into chunks, and then run these chunks as
>separate jobs on our cluster, then finally merge the results. The TSV
>file from iprscan5 can be input into the maker tool ipr_update_gff. I
>have not tried the iprscan2gff3, as I haven't figured out how to get an
>iprscan4 raw file from iprscan5.
>
>
>> 2) maker_functional_gff
>> This script seems to be very useful, but the description suggests that
>>it requires WU-BLAST tabular output '2', which I think looks quite
>>different from the NCBI BLAST tabular output. Since WU-BLAST is not
>>really available anymore (except this very old, frozen binary bundle), I
>>was wondering how to address this issue.
>
>It works fine with ncbiblast+ and the blastp command with -outfmt 6.
>
>cheers,
>Mikael
>
>PS. You're welcome to visit me at SLU if you would like to discuss
>experiences of genome annotations.
>
>
>> 
>> 3) maker_functional
>> This just throws an error about a missing Job ID, so no clue what this
>>would be used for.
>> 
>> I guess what I am after is some suggestion as to how to use the scripts
>>included with Maker to achieve a reasonable functional annotation.
>> 
>> With kind regards,
>> 
>> Marc Hoeppner
>> 
>> Marc P. Hoeppner, PhD
>> Team Leader
>> BILS Genome Annotation Platform
>> Department for Medical Biochemistry and Microbiology
>> Uppsala University, Sweden
>> marc.hoeppner at imbim.uu.se
>> 
>> 
>> 
>> 


From carsonhh at gmail.com  Wed Feb 26 10:09:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 09:09:14 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: <CF335A95.A4DE%carsonhh@gmail.com>

It will still work without est_forward.  It just works a little differently.
Keep in mind this was a hidden feature I used to find stubborn or hard to
find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the
maker_coor tags early in the pipeline.  Then it will create a list of
locations to search, and it will search them even if there are no BLAST
results to seed the search (normally MAKER gets a BLAST result first and
then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
look for a match using all of chr1 as the input to exonerate even when BLAST
finds nothing (this is a very very slow search, but can help pick up one or
two stubborn genes that don't remap well).  To allow this, MAKER gives
exonerate looser matching parameters (i.e. allows for single base pair
introns perhaps caused by assembly errors).  The logic here is that given
the fact that I already told MAKER that with some degree of confidence I
expect sequence A to map to location X, it will try its hardest to make
it match.
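
For anyone wanting to try this, here is a small Python sketch for adding the tags to FASTA headers. It is not MAKER code. The thread only says the tags go "in the fasta header", so appending them after the existing description, separated by spaces, is an assumption, as are the function names and the shape of the `placements` dictionary.

```python
def tag_header(header, gene_id=None, seqid=None, start=None, end=None):
    """Append gene_id= and maker_coor= tags to a FASTA header line.

    Assumption: tags are appended after the existing description,
    separated by single spaces.
    """
    tags = []
    if gene_id is not None:
        tags.append(f"gene_id={gene_id}")
    if seqid is not None:
        # Without a range, restrict only to the contig/chromosome.
        coor = seqid if start is None or end is None else f"{seqid}:{start}-{end}"
        tags.append(f"maker_coor={coor}")
    return " ".join([header.rstrip()] + tags)


def tag_fasta_lines(lines, placements):
    """Yield FASTA lines with tagged headers.

    `placements` (hypothetical structure) maps a sequence ID to the keyword
    arguments of tag_header, e.g. {"seq1": {"seqid": "chr1"}}.
    """
    for line in lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            yield tag_header(line, **placements.get(seq_id, {}))
        else:
            yield line.rstrip("\n")
```

A header such as `>NM_1 some protein` with gene_id `geneA` and a placement of chr1:1-10000 would come out as `>NM_1 some protein gene_id=geneA maker_coor=chr1:1-10000`.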

Without est_forward set, the maker_coor= flag still gets read in GI.pm at
line 1563, but only after a BLAST alignment has already seeded it to the
region (that BLAST result has the information in its description parameter).
MAKER will then ignore seeds completely outside of maker_coor. In addition
any BLAST seeds that overlap maker_coor will get the search space for
alignment polishing adjusted to match maker_coor exactly.  Also match
parameters for exonerate will not be relaxed as they were with est_forward.
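
The seed handling described in this paragraph can be illustrated with a short Python sketch. This is not the logic in GI.pm, only a reading of the description above: the regular expression, the names, and the choice to leave a seed's span unchanged when maker_coor names only a contig are all assumptions, and contig names are assumed not to contain ':'.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int]
    end: Optional[int]


# Assumed tag syntax: maker_coor=<seqid> or maker_coor=<seqid>:<start>-<end>
_COOR = re.compile(r"maker_coor=([^\s:;]+)(?::(\d+)-(\d+))?")


def parse_maker_coor(description):
    """Extract the maker_coor region from a hit description, if present."""
    match = _COOR.search(description)
    if match is None:
        return None
    seqid, start, end = match.groups()
    if start is None:
        return Region(seqid, None, None)
    return Region(seqid, int(start), int(end))


def adjust_seed(seed_seqid, seed_start, seed_end, region):
    """Return the span to polish for a BLAST seed, or None to drop the seed."""
    if region is None:
        return (seed_seqid, seed_start, seed_end)
    if seed_seqid != region.seqid:
        return None  # seed lies on a different contig
    if region.start is None:
        # Assumption: a contig-only tag leaves the seed span as it is.
        return (seed_seqid, seed_start, seed_end)
    if seed_end < region.start or seed_start > region.end:
        return None  # seed lies completely outside maker_coor
    # Overlapping seed: the search space is set to maker_coor exactly.
    return (region.seqid, region.start, region.end)
```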

As you can see, the behavior is slightly different (because it's an
accidental feature).

Thanks,
Carson


From:  Mikael Brandström Durling <mikael.durling at slu.se>
Date:  Wednesday, February 26, 2014 at 6:37 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the
code, it seems that I need to supply maker_coor but not gene_id, as well as
the configuration option est_forward for this to work. Any occurrences of
maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

> Yes.  That should work as well as an accidental feature.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se>
> wrote:
> 
>> Can this use of maker_coor be used only to hint about the placement of the
>> ESTs, without affecting the naming of the final genes? I.e., if I have a
>> database of ESTs where I have a priori knowledge of their rough placement, can
>> this placement be given to maker without providing est_forward=1?
>> 
>> Thanks,
>> Mikael
>> 
>> On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:
>> 
>>> There is a way.  It's not a standard option and it's undocumented, but if
>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
>>> The option won't already be there so you'll have to type it in.
>>> 
>>> There is also a feature designed to work with this option.  If you add tags
>>> to your fasta headers, those can be used to guide the mapping and naming.
>>> For example, gene_id=<some_gene>  will ensure different isoforms that share
>>> a common gene_id get clustered into the same gene, and
>>> maker_coor=chr1:1-10000 in the fasta header will force a particular sequence
>>> to only be mapped against chr1 within the range of 1-10000 bp  and just
>>> using maker_coor=chr1 will force it to only be mapped against chr1.
>>> 
>>> This is an undocumented way to remap genes onto new assemblies using blast
>>> alignments of earlier transcript or protein annotations as a guide.
>>> 
>>> --Carson
>>> 
>>> 
>>> 
>>> 
>>> From: Shaun Jackman <sjackman at gmail.com>
>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To: <maker-devel at yandell-lab.org>
>>> Subject: [maker-devel] Mapping gene names
>>> 
>>> Hi,
>>> 
>>> I'm annotating a genome using a closely related genome from Genbank, using
>>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate
>>> my genome. I've run Maker, and the annotation seems to have worked well. Is
>>> it possible to map the names of the genes from the related species to my
>>> annotation? I see the map_forward option, which applies to the model_gff
>>> parameter. Is there a similar option for est and protein?
>>> 
>>> maker_opts.ctl
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>> Thanks,
>>> Shaun

From carson.holt at genetics.utah.edu  Wed Feb 26 10:38:37 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 26 Feb 2014 16:38:37 +0000
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
Message-ID: <CF33655B.A514%carson.holt@genetics.utah.edu>

MAKER first looks inside of .../maker/exe/ for any executables.  Then it
uses the system's 'which' command to identify executables in your PATH
environment variable.  If MAKER is not finding the one you want, then
you can either put the program in the .../maker/exe/ folder (i.e., create
.../maker/exe/bin/ and then put soft links to the executables you want to
be used first), or you can rearrange the order of entries in your PATH
environment variable so that 'which <program_name>' returns the location
you want.  If MAKER is always leaving the locations of those programs
empty, it is because you need to add them to your PATH environment
variable.
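
To check what such a lookup would find on a given machine, the order described above can be mimicked with a few lines of Python. This is a sketch of the described behaviour, not MAKER's own code: the function name is made up, and searching every subdirectory of the exe folder is an assumption about how that folder is scanned.

```python
import os
import shutil


def locate(program, exe_dir):
    """Return the first executable named `program` under exe_dir, else fall back to PATH."""
    # Step 1: look inside the MAKER exe folder (subdirectories included, symlinks followed).
    for root, _dirs, files in os.walk(exe_dir, followlinks=True):
        if program in files:
            candidate = os.path.join(root, program)
            if os.access(candidate, os.X_OK):
                return candidate
    # Step 2: same result as the shell's 'which': first match in PATH, or None.
    return shutil.which(program)
```

If `locate("probuild", "/path/to/maker/exe")` returns None, the program is neither linked into the exe folder nor reachable through PATH, which matches the case where maker -CTL leaves the entry empty.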

Thanks,
Carson

On 2/26/14, 9:21 AM, "USFS Ion PGM" <nextgen.usfs at gmail.com> wrote:

>Hello,
>I was wondering if there is a way to make permanent changes to the
>maker_exe.ctl file, as it seems on the install that maker didn't find the
>GeneMark or probuild locations correctly, which means that I have to
>manually edit the maker_exe.ctl file every time and add that information.
> Where can I modify this permanently so that the maker -CTL command
>creates the appropriate maker_exe file?  Thank you.
>
>- Jon
>
>


From nextgen.usfs at gmail.com  Wed Feb 26 10:58:11 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:58:11 -0600
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CF33655B.A514%carson.holt@genetics.utah.edu>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
	<CF33655B.A514%carson.holt@genetics.utah.edu>
Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com>

Hi Carson,

Thank you - that did it, I didn't have them in the PATH.  All working now.

Cheers,
Jon

On Feb 26, 2014, at 10:38 AM, Carson Holt <carson.holt at genetics.utah.edu> wrote:

> MAKER first looks inside of .../maker/exe/ for any executables.  Then it
> uses the system's 'which' command to identify executables in your PATH
> environment variable.  If MAKER is not finding the one you want, then
> you can either put the program in the .../maker/exe/ folder (i.e., create
> .../maker/exe/bin/ and then put soft links to the executables you want to
> be used first), or you can rearrange the order of entries in your PATH
> environment variable so that 'which <program_name>' returns the location
> you want.  If MAKER is always leaving the locations of those programs
> empty, it is because you need to add them to your PATH environment
> variable.
> 
> Thanks,
> Carson
> 
> On 2/26/14, 9:21 AM, "USFS Ion PGM" <nextgen.usfs at gmail.com> wrote:
> 
>> Hello,
>> I was wondering if there is a way to make permanent changes to the
>> maker_exe.ctl file, as it seems on the install that maker didn't find the
>> GeneMark or probuild locations correctly, which means that I have to
>> manually edit the maker_exe.ctl file every time and add that information.
>> Where can I modify this permanently so that the maker -CTL command
>> creates the appropriate maker_exe file?  Thank you.
>> 
>> - Jon
>> 
>> 
> 


From weckalba at asu.edu  Wed Feb 26 14:05:05 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Wed, 26 Feb 2014 12:05:05 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CF32B115.A46C%carsonhh@gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
Message-ID: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.
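
For checking whether a merged GFF3 still has the repeated IDs reported earlier in this thread, a small Python check along these lines may help. It is only a sketch with a made-up function name. Note that GFF3 allows a discontinuous feature (commonly a CDS) to reuse one ID across several lines, so hits need to be reviewed rather than treated as errors automatically.

```python
from collections import defaultdict


def duplicate_ids(lines):
    """Map each ID used on more than one feature line to its (type, Parent) uses."""
    seen = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skips FASTA or malformed lines
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = value.strip()
        if "ID" in attrs:
            seen[attrs["ID"]].append((cols[2], attrs.get("Parent")))
    return {fid: uses for fid, uses in seen.items() if len(uses) > 1}
```

Run over the open GFF3 file, an mRNA ID that shows up under two different Parent genes, as in the example lines quoted below, would be reported with both parents.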


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:

> Could you try version 2.31 (the current version)?  I believe this is
> happening because you are passing in MAKER genes as pred_gff; the
> transcripts thus ended up with the same Names and IDs as the genes being
> generated by the MAKER run via SNAP etc.  This shouldn't happen with
> model_gff, and shouldn't happen in 2.31 (IDs and names are generated
> slightly differently in 2.30+).
>
> Thanks,
> Carson
>
> From: Walter Eckalbar <weckalba at asu.edu>
> Date: Tuesday, February 25, 2014 at 7:11 PM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] invalid gff3 format issues
>
> Hi Daniel, those have been uploaded and I'm using version 2.28.
>
> Walter
>
>
> On 25 February 2014 18:02, Daniel Ence <dence at genetics.utah.edu> wrote:
>
>> Hi Walter,
>>
>> Will you upload the full GFF3 and the control files that you used to this
>> URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>>
>> Thanks,
>> Daniel
>>
>>
>>
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar <weckalba at asu.edu>
>>  wrote:
>>
>> Hi all,
>>
>> I am trying to update maker annotations with PASA and encountered errors
>> stemming from file format issues in the gff3 file.
>>
>> I put a few lines from the gff3 to highlight the issue below.  Basically,
>> the problem is that there are non-unique IDs for a number of the
>> annotations.
>>
>> Is there anything that can be done to right this problem?
>>
>> Thanks,
>>
>> Walter
>>
>> Lines from GFF3 file, repeated IDs are highlighted:
>>
>>
>> chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>> chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>> chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>> chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>
>> chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>> chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>> chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>

From carsonhh at gmail.com  Wed Feb 26 15:12:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 14:12:23 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
	<CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
Message-ID: <CF33A669.A53C%carsonhh@gmail.com>

Could you put the file in this GFF3 validator to see if anything comes up?
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Maybe it's just PASA.  But I'd like to know there's no issue being caused by
something else.
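
A quick local check for repeated IDs can complement the online validator.
What follows is only a minimal Python sketch, not part of MAKER or PASA.
The function name find_duplicate_ids is made up for illustration, and the
code assumes tab-separated GFF3 with the ID= attribute in column 9.  Keep
in mind that GFF3 allows one ID to appear on several lines for
discontinuous features, so a hit is a candidate to inspect rather than
proof of an invalid file.

```python
from collections import defaultdict


def find_duplicate_ids(lines):
    """Return {ID: [(line_no, feature_type), ...]} for IDs seen on more than one line.

    Hypothetical helper for illustration; assumes tab-separated GFF3 rows.
    """
    seen = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature rows
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and pragmas
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID" and value:
                seen[value].append((line_no, cols[2]))
    # Keep only IDs that were used on two or more rows.
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}
```

Passing an open file handle for the merged GFF3 to find_duplicate_ids
returns a dictionary of the repeated IDs with the line numbers and
feature types where each one occurs.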

Thanks,
Carson


From:  Walter Eckalbar <weckalba at asu.edu>
Date:  Wednesday, February 26, 2014 at 1:05 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "<maker-devel at yandell-lab.org>"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] invalid gff3 format issues

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:
> Could you try version 2.31 (the current version)?  I believe this is happening
> because you are passing in MAKER genes as pred_gff, so the transcripts ended
> up with the same Names and IDs as the genes being generated by the MAKER run
> via SNAP etc.  This shouldn't happen with model_gff, and shouldn't happen in
> 2.31 (IDs and names are generated slightly differently in 2.30+).
> 
> Thanks,
> Carson
> 



From mikael.durling at slu.se  Wed Feb 26 16:04:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 22:04:37 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF335A95.A4DE%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
Message-ID: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>

It seems that this could be a very useful option in cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for it to work? I'm after the behavior you describe below, where exonerate is made to try really hard to align an EST within a limited region, but I would not like MAKER to produce est2genome predictions.

In general, I think maker_coor and est_forward form a feature set that is worth promoting to a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt <carsonhh at gmail.com> wrote:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows for single base pair introns perhaps caused by assembly errors).  The logic here is that, since I have already told MAKER with some degree of confidence that I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor.  In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).
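
The seed handling described for the case without est_forward can be sketched roughly as follows.  This is an illustration in Python, not MAKER's actual Perl code in GI.pm.  The names Region, parse_maker_coor and adjust_seeds are invented, coordinates are assumed to be 1-based and inclusive, and the handling of a sequence-only tag such as maker_coor=chr1 is an assumption, since the description above does not say how the search space is adjusted in that case.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int]  # None means "no range given, whole sequence"
    end: Optional[int]


def parse_maker_coor(tag: str) -> Region:
    """Parse 'chr1' or 'chr1:1-10000', the two forms mentioned in this thread."""
    seqid, sep, span = tag.partition(":")
    if not sep:
        return Region(seqid, None, None)
    start, end = (int(x) for x in span.split("-"))
    return Region(seqid, start, end)


def adjust_seeds(seeds: List[Region], coor: Region) -> List[Region]:
    """Drop seeds outside maker_coor; snap overlapping seeds to maker_coor."""
    kept = []
    for seed in seeds:
        if seed.seqid != coor.seqid:
            continue  # seed is on another sequence: ignored
        if coor.start is None:
            # Assumption: with a sequence-only tag the seed span is left as-is.
            kept.append(seed)
            continue
        if seed.end < coor.start or seed.start > coor.end:
            continue  # completely outside maker_coor: ignored
        # Overlapping seed: search space becomes exactly the maker_coor range.
        kept.append(Region(seed.seqid, coor.start, coor.end))
    return kept
```

With maker_coor=chr1:1-10000, a BLAST seed at chr1:9500-12000 would be polished over chr1:1-10000 in this sketch, while seeds at chr1:20000-21000 or on chr2 would be dropped.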

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor (but not gene_id) as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

Yes.  That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
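
To make the header tags concrete, here is a small Python sketch that appends gene_id= and maker_coor= tags to FASTA headers.  It is an assumption that the tags can simply be appended to the header description after a space; the text above only says they go in the fasta header.  The function name tag_fasta_headers and the example IDs are purely illustrative and are not part of MAKER.

```python
from typing import Dict, Iterable, Iterator


def tag_fasta_headers(
    lines: Iterable[str], tags: Dict[str, Dict[str, str]]
) -> Iterator[str]:
    """Yield FASTA lines, appending key=value tags to the headers listed in `tags`.

    `tags` maps a sequence ID (the first word of the header) to its tags, e.g.
    {"tx1": {"gene_id": "geneA", "maker_coor": "chr1:1-10000"}}.
    Hypothetical helper; the placement of tags in the header is an assumption.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            extra = " ".join(f"{k}={v}" for k, v in tags.get(seq_id, {}).items())
            if extra:
                line = f"{line} {extra}"
        yield line
```

A header such as ">tx1 some description" would come out as ">tx1 some description gene_id=geneA maker_coor=chr1:1-10000", and the tagged file would then be given as est= (or protein=) with est_forward=1 typed into maker_opts.ctl as described above.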

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun




From carsonhh at gmail.com  Wed Feb 26 16:50:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 15:50:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
Message-ID: <CF33B334.A551%carsonhh@gmail.com>

What you can do is run it once with just est_forward=1 and
est2genome/protein2genome set to 1.  Then take those results, pass them in
as model_gff and use the map_forward option to then filter the results based
on mRNA score, and that would copy names onto the new genes under the standard
MAKER pipeline.  Eventually it's really supposed to go into a separate tool
that will map genes onto new assemblies (but under the hood the tool will
just be calling MAKER with certain parameters restricted).  I do this
because if people commonly use it mixed with things like SNAP I can start to
get some very weird behaviors.

Thanks,
Carson




From carsonhh at gmail.com  Wed Feb 26 17:45:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 16:45:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF33B334.A551%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
Message-ID: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
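
The prefilter step could look something like the following minimal Python sketch.  It reads "the score in the mRNA column" as column 6 (score) of rows whose type is mRNA, which is an interpretation.  The function name prefilter_by_mrna_score and any threshold passed to it are made up for illustration, rows with a "." score are treated as failing, and this is not a MAKER tool.  It is worth checking first that the GFF3 in question actually carries numeric scores on its mRNA rows.

```python
from typing import Dict, Iterable, List


def _attrs(col9: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' into a dict (no URL-decoding)."""
    out = {}
    for item in col9.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:
            out[key] = value
    return out


def prefilter_by_mrna_score(lines: Iterable[str], min_score: float) -> List[str]:
    """Keep mRNAs scoring >= min_score, their parent genes, and their children.

    Hypothetical helper.  Comment and pragma lines pass through unchanged;
    feature rows that are not part of a kept gene model are dropped.
    Limitation: a child with several Parents is kept if any parent survives,
    and its Parent attribute is not rewritten.
    """
    rows = [ln.rstrip("\n") for ln in lines]
    feats = []
    for ln in rows:
        cols = ln.split("\t")
        if ln.startswith("#") or len(cols) != 9:
            feats.append(None)  # not a feature row
        else:
            feats.append((cols, _attrs(cols[8])))

    # Pass 1: find mRNAs that pass the threshold, and remember their genes.
    kept_mrna, kept_gene = set(), set()
    for feat in feats:
        if feat is None or feat[0][2] != "mRNA":
            continue
        cols, attrs = feat
        try:
            score = float(cols[5])
        except ValueError:
            continue  # "." or other non-numeric score: treated as failing
        if score >= min_score and attrs.get("ID"):
            kept_mrna.add(attrs["ID"])
            kept_gene.update(p for p in attrs.get("Parent", "").split(",") if p)

    # Pass 2: emit kept genes, kept mRNAs, and children of kept mRNAs.
    out = []
    for ln, feat in zip(rows, feats):
        if feat is None:
            out.append(ln)
            continue
        cols, attrs = feat
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_gene
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrna
        else:
            keep = bool(set(attrs.get("Parent", "").split(",")) & kept_mrna)
        if keep:
            out.append(ln)
    return out
```

The filtered lines would then be written to a new GFF3 file and given to MAKER as model_gff, with map_forward turned on, for the second run.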

--Carson 

Sent from my iPhone


From bioinformatics.umd at gmail.com  Thu Feb 27 10:46:44 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 11:46:44 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

Hello,

I've had my IT folks install MAKER on our cluster at UMD. I'm getting a SEGFAULT when running MAKER on InfiniBand nodes but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly
warns of an issue with doing 'fork's within openmpi/openfabrics framework.

In particular, the use of the fork system call is only partially supported
in the OpenFabrics software (this is the drivers, etc for the infiniband
connections). See e.g. 
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
for more information. In particular the paragraphs starting with the
sentence with the red highlighted "it does not mean that your fork()-calling 
application is safe". (The kernel, openMPI version, and OFED version are 
sufficiently recent to mean that there is _some_ fork support).

The fact that the job runs over gigE but not IB, in conjunction with the
warning from openmpi, strongly suggests that this is the issue that you are 
encountering. I suspect that maker touches registered memory before the fork,
which would result in a segfault (matching what was observed).

You can try adding the arguments
--mca mpi_warn_on_fork 0 
to the mpirun command, just in case the crash was somehow caused by openmpi's
warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.


Basically, it looks like maker uses some system calls like fork in a manner
which is incompatible with the current OpenFabrics software, and thus will
not work with infiniband. This situation is likely to remain until either
maker changes to be compatible with OFED, or OFED's support for the fork
system call is broadened.


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: maker_error_openfabrics.txt
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20140227/acd7e3ab/attachment.txt>

From carson.holt at genetics.utah.edu  Thu Feb 27 12:09:21 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 27 Feb 2014 18:09:21 +0000
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: <CF34C944.A5B0%carson.holt@genetics.utah.edu>

It's a little more complicated than that.  MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that).  All I get is Perl's standard implementation of fork.  So it's not really a matter of MAKER changing; it would be a matter of changing Perl itself (which I have no power over, and which I don't think will be changing anytime soon).

For now you just have to add this flag to OpenMPI when running MAKER with mpiexec:  -mca btl ^openib

Example :
mpiexec -mca btl ^openib -n 20 maker
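
For anyone launching MAKER from a wrapper script, the command above can be assembled as an argument list, which avoids any shell interpretation of the leading '^'. This is only a minimal sketch: the helper name maker_mpi_command and its parameters are hypothetical, and only the flags shown in the example above are taken from this thread.

```python
import shlex


def maker_mpi_command(n_procs, maker_args=(), excluded_btls=("openib",)):
    """Build an mpiexec argument list for MAKER (hypothetical helper).

    excluded_btls lists Open MPI BTL components to exclude; the leading
    '^' in the generated value is what marks the list as an exclusion.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    cmd = ["mpiexec"]
    if excluded_btls:
        # Same flag as in the example above: -mca btl ^openib
        cmd += ["-mca", "btl", "^" + ",".join(excluded_btls)]
    cmd += ["-n", str(n_procs), "maker", *maker_args]
    return cmd


def as_shell_line(cmd):
    """Render the argument list as one shell-quoted line for a job script."""
    return shlex.join(cmd)
```

The list returned by maker_mpi_command(20) can be passed directly to subprocess.run, while as_shell_line produces a quoted form suitable for pasting into a batch script.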


Thanks,
Carson



From bioinformatics.umd at gmail.com  Thu Feb 27 12:55:34 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 13:55:34 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <CF34C944.A5B0%carson.holt@genetics.utah.edu>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
	<CF34C944.A5B0%carson.holt@genetics.utah.edu>
Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com>

Hi Carson,

Thanks, that fixed the issue.

Cheers
Ian


From sjackman at gmail.com  Thu Feb 27 17:17:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 15:17:22 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
Message-ID: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score; that would copy names onto the new genes under the standard MAKER pipeline.  Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
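
The prefiltering step mentioned in the correction above (filter on mRNA score before passing the GFF3 to model_gff) could be sketched roughly as follows. This is an illustration only: it assumes the score of interest is a number in column 6 of the mRNA rows, the function name and threshold are hypothetical, and genes left without any mRNA are not pruned.

```python
def _attributes(column9):
    """Parse a GFF3 attribute column into a dict."""
    return dict(kv.split("=", 1) for kv in column9.split(";") if "=" in kv)


def filter_gff3_by_mrna_score(lines, min_score):
    """Drop mRNA rows scoring below min_score, plus features whose
    parents were all dropped. Rows without a numeric score are kept."""
    lines = list(lines)
    dropped = set()
    # Pass 1: collect the IDs of low-scoring mRNA features.
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        try:
            score = float(cols[5])
        except ValueError:
            continue  # '.' or other non-numeric score: keep the feature
        mrna_id = _attributes(cols[8]).get("ID")
        if score < min_score and mrna_id:
            dropped.add(mrna_id)
    # Pass 2: emit everything except dropped mRNAs and their children.
    kept = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(cols) == 9:
            attrs = _attributes(cols[8])
            parents = {p for p in attrs.get("Parent", "").split(",") if p}
            if attrs.get("ID") in dropped or (parents and parents <= dropped):
                continue
        kept.append(line)
    return kept
```

The filtered lines would then be written to a new file and that file given to model_gff for the second run.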

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think maker_coor and est_forward form a feature set that is worth promoting to a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (e.g. it allows for single-base-pair introns, perhaps caused by assembly errors).  The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).
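
The seed handling described above for the case without est_forward can be pictured with a small interval check. This is only a sketch of the described behavior, not MAKER's actual code in GI.pm; the names Region, parse_maker_coor, and polish_window are hypothetical.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int] = None  # 1-based, inclusive; None = whole sequence
    end: Optional[int] = None


def parse_maker_coor(value: str) -> Region:
    """Parse 'chr1' or 'chr1:1-10000' into a Region."""
    seqid, sep, span = value.rpartition(":")
    if not sep:
        return Region(value)
    start, end = span.split("-")
    return Region(seqid, int(start), int(end))


def polish_window(seed: Region, coor: Region) -> Optional[Region]:
    """Return the region to polish for a BLAST seed, or None to ignore it.

    Seeds completely outside maker_coor are ignored; seeds that overlap
    it get their search space set to maker_coor exactly.
    """
    if seed.seqid != coor.seqid:
        return None
    if coor.start is not None and seed.end < coor.start:
        return None
    if coor.end is not None and seed.start > coor.end:
        return None
    return coor
```

For example, a seed at chr1:9500-10500 overlaps maker_coor=chr1:1-10000 and so would be polished over chr1:1-10000, while a seed at chr1:20000-21000 would be skipped.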

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes.  That should work as well, as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
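
As a rough illustration of the header tags described above, the following sketch appends gene_id= and maker_coor= to FASTA headers. The helper names are hypothetical, and placing the tags space-separated at the end of the existing header is an assumption; the thread only says the tags go in the fasta header.

```python
def tag_fasta_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= / maker_coor= tags to one FASTA header line."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    tags = []
    if gene_id:
        tags.append("gene_id=" + gene_id)
    if maker_coor:
        tags.append("maker_coor=" + maker_coor)
    # Assumption: tags are simply added after the existing description.
    return " ".join([header.rstrip("\n"), *tags])


def tag_fasta(lines, tags_by_id):
    """Yield FASTA lines, tagging headers whose sequence ID is a key of
    tags_by_id, e.g. {"tx1": {"gene_id": "geneA", "maker_coor": "chr1"}}."""
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            yield tag_fasta_header(line, **tags_by_id.get(seq_id, {})) + "\n"
        else:
            yield line
```

Headers whose ID is not listed in tags_by_id pass through unchanged, so the same function can be run over a mixed evidence file.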

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From sjackman at gmail.com  Thu Feb 27 18:27:30 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 16:27:30 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>

Sorry, ignore my previous question. est_forward also carries forward the
names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
rrn4.5, rrn5, and tRNA genes didn't make it into the all.gff file. They
are in the blastn output and in evidence_0.gff. rrn5 has perfect
identity, a sufficient bit score (242 > bit_blastn=40), and a sufficient
E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is
removing these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun



From carsonhh at gmail.com  Thu Feb 27 19:13:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Feb 2014 18:13:06 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
Message-ID: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>

Set single_exon=1, and set the minimum size to a smaller value (I think it's set to 250 right now).  Also, est2genome looks for an ORF, so transcripts that have none (as with tRNAs) probably won't get picked up.
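
The control-file change suggested above could be scripted along these lines. This is a hedged sketch: set_ctl_options is a hypothetical helper, the option name single_length is an assumption about which maker_opts.ctl entry "the minimum size" refers to (check your own control file), and the value 50 is purely illustrative.

```python
import re


def set_ctl_options(ctl_text, options):
    """Return ctl_text with each key=value in options applied.

    Existing keys are rewritten in place and keep their trailing
    '#' comment; keys not present are appended at the end.
    """
    remaining = dict(options)
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*?)(\s*#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            line = "%s=%s%s" % (key, remaining.pop(key), comment)
        out.append(line)
    # Undocumented options such as est_forward are not in the file yet.
    out.extend("%s=%s" % (k, v) for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Illustrative values only; single_length is an assumed option name.
EXAMPLE_CHANGES = {"single_exon": "1", "single_length": "50"}
```

Applying EXAMPLE_CHANGES to the text of maker_opts.ctl and writing the result back would enable single-exon evidence and lower the length cutoff in one step.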

--Carson 

Sent from my iPhone

> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
> 
> Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!
> 
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?
> 
> organism_type=prokaryotic
> est2genome=1
> protein2genome=1
> est_forward=1
> Cheers,
> Shaun
> 
> 
> 
>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>> Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?
>> 
>> Cheers,
>> Shaun
>> 
>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:
>>> 
>>> Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
>>> 
>>> --Carson 
>>> 
>>> Sent from my iPhone
>>> 
>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>> 
>>>> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline.  Eventually it?s really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors. 
>>>> 
>>>> Thanks,
>>>> Carson
>>>> 
>>>> From: Mikael Brandstr?m Durling <mikael.durling at slu.se>
>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>> To: Carson Holt <carsonhh at gmail.com>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] Mapping gene names
>>>> 
>>>> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I?m after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.
>>>> 
>>>> In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.
>>>> 
>>>> THanks,
>>>> Mikael
>>>> 
>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>> It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.
>>>>> 
>>>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don?t remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to to location X, it will try its hardest to make it match. 
>>>>> 
>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also match parameters for exonerate will not be relaxed as they were with est_forward.
>>>>> 
>>>>> As you can see the behavior, is slightly different (because it?s an accidental feature).
>>>>> 
>>>>> Thanks,
>>>>> Carson
>>>>> 
>>>>> 
>>>>> 
>>>>> From: Mikael Brandstr?m Durling <mikael.durling at slu.se>
>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>> 
>>>>> That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 right? 
>>>>> 
>>>>> Mikael
>>>>> 
>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>> 
>>>>>> --Carson 
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling <mikael.durling at slu.se> wrote:
>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>>> 
>>>>>>>> There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.
>>>>>>>> 
>>>>>>>> There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene>  will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp  and just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>>>>> 
>>>>>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
>>>>>>>> 
>>>>>>>> --Carson
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?
>>>>>>>> 
>>>>>>>> maker_opts.ctl
>>>>>>>> 
>>>>>>>> est=NC_123456.frn
>>>>>>>> protein=NC_123456.faa
>>>>>>>> est2genome=1
>>>>>>>> protein2genome=1
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Shaun
>>>>>>>> 
>>>>>>>> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
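The header tags discussed in the thread above (gene_id= and maker_coor=) could be added to an EST or protein FASTA file with a small script. This is only a sketch in Python: the function names are hypothetical, and appending the tags as space-separated tokens after the existing header text is an assumption, since the thread only says the tags go "in the fasta header".

```python
def tag_fasta_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= / maker_coor= tags to a single FASTA header line."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    tags = []
    if gene_id is not None:
        tags.append("gene_id=" + gene_id)
    if maker_coor is not None:
        # e.g. "chr1" or "chr1:1-10000", as in the thread
        tags.append("maker_coor=" + maker_coor)
    return " ".join([header.rstrip()] + tags)


def tag_fasta(lines, placements):
    """Yield FASTA lines, tagging headers whose sequence ID is in `placements`.

    `placements` maps sequence ID -> (gene_id, maker_coor); either may be None.
    Sequence lines and untagged headers pass through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            if seq_id in placements:
                gene_id, coor = placements[seq_id]
                line = tag_fasta_header(line, gene_id, coor)
        yield line
```

For example, tag_fasta(open("ests.fasta"), {"est1": (None, "chr1:1-10000")}) would yield the file's lines with only the est1 header changed (the file name and ID are placeholders).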

From mikael.durling at slu.se  Fri Feb 28 04:40:30 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Fri, 28 Feb 2014 10:40:30 +0000
Subject: [maker-devel] maker_coor behaviour
Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se>

Hi,

in a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out, without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good use of this a priori knowledge. The main problem I am trying to solve is that some ESTs, for which I know where they should align, are not recruited to that position by maker's blastn->exonerate method (I find them on other scaffolds). So I thought maker_coor with the est_forward behavior (as described) would be a good option to force my evidence onto the correct position, instead of it ending up supporting or breaking other models. However, as soon as I run with maker_coor-tagged EST sequences, no est2genome evidence appears in the final gff3 file. The blastn evidence is there when est_forward is disabled, but as expected, there is no blastn evidence when est_forward is turned on. It seems, though, that the evidence is used, as the QI lines indicate EST support for both splice sites and exon alignments, but I have no way to visualize and/or evaluate the congruence of evidence and models. Would it be possible to tweak Maker into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out myself where in the code this was handled.

I could of course do my own exonerate alignments of these ESTs and feed them into maker as est_gff, but if maker already has the machinery to do this, I thought it would be a good idea to use it.

Thanks,
Mikael


From carsonhh at gmail.com  Fri Feb 28 08:09:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 07:09:09 -0700
Subject: [maker-devel] maker_coor behaviour
Message-ID: <CF35E345.A60A%carsonhh@gmail.com>

I wouldn't use those options for standard de novo annotation.  There are
other, more appropriate things that should be used instead.  Both
maker_coor and est_forward are destined to be part of a separate tool that
will secretly just be calling MAKER, but will allow me to control what
other parameters MAKER sees to avoid certain logic incompatibilities that
make sense when mapping entire genes onto a new assembly, but not really
for de novo annotation using ESTs.

You should instead try modifying these options in the maker_bopts.ctl file:

pcov_blastn= #Blastn Percent Coverage Threshold EST-Genome Alignments
pid_blastn= #Blastn Percent Identity Threshold EST-Genome Alignments
eval_blastn= #Blastn eval cutoff
bit_blastn= #Blastn bit cutoff
depth_blastn= #Blastn depth cutoff (0 to disable cutoff). For trimming high evidence overlap regions

en_score_limit= #Exonerate nucleotide percent of maximal score threshold


If either blastn or est2genome results disappear, it is because they don't
meet one of these thresholds (blastn results that don't meet the
thresholds but are borderline are kept if exonerate does meet the
thresholds, but if exonerate misses a threshold they will be thrown out).
That is why the EST in question gets thrown out and it's why the blastn
result disappears when you try to anchor it with maker_coor.

You can visualize everything with a browser when you're done.  I still
recommend the old version of Apollo for this (it's just easier).  You can
try to install it using the "./Build apollo" option from the
.../maker/src/ directory, and it will be installed in
.../maker/exe/apollo.  It requires that you have Apache Ant installed to
do this.  Otherwise just download it from the GMOD SourceForge page and
install it manually.

Thanks,
Carson


On 2/28/14, 3:40 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Hi,
>
>in a previous thread, the maker_coor feature for ESTs was mentioned. I
>have been trying it out, without using it for mapping gene names. I have
>placed these ESTs by other means, and thought the maker_coor feature would
>be a good use of this a priori knowledge. The main problem I am trying to
>solve is that some ESTs, for which I know where they should align,
>are not recruited to that position by maker's blastn->exonerate method (I
>find them on other scaffolds). So I thought maker_coor with the
>est_forward behavior (as described) would be a good option to force my
>evidence onto the correct position, instead of it ending up supporting or
>breaking other models. However, as soon as I run with maker_coor-tagged
>EST sequences, no est2genome evidence appears in the final gff3 file. The
>blastn evidence is there when est_forward is disabled, but as expected,
>there is no blastn evidence when est_forward is turned on. It seems,
>though, that the evidence is used, as the QI lines indicate EST support for
>both splice sites and exon alignments, but I have no way to
>visualize and/or evaluate the congruence of evidence and models. Would it
>be possible to tweak Maker into outputting the est2genome alignments when
>est_forward/maker_coor is used? I couldn't figure out myself where in the
>code this was handled.
>
>I could of course do my own exonerate alignments of these ESTs and feed
>them into maker as est_gff, but if maker already has the machinery to do
>this, I thought it would be a good idea to use it.
>
>Thanks,
>Mikael
>
>


From rbharris at uw.edu  Fri Feb 28 14:14:55 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Fri, 28 Feb 2014 12:14:55 -0800
Subject: [maker-devel] error in snap training
In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
Message-ID: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>

Hi -

I tried this and ran cegma --genome on my original fasta file. I then tried
to use cegma2zff to convert, followed by fathom and forge. However, when I
try to generate new parameters with forge, I get the same error that I got
when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible
error5 KOG1342.20". Any suggestions would be great,
thanks!

Cheers,
Rebecca


On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Make sure you are using 2.31,  and then try the maker2zff filters
> individually.  If the protein models are not working well, use CEGMA to
> generate models. It's from the same group as SNAP.  Use cegma2zff for the
> conversion.
>
> --Carson
>
> Sent from my iPhone
>
> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
> >
> > Hey -
> >
> > I'm trying to train SNAP and am running into errors. I don't have any
> EST evidence, just protein. My .gff file reports 10865 genes but when I run
> maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n,
> > a ton of overlap_prev_exon errors get written to the screen and then when I
> get to the forge step I get an "impossible error5". Any help would be
> greatly appreciated.
> >
> > Thanks!
> > Rebecca
>

From carsonhh at gmail.com  Fri Feb 28 14:22:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 13:22:12 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
	<CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
Message-ID: <CF363CE6.A6B6%carsonhh@gmail.com>

If it's failing both ways, I'm thinking this may be SNAP itself. Try these
two different versions of SNAP.

--> http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz
and 
--> http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz

If they both fail then contact the SNAP development group --> korflab AT
ucdavis DOT edu

Thanks,
Carson


From:  Rebecca Harris <rbharris at uw.edu>
Date:  Friday, February 28, 2014 at 1:14 PM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] error in snap training

Hi -

I tried this and ran cegma --genome on my original fasta file. I then tried
to use cegma2zff to convert, followed by fathom and forge. However, when I
try to generate new parameters with forge, I get the same error that I got
when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible
error5 KOG1342.20". Any suggestions would be great,
thanks!

Cheers,
Rebecca


On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Make sure you are using 2.31,  and then try the maker2zff filters
> individually.  If the protein models are not working well, use CEGMA to
> generate models. It's from the same group as SNAP.  Use cegma2zff for the
> conversion.
> 
> --Carson
> 
> Sent from my iPhone
> 
>> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
>> >
>> > Hey -
>> >
>> > I'm trying to train SNAP and am running into errors. I don't have any EST
>> evidence, just protein. My .gff file reports 10865 genes but when I run
>> maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n, a
>> ton of overlap_prev_exon errors get written to the screen and then when I get
>> to the forge step I get an "impossible error5". Any help would be greatly
>> appreciated.
>> >
>> > Thanks!
>> > Rebecca


From darasappan at gmail.com  Mon Feb  3 10:31:16 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 3 Feb 2014 10:31:16 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
Message-ID: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used  
trinotate to annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks  
like this (most important parts on top):

   6653 gene
  46675 exon
 280534 protein_match
  59934 CDS
    969 contig
 105388 expressed_sequence_match
  12584 five_prime_UTR
  78565 match
1401369 match_part
  10180 mRNA
  11545 three_prime_UTR

2. From cufflinks assembly: I started with 133380 entries (out of  
which there are 29,000 transcripts).  I used the protein sequences  
from trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks  
like this:
     29 gene
     75 exon
 573659 protein_match
     67 CDS
   1099 contig
 269298 expressed_sequence_match
     23 five_prime_UTR
 173844 match
2221846 match_part
     29 mRNA
     23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than
expected, so I went the cufflinks route. I don't understand why, when using
the cufflinks transcripts, even fewer genes are being found.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.   
I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
   957 contig
 46555 expressed_sequence_match
 43651 match
553633 match_part
113738 protein_match
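Tallies like the three above come from counting the feature type (third column) of the merged GFF3. A minimal Python sketch follows; the function name is hypothetical, and it assumes standard tab-separated GFF3 with an optional ##FASTA section at the end.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count GFF3 records by feature type (column 3)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature records
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts
```

A run with no "gene", "mRNA", "exon" or "CDS" keys in the result, as in case 3, means only evidence alignments were written and no gene models were produced.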

As I mentioned in another email, cegma results indicated that the  
genome was more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

> Hi Dhivya,
>
> I think there are a few numbers that could be helpful to understand  
> what's happening here.
>
> How many transcripts did Trinity assemble the RNA-seq data into?  
> Also, you had 29,000 transcripts from cufflinks, but fewer from  
> MAKER when you gave it the cufflinks data. How many transcripts did  
> MAKER identify with the cufflinks data? Did you still get more than  
> the 10,000 transcripts that you found with just the Trinity data?
>
> A key part of MAKER's approach to genome annotation that might be  
> affecting its performance is that it only annotates a gene where  
> there is both evidence (like your RNA-seq data) and an ab-initio  
> prediction. If a prediction is unsupported by the evidence, then  
> MAKER won't annotate a gene and if evidence aligns where there's no  
> prediction, MAKER won't annotate a gene either. What ab-initio  
> predictors are you using and have they been trained for your specific genome?
>
> You can force MAKER to automatically promote evidence alignments to  
> a gene model by setting the est2genome option to 1, but that will  
> usually give you many false positives.
>
> Try rerunning it with either the Trinity data or the Cufflinks data  
> and with est2genome set to 1, and let us know how that affects the  
> MAKER results.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of  
> dhivya arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> I am trying to annotate a 200 Mb plant genome for which I have a very
> good assembly.
>
> I tried to de novo assemble RNA-seq data using trinity and ran maker
> using my genome assembly and the trinity results.  I did not get as
> many transcripts as expected, around 10,000 transcripts.
>
> So, I decided to try a different approach.  I did a genome assisted
> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
> genome assembly and the cufflinks result.  I get far fewer
> transcripts as a result.
>
> If cufflinks found 29000 transcripts by mapping to the genome, I'm
> confused as to why maker is not finding the same.
>
> Any suggestions would be appreciated.
>
> Thanks
> Dhivya
>
>


From rebzi87 at gmail.com  Tue Feb  4 15:29:41 2014
From: rebzi87 at gmail.com (Rebecca Harris)
Date: Tue, 4 Feb 2014 14:29:41 -0800
Subject: [maker-devel] maker output
Message-ID: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>

Hi,

I'm running maker on a cluster and am having some problems with the run
ending prematurely. I would like to know if there is a straightforward way
to figure out whether maker has completed. I've tried: 1) counting the
number of run.log files in the datastore directly, and 2) counting the
instances of "FINISHED" in the master_datastore_index.log. These numbers
are inconsistent. I have 200,000 contigs in my fasta file - do I expect
200,000 run.log files? I've had to restart maker a few times - it appears
that maker is appending to the master_datastore_index.log, as I find
multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From darasappan at gmail.com  Tue Feb  4 15:43:19 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 4 Feb 2014 16:43:19 -0600
Subject: [maker-devel] Fwd:  maker annotation with cufflinks output
References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <EAFE0808-FDA7-49E5-8FD6-9AFD570DF20C@gmail.com>

Resending this since it didn't make it to the mailing list before.

>
> I was able to check on some of those questions.
>
> 1. From trinity assembly: I started with 102000 contigs. I used  
> trinotate to annotate proteins in this.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this (most important parts on top):
>
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
>
> 2. From cufflinks assembly: I started with 133380 entries (out of  
> which there are 29,000 transcripts).  I used the protein sequences  
> from trinity assembly.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
>
> The number of genes annotated using the trinity assembly is lower than  
> expected, so I went the cufflinks route. I don't understand why, when  
> using the cufflinks transcripts, even fewer genes are being found.
>
> 3. Training SNAP:  I used the results of maker from 1 to train  
> SNAP.  I then used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ 
> maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
>
> As I mentioned in another email, cegma results indicated that the  
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>
>
>
>
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> I think there a few numbers that could be helpful to understand  
>> what's happening here.
>>
>> How many transcripts did Trinity assembly the RNA-seq data into?  
>> Also, you had 29,000 transcripts from cufflinks, but fewer from  
>> MAKER when you gave it the cufflinks data. How many transcripts did  
>> MAKER identify with the cufflinks data? Did you still get more than  
>> the 10,000 transcripts that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be  
>> affecting it's performance is that it only annotates a gene where  
>> there is both evidence (like your RNA-seq data) and an ab-initio  
>> prediction. If a prediction is unsupported by the evidence, then  
>> MAKER won't annotate a gene and if evidence aligns where there's no  
>> prediction, MAKER won't annotate a gene either. What ab-initio  
>> predictors are you using and have they been trained specific genome?
>>
>> You can force MAKER to automatically promote evidence alignments to  
>> a gene model by setting the est2genome option to 1, but that will  
>> usually give you many false positives.
>>
>> Try rerunning it with either the Trinity data or the Cufflinks data  
>> and with est2genome set to 1, and let us know how that affects the  
>> MAKER results.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>> of dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to denovo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using  
>> my
>> genome assembly and the cufflinks result.  I get much less number of
>> transcripts as a result.
>>
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>
>


From dence at genetics.utah.edu  Tue Feb  4 15:42:52 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 4 Feb 2014 22:42:52 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43E51@mxb2.hg.genetics.utah.edu>

Hi Rebecca, If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status. If you do a count on those (with "grep -c" for example), that will tell you how many contigs have finished.

If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets a minimum length for a contig before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca Harris [rebzi87 at gmail.com]
Sent: Tuesday, February 04, 2014 3:29 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] maker output

Hi,

I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From mikael.durling at slu.se  Tue Feb  4 15:49:46 2014
From: mikael.durling at slu.se (=?iso-8859-1?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Tue, 4 Feb 2014 22:49:46 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <D36EEC49-FC5A-4DB8-BF08-795103F1B485@slu.se>

> 4 feb 2014 kl. 23:32 skrev "Rebecca Harris" <rebzi87 at gmail.com>:
> 
> Hi,
> 
> I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log.

This is usually what I do to check if maker has finished all scaffolds. There should be one FINISHED statement for each entry in the fasta file. (It might be one for every scaffold longer than the given minimum length.)
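To know how many FINISHED entries to expect, one can count the sequences in the input fasta that meet the minimum length. A minimal Python sketch follows; the function name is hypothetical, and treating the min_contig cutoff as "length >= min_contig" is an assumption about how MAKER applies it.

```python
def count_contigs(fasta_lines, min_contig=1):
    """Return (total sequences, sequences with length >= min_contig)."""
    total = kept = 0
    length = None  # None until the first header is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                # close out the previous record
                total += 1
                kept += length >= min_contig
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        # close out the last record
        total += 1
        kept += length >= min_contig
    return total, kept
```

With min_contig left at 1 the second number is simply the count of non-empty sequences, which is the expected number of FINISHED entries when no length cutoff is set.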

> These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. 

Run maker -dsindex to rebuild the file if you like. The number of FINISHED should not change though.

Mikael

> 
> Thanks!
> 
> Cheers,
> Rebecca


From carsonhh at gmail.com  Tue Feb  4 15:50:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Feb 2014 15:50:10 -0700
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <CF16BBC3.9807%carsonhh@gmail.com>

Clusters are notoriously flaky, so maker is restartable (hence the need for
the log file).  Also, since multiple nodes may write simultaneously to the
log, they can munge its contents.   You can rerun maker with the -dsindex
flag to regenerate the master_datastore_index.log without processing
anything else. You can even delete it before rebuilding it if you want to
ensure all entries are unique (run on a single CPU when you do this).

Then count the number of FINISHED entries in the log.
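If the log has been appended to across restarts, counting unique contig names rather than raw FINISHED lines avoids double counting. A minimal Python sketch follows; the function name is hypothetical, and it assumes each log line is tab-separated with the contig name in the first column and the status in the last.

```python
def summarize_index(index_lines):
    """Return (number of FINISHED lines, set of contig names marked FINISHED)."""
    raw = 0
    finished = set()
    for line in index_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and cols[-1].strip() == "FINISHED":
            raw += 1
            finished.add(cols[0])
    return raw, finished
```

Comparing len(finished) against the number of contigs in the input fasta (above any min_contig cutoff) shows whether the run is complete, and the difference between the two returned numbers shows how many duplicate entries the restarts left behind.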

Thanks,
Carson


From:  Rebecca Harris <rebzi87 at gmail.com>
Date:  Tuesday, February 4, 2014 at 3:29 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] maker output

Hi,

I'm running maker on a cluster and am having some problems with the run
ending prematurely. I would like to know if there is a straightforward way
to figure out whether maker has completed. I've tried: 1) counting the
number of run.log files in the datastore directly, and 2) counting the
instances of "FINISHED" in the master_datastore_index.log. These numbers are
inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000
run.log files? I've had to restart maker a few times - it appears that maker
is appending to the master_datastore_index.log, as I find multiple instances
of the same contig being finished.

Thanks!

Cheers,
Rebecca


From carsonhh at gmail.com  Wed Feb  5 11:38:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 11:38:50 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <CF17D1FC.987A%carsonhh@gmail.com>

Do you have any features of type snap in your results from step 3?  We've
had a couple of recent posts where, after training, snap was giving no
results, and as a result maker couldn't give any genes.  One cause of
something like that may be your step 2.  Make sure the ZFF you used to
train with wasn't empty.  The maker2zff script uses filters to only put the
best genes in the ZFF file, and if all your genes fail the filtering then
you are training with an empty ZFF.
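
One way to check the training file before running SNAP is sketched below.
It assumes the usual ZFF layout: a ">" header line per sequence, followed by
feature lines whose last whitespace-separated column is the gene (group)
name. The function name and the sample records are hypothetical and not part
of MAKER or SNAP.

```python
def count_zff_genes(lines):
    """Count distinct gene models in ZFF-style annotation lines."""
    genes = set()
    seq_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line naming the sequence the following features belong to
            parts = line[1:].split()
            seq_id = parts[0] if parts else None
            continue
        fields = line.split()
        # Assumed feature line: label, begin, end, ..., group name (last column)
        if len(fields) >= 4:
            genes.add((seq_id, fields[-1]))
    return len(genes)


# Hypothetical ZFF excerpt with two gene models on one contig.
sample = [
    ">contig_1",
    "Einit 201 325 MODEL1",
    "Eterm 410 600 MODEL1",
    "Esngl 900 1500 MODEL2",
]
print(count_zff_genes(sample))
```

A count of zero on the real file would mean every gene failed the maker2zff
filters and the HMM was trained on nothing.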

Also you should use proteins from a related species as your protein file.  I
see that your protein matches are varying wildly from run to run.  So is your
contig count.  Were the contigs you have results for long enough to contain
genes?
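
Both questions can be checked from the merged GFF3. The sketch below tallies
features by (source, type) and counts contigs below a length cutoff. It
assumes standard nine-column GFF3; the source label "snap_masked" in the
sample and the 10000 bp cutoff are illustrative assumptions, so look at
column 2 of your own file to see which source names are actually present.

```python
from collections import Counter


def summarize_gff3(lines, min_contig_len=10000):
    """Tally (source, type) pairs and count contigs shorter than min_contig_len."""
    tally = Counter()
    short_contigs = 0
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        source, ftype = cols[1], cols[2]
        start, end = int(cols[3]), int(cols[4])
        tally[(source, ftype)] += 1
        if ftype == "contig" and end - start + 1 < min_contig_len:
            short_contigs += 1
    return tally, short_contigs


# Hypothetical GFF3 excerpt: one short contig, one long contig.
sample = [
    "##gff-version 3",
    "c1\t.\tcontig\t1\t5000\t.\t.\t.\tID=c1",
    "c1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=m1",
    "c1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "c2\t.\tcontig\t1\t250000\t.\t.\t.\tID=c2",
]
tally, short_contigs = summarize_gff3(sample)
for (source, ftype), n in sorted(tally.items()):
    print(n, source, ftype)
print("contigs under cutoff:", short_contigs)
```

If no rows with a snap source show up after training, the predictor
produced nothing, which points back at the training file.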

--Carson



From dence at genetics.utah.edu  Wed Feb  5 12:28:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 19:28:48 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>

Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From darasappan at gmail.com  Wed Feb  5 13:13:57 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 14:13:57 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the
trinity assembly (using trinotate).  I'll try using protein sequences
from related species (though there aren't sequences from closely
related organisms).  Could you tell me a little about why protein data
from annotating my RNA-seq data would not work best here?

Thanks
Dhivya



From dence at genetics.utah.edu  Wed Feb  5 13:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>,
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43FB4@mxb2.hg.genetics.utah.edu>

Hi Dhivya,

In genome annotation, often you want to use as many sources for evidence as is reasonable, but those sources should be distinct.  It will confuse downstream annotation efforts if your protein evidence is actually based on the RNA-seq data.

Using the trinotate results for protein evidence here restricts you first to the proteins coded by the transcripts in the RNA-seq data, which may be incomplete, and secondly to the proteins that trinotate could annotate from among the transcripts.

The problem that Carson mentioned with the SNAP HMM file is a real possibility also.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Wed Feb  5 13:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <CF17E9B9.9892%carsonhh@gmail.com>

Protein data doesn't have to be from that closely related a species.  This
is because genes maintain homology at the amino acid level across even very
large evolutionary distances.  Having a more closely related species just
ensures that genome contents are similar (fewer losses/gains relative to
each other). And use the entire proteome of at least one related species
(just using a database like swiss-prot is not sufficient).

Using translated mRNA-seq data will not give you any new information that
was not already available from the untranslated sequence.  Plus it will
introduce the complicating artifacts that mRNA-seq generates into the
protein part of the pipeline (gene merging, incorrect assembly, and false
calls caused by background transcription).  A big gotcha with mRNA-seq is
that all of your genome gets transcribed at a low level, not just the genes,
so you will always have contamination that does not represent real gene
models.  Also in the end you really only expect to capture about 50% of the
genes with mRNA-seq (maybe 70% if you are fortunate - and most of those will
be partial). So using the proteins from another species is important to
improve sensitivity and to fix many of the issues that arise from the noisy
nature of mRNA-seq.  In fact if you were forced to use only one (either
protein evidence or mRNA-seq) you will actually get better annotations from
the protein evidence in most cases. You get better annotations when you use
both, but if using only one of them, the proteins from another species are
better, and noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Wednesday, February 5, 2014 at 1:13 PM
To:  Daniel Ence <dence at genetics.utah.edu>
Cc:  Carson Holt <carsonhh at gmail.com>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the trinity
assembly (using trinotate).  I'll try using protein sequences from related
species (though there aren't sequences from closely related organisms).
Could you tell me a little about why protein data from annotating my RNA-seq
data would not work best here?

Thanks
Dhivya
 
On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

> Hi Dhivya, Are the protein matches in your results coming from your
> annotations of the transcriptome? You should really use amino-acid sequences
> from related organisms and some kind of omnibus source like SwissProt.
> 
> Thanks,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, February 05, 2014 11:38 AM
> To: dhivya arasappan; Daniel Ence
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] maker annotation with cufflinks output
> 
> Do you have any features of type snap in your results from step 3?  We've had
> a couple of recent posts where, after training, snap was giving no results,
> and as a result maker couldn't give any genes.  One cause of something like
> that may be your step 2.  Make sure the ZFF you used to train with wasn't
> empty.  The maker2zff script uses filters to put only the best genes in the
> ZFF file, and if all your genes fail the filtering then you are training with
> an empty ZFF.
> 
> Also you should use proteins from a related species as your protein file.  I
> see that your protein matches are varying wildly from run to run.  So is your
> contig count.  Were the subset of contigs you have results for long enough to
> contain genes?
> 
> --Carson
> 
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Monday, February 3, 2014 at 9:31 AM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
> 
> Hi Daniel,
> 
> I was able to check on some of those questions.
> 
> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
> annotate proteins in this.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this
> (most important parts on top):
> 
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
> 
> 2. From cufflinks assembly: I started with 133380 entries (out of which there
> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
> 
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why even fewer
> genes are found when using the cufflinks transcripts.
> 
> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
> used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap
> /RHA.hmm
> est2genome=0
> 
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
> 
> As I mentioned in another email, cegma results indicated that the genome was
> more than 90% complete. Any suggestions would be helpful.
> 
> Thank you
> Dhivya
> 
> 
> 
> 
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
> 
>> Hi Dhivya, 
>> 
>> I think there are a few numbers that could be helpful to understand what's
>> happening here. 
>> 
>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>> the cufflinks data. How many transcripts did MAKER identify with the
>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>> found with just the Trinity data?
>> 
>> A key part of MAKER's approach to genome annotation that might be affecting
>> its performance is that it only annotates a gene where there is both
>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>> gene either. What ab-initio predictors are you using, and have they been
>> trained for this specific genome?
>> 
>> You can force MAKER to automatically promote evidence alignments to a gene
>> model by setting the est2genome option to 1, but that will usually give you
>> many false positives.
>> 
>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>> est2genome set to 1, and let us know how that affects the MAKER results.
>> 
>> Thanks,
>> Daniel
>> 
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>> arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>> 
>> Hello,
>> 
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>> 
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>> 
>> So, I decided to try a different approach.  I did a genome assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>> genome assembly and the cufflinks result.  I get far fewer
>> transcripts as a result.
>> 
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>> 
>> Any suggestions would be appreciated.
>> 
>> Thanks
>> Dhivya
>> 
>> 



From darasappan at gmail.com  Wed Feb  5 22:16:43 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 23:16:43 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17E9B9.9892%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
	<CF17E9B9.9892%carsonhh@gmail.com>
Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com>

Thank you both for those explanations. I'll get back to you after I  
try rerunning maker.

Dhivya



From mikael.durling at slu.se  Thu Feb  6 04:02:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Thu, 6 Feb 2014 11:02:37 +0000
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CEF5A8E0.88B4%carsonhh@gmail.com>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
Message-ID: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>

Hi Carson,

It's nice to see all these new features in maker.

I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed though, with this error message:

ERROR: You found a tRNA with an intron! This should not happen
--> rank=12, hostname=my-mgrid6
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scf_013

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf_013

I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me:

scf_013         1       189339  189410  Thr     AGT     0       0       82.83
scf_013         2       510381  510462  Ser     AGA     0       0       67.09
scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
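
The rows above can be checked programmatically. The following is a minimal sketch, assuming the nine-column tRNAscan-SE tabular layout shown above (sequence name, tRNA number, begin, end, type, anticodon, intron begin, intron end, score), in which 0/0 in the intron columns means no intron was reported. The names `TrnaHit` and `parse_trnascan` are hypothetical and are not part of MAKER or tRNAscan-SE.

```python
from typing import List, NamedTuple


class TrnaHit(NamedTuple):
    """One row of tRNAscan-SE tabular output (column layout assumed as above)."""
    seq_name: str
    number: int
    begin: int
    end: int
    trna_type: str
    anticodon: str
    intron_begin: int
    intron_end: int
    score: float

    @property
    def has_intron(self) -> bool:
        # Rows without an intron carry 0 in both intron-bound columns.
        return self.intron_begin != 0 or self.intron_end != 0


def parse_trnascan(text: str) -> List[TrnaHit]:
    """Parse whitespace-separated rows, skipping blanks and header lines."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least nine fields and a numeric tRNA number.
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        hits.append(TrnaHit(
            fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
            fields[4], fields[5], int(fields[6]), int(fields[7]),
            float(fields[8]),
        ))
    return hits


# The five rows reported for contig scf_013 in this message.
SAMPLE_OUTPUT = """\
scf_013  1  189339  189410  Thr  AGT  0       0       82.83
scf_013  2  510381  510462  Ser  AGA  0       0       67.09
scf_013  3  586886  587000  Leu  CAA  586924  586956  57.97
scf_013  4  942166  942069  Leu  AAG  942128  942113  57.48
scf_013  5  169102  168993  Leu  TAA  169065  169037  56.49
"""

hits = parse_trnascan(SAMPLE_OUTPUT)
with_introns = [hit.number for hit in hits if hit.has_intron]
print(with_introns)  # [3, 4, 5]
```

On this contig the three Leu predictions (rows 3 to 5) carry intron coordinates, so those rows are the likely trigger for the "tRNA with an intron" error.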


Hope this can be of some help while debugging. I'll leave trnascan off for now.

thanks,

Mikael


On 10 Jan 2014, at 22:03, Carson Holt <carsonhh at gmail.com> wrote:

> Hi Mikael,
> 
> The options are part of the new MAKER-P integration
> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
> Additional documentation/tutorials will be forthcoming - probably in
> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
> or alternatively with the annual GMOD summer school. The tRNA option is
> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
> 
> Thanks,
> Carson
> 
> 
> 
> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
> wrote:
> 
>> Hi Carson and other maker developers,
>> 
>> I was reading the source code of the latest maker release and noted
>> several references to ncRNAs, snoscan and trnascan. Can these be
>> incorporated into the normal annotation workflow? If so, are there any
>> instructions available for that?
>> 
>> best regards,
>> Mikael Durling
> 
> 


From darasappan at gmail.com  Thu Feb  6 07:52:12 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 08:52:12 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com>

Hello,

It does appear that my genome.ann file from the maker2zff script has data
in it. However, the SNAP steps after that have created empty files.
The following are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors  
like this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds  
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds  
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds  
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
exon-20:out_of_bounds exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all
showing up as errors. I realize this may be an issue with snap, but
are you familiar with anything like this? A snippet of my genome.ann
file is attached for reference (since the full file is too big for the list).
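
One check that may help narrow this down is to compare every exon coordinate in genome.ann against the length of the matching sequence in genome.dna. The sketch below assumes that fathom's out_of_bounds error means an exon coordinate lies outside the named sequence, and that the ZFF file has ">name" header lines followed by rows whose first three fields are label, begin, and end, with the model name in the last field. The function names and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def fasta_lengths(fasta_text: str) -> Dict[str, int]:
    """Map each FASTA record name to its total sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None and line:
            lengths[name] += len(line)
    return lengths


def out_of_bounds(zff_text: str, lengths: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """Return (sequence, model, label) for exons outside their sequence."""
    problems = []
    seq = ""
    for line in zff_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq = parts[0] if parts else ""
            continue
        fields = line.split()
        label, model = fields[0], fields[-1]
        # Minus-strand features list begin > end, so sort the pair first.
        low, high = sorted((int(fields[1]), int(fields[2])))
        # A missing or empty sequence has length 0, so every exon fails.
        if low < 1 or high > lengths.get(seq, 0):
            problems.append((seq, model, label))
    return problems


# Hypothetical data: scf_002 has a header in the .dna file but no sequence.
SAMPLE_ZFF = """\
>scf_001
Einit 201 325 MODEL1
Eterm 410 520 MODEL1
>scf_002
Esngl 900 100 MODEL2
"""
SAMPLE_DNA = ">scf_001\n" + "A" * 600 + "\n>scf_002\n"

print(out_of_bounds(SAMPLE_ZFF, fasta_lengths(SAMPLE_DNA)))
# [('scf_002', 'MODEL2', 'Esngl')]
```

If every model in the real files is flagged this way, the sequence file rather than the annotation is the more likely culprit.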

Thanks
Dhivya



-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.ann
Type: application/octet-stream
Size: 15761 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.dna
Type: application/octet-stream
Size: 3075 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment-0003.obj>

From carsonhh at gmail.com  Thu Feb  6 09:01:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:01:04 -0700
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
	<CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
Message-ID: <CF18FE86.9903%carsonhh@gmail.com>

I'm making a new release this weekend, but if you have access to the devel
version, you can test now.  All changes have been committed to the
subversion repository.

Thanks,
Carson


On 2/6/14, 4:02 AM, "Mikael Brandstr?m Durling" <mikael.durling at slu.se>
wrote:

>Hi Carson,
>
>it?s nice to see all these new features in maker.
>
>I gave the trnascan option a try by enabling it in the config file for
>one of my fungal genomes. It failed though, with this error message:
>
>ERROR: You found a tRNA with an intron! This should not happen
>--> rank=12, hostname=my-mgrid6
>ERROR: Failed while gathering ab-init output files
>ERROR: Chunk failed at level:1, tier_type:2
>FAILED CONTIG:scf_013
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf_013
>
>I checked the trnascan output
>(scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig,
>and the output seems valid to me:
>
>scf_013         1       189339  189410  Thr     AGT     0       0
>82.83
>scf_013         2       510381  510462  Ser     AGA     0       0
>67.09
>scf_013         3       586886  587000  Leu     CAA     586924  586956
>57.97
>scf_013         4       942166  942069  Leu     AAG     942128  942113
>57.48
>scf_013         5       169102  168993  Leu     TAA     169065  169037
>56.49
>
>
>Hope this can be of some help while debugging. I?ll leave trnascan off
>for now.
>
>thanks,
>
>Mikael
>
>
>10 jan 2014 kl. 22:03 skrev Carson Holt <carsonhh at gmail.com>:
>
>> Hi Mikael,
>> 
>> The options are part of the new MAKER-P integration
>> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
>> Additional documentation/tutorials will be forthcoming - probably in
>> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
>> or alternatively with the annual GMOD summer school. The tRNA option is
>> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
>> wrote:
>> 
>>> Hi Carson and other maker developers,
>>> 
>>> I was reading the source code of the latest maker release and noted
>>> several references to ncRNAs, snoscan and trnascan. Can these be
>>> incorporated into the normal annotation workflow? If so, are there any
>>> instructions available for that?
>>> 
>>> best regards,
>>> Mikael Durling
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
>> 
>


From carsonhh at gmail.com  Thu Feb  6 09:05:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:05:05 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
Message-ID: <CF19004A.9913%carsonhh@gmail.com>

Your genome.dna file has no sequence?  Did you by any chance strip the fasta
sequence from the GFF3 you are using as input to maker2zff?  There should be
fasta sequence at the end of that file.  Also, can I see the GFF3 file you
are using as input to maker2zff?
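
A quick way to check whether the sequence is still in the file is to look for the FASTA section at the end of the GFF3. The following is a minimal sketch under the assumption that the file follows the standard GFF3 layout, where embedded sequence comes after a ##FASTA directive line; gff3_fasta_summary is a hypothetical helper, not part of MAKER or SNAP.

```python
def gff3_fasta_summary(lines):
    """Return (has_fasta_directive, number_of_sequences, total_bases) for GFF3 lines."""
    in_fasta = False
    n_seqs = 0
    n_bases = 0
    for line in lines:
        line = line.strip()
        if line == "##FASTA":
            # Everything after this directive is FASTA, not feature rows.
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            n_seqs += 1
        elif in_fasta:
            n_bases += len(line)
    return in_fasta, n_seqs, n_bases


example = [
    "##gff-version 3",
    "scf_1\tmaker\tgene\t1\t8\t.\t+\t.\tID=g1",
    "##FASTA",
    ">scf_1",
    "ACGTACGT",
]
print(gff3_fasta_summary(example))
```

If the first value is False or the base count is 0, the GFF3 carries no sequence for the downstream steps to use.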

Thanks,
Carson

From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 7:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hello,

It does appear that my genome.ann file from the maker2zff script has data in it.
However, the SNAP steps after that have created empty files.  The following
are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I got errors like
this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all showing up
as errors. I realize this may be an issue with snap, but are you familiar
with anything like this? My genome.ann file is attached for reference.
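
One way to see how every exon can end up out of bounds is to compare the exon coordinates in the annotation against the lengths of the sequences in the matching .dna file. The sketch below is an illustration only and does not reproduce fathom's actual checks: it assumes ZFF records of the form "label begin end ... group" under ">sequence" header lines, and out_of_bounds_exons and the seq_lengths mapping are hypothetical names.

```python
def out_of_bounds_exons(zff_lines, seq_lengths):
    """Return (seq, model, begin, end) for exons lying outside their sequence."""
    bad = []
    seq = None
    for line in zff_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line names the sequence the following exons belong to.
            seq = line[1:].split()[0]
            continue
        fields = line.split()
        begin, end, group = int(fields[1]), int(fields[2]), fields[-1]
        # A sequence missing from the .dna file is treated as having length 0.
        length = seq_lengths.get(seq, 0)
        if min(begin, end) < 1 or max(begin, end) > length:
            bad.append((seq, group, begin, end))
    return bad


zff = [">scf_1", "Einit 201 325 MODEL1", "Eterm 500 600 MODEL1"]
print(out_of_bounds_exons(zff, {"scf_1": 1000}))  # sequence present
print(out_of_bounds_exons(zff, {}))               # sequence missing or empty
```

With an empty .dna file every sequence has length 0, so under these assumptions every exon is flagged, which matches the pattern in the output above.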

Thanks
Dhivya

On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:

> Do you have any features of type snap in your results from step 3?  We've had
> a couple of recent posts where after training snap was giving no results, and
> as a result maker couldn't give any genes.  One cause of something like that
> may be your step 2.  Make sure the ZFF you used to train with wasn't empty.
> The maker2zff script uses filters to only put the best genes in the ZFF file,
> and if all your genes fail the filtering then you are training with an empty
> ZFF.
> 
> Also you should use proteins from a related species as your protein file.  I
> see that your protein matches are varying wildly from run to run, and so is
> your contig count.  Were the subset of contigs you have results for long
> enough to contain genes?
> 
> --Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Monday, February 3, 2014 at 9:31 AM
> To:  Daniel Ence <dence at genetics.utah.edu>
> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Hi Daniel,
> 
> I was able to check on some of those questions.
> 
> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
> annotate proteins in this.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this
> (most important parts on top):
> 
>    6653 gene
>   46675 exon
>  280534 protein_match
>   59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
> 
> 2. From cufflinks assembly: I started with 133380 entries (out of which there
> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this:
>      29 gene
>      75 exon
>  573659 protein_match
>      67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
> 
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why, when using
> the cufflinks transcripts, even fewer genes are being found.
> 
> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
> used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
> 
> And again I got results with no entries for gene, exon, CDS etc.
>     957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
> 
> As I mentioned in another email, cegma results indicated that the genome was
> more than 90% complete. Any suggestions would be helpful.
> 
> Thank you
> Dhivya
> 
> 
> 
> 
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
> 
>> Hi Dhivya, 
>> 
>> I think there are a few numbers that could be helpful to understand what's
>> happening here. 
>> 
>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>> the cufflinks data. How many transcripts did MAKER identify with the
>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>> found with just the Trinity data?
>> 
>> A key part of MAKER's approach to genome annotation that might be affecting
>> its performance is that it only annotates a gene where there is both
>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>> prediction is unsupported by the evidence, then MAKER won't annotate a gene
>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>> gene either. What ab-initio predictors are you using and have they been
>> trained for your specific genome?
>> 
>> You can force MAKER to automatically promote evidence alignments to a gene
>> model by setting the est2genome option to 1, but that will usually give you
>> many false positives.
>> 
>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>> est2genome set to 1, and let us know how that affects the MAKER results.
>> 
>> Thanks,
>> Daniel
>> 
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>> arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>> 
>> Hello,
>> 
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>> 
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>> 
>> So, I decided to try a different approach.  I did a genome assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>> genome assembly and the cufflinks result.  I get far fewer
>> transcripts as a result.
>> 
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>> 
>> Any suggestions would be appreciated.
>> 
>> Thanks
>> Dhivya
>> 
>> 



From carsonhh at gmail.com  Thu Feb  6 10:04:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 10:04:25 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID: <CF190E83.9927%carsonhh@gmail.com>

Could you give me the file without using 'head' to trim it?  It's cutting it
off before it reaches the part I'm interested in.

--Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 10:01 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Oh yes, I did. I took just the non-sequence entries in the gff file and used
that as my input.  I will rerun snap with the gff file containing the
sequences as well. 

I'm attaching a snippet of the gff file that I used as input to maker2zff.

Thanks for your help
Dhivya





From darasappan at gmail.com  Thu Feb  6 10:01:44 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 11:01:44 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19004A.9913%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>

Oh yes, I did. I took just the non-sequence entries in the gff file and  
used that as my input.  I will rerun snap with the gff file containing  
the sequences as well.

I'm attaching a snippet of the gff file that I used as input to  
maker2zff.

Thanks for your help
Dhivya



-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.cat.formatted.gff
Type: application/octet-stream
Size: 19905 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a662c5a7/attachment-0001.obj>

From sjackman at gmail.com  Thu Feb  6 17:22:57 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Feb 2014 16:22:57 -0800
Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation
Message-ID: <CADX6M3r=29brfAzzjPr22mAGW28VUb7np5MJz5bEjsAL-o2r-w@mail.gmail.com>

Hi MAKER developers,

I'd like to add MAKER to Homebrew <http://brew.sh> to make the installation
of MAKER and its dependencies as straightforward as "brew install maker".
Homebrew is a system for installing software, originally developed for Mac
OS, and now also for Linux through
Linuxbrew <https://github.com/Homebrew/linuxbrew>.
Homebrew/science <https://github.com/Homebrew/homebrew-science> is a
collection of scientific software, which includes a lot of bioinformatics
software.

I've created a prototype for the MAKER installation
script <https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb> (called
a formula, in Homebrew parlance). Is there a static URL for the
source code of MAKER? The current formula won't work out of the box,
because part of the
URL <https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb#L7> depends
on the user's unique ID:
http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz.

Would you be interested in adding MAKER to Homebrew? I know MAKER must be
licensed for commercial use. It is possible for Homebrew to display a
notice of the MAKER license when it's installed.

MAKER is not available for commercial use without a license. Those wishing
to license MAKER for commercial use should contact Beth Drees at the
University of Utah TCO to discuss your needs.

Cheers,
Shaun

From bioinformatics.umd at gmail.com  Fri Feb  7 06:29:27 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 7 Feb 2014 08:29:27 -0500
Subject: [maker-devel] NCBI feature table
Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>

Hello Maker Developers,

I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI along with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in creating this table from my annotation files. If anyone has any insight on this issue, it would be greatly appreciated.

Cheers
Ian


From mike.thon at gmail.com  Fri Feb  7 07:14:06 2014
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 7 Feb 2014 15:14:06 +0100
Subject: [maker-devel] NCBI feature table
In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>

Hi Ian -

We've been struggling with this too, and I started developing a script to convert the MAKER GFF into NCBI's .tbl format.  However, we found that some of the gene models required manual editing, so we import the GFF into a commercial application called Geneious, where we do the edits.  From there we export the data in GenBank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess.  The only advice I can offer is that you'll probably need some manual editing in your workflow, if not in Apollo, then in some other app.  In that case you'll need to convert the output of that app into .tbl format.

> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics <bioinformatics.umd at gmail.com> wrote:
> 
> Hello Maker Developers,
> 
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI along with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in creating this table from my annotation files. If anyone has any insight on this issue, it would be greatly appreciated.
> 
> Cheers
> Ian
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From cexzurjimenezjr at gmail.com  Thu Feb  6 22:27:13 2014
From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.)
Date: Fri, 7 Feb 2014 13:27:13 +0800
Subject: [maker-devel] Testing MAKER After Installation
Message-ID: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>

Hello,

I have finished installing MAKER marked by "PERL Dependencies: INSTALLED,
External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED,
MAKER: INSTALLED" and it seems everything's fine. I'm using MAKER 2.10 and
I have followed the installation instructions both in its corresponding
"README" and "INSTALL" files and the 2012 GMOD MAKER Tutorial. After
editing the three configuration files and running "maker", I saw the
following error in my terminal. I have searched Google and tried the
solutions offered there but the error is still showing. Below is the error
I got:


Can't locate package GDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased


Maker is now finished!!!


Can you tell me what the error means and which part of the installation I
got wrong? Your help will be very much appreciated. Thank you.

Attached are the configuration files I used for MAKER.


Sincerely,

CJ
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1501 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1319 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4540 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0005.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 11:11:44 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:11:44 +0000
Subject: [maker-devel] Maker installation
In-Reply-To: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
References: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
Message-ID: <CF1A6E45.99DF%carson.holt@genetics.utah.edu>

Hi Tracy,

The older Apollo is pretty much deprecated.  There are still people who like to use it, though (myself among them).  You can download and install it manually from here --> http://sourceforge.net/projects/gmod/files/Apollo/

If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this --> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz

You can also use Web-Apollo for your data if you want, and that is what I would recommend.

On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that.  The GUI is really only for demonstration purposes or very small datasets.  It is not for production (that is why it is off by default).

Thanks,
Carson

From: Tracy Smith <tmsmith23 at wisc.edu>
Date: Friday, February 7, 2014 at 10:48 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: <maker-devel at yandell-lab.org>
Subject: Maker installation

Hi,

I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

--
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carson.holt at genetics.utah.edu  Fri Feb  7 11:28:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:28:29 +0000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
	<7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
Message-ID: <CF1A7331.9A09%carson.holt@genetics.utah.edu>

Yes.  The non-web version of Apollo can open GFF3 and then save to table
format --> http://sourceforge.net/projects/gmod/files/Apollo/

I've also attached a script made by a lab member that can convert
MAKER-derived GFF3 gene entries into raw table format, and I've CC'd the
script's author (Michael Campbell) in case you have any questions.

Thanks,
Carson


On 2/7/14, 7:14 AM, "Michael Thon" <mike.thon at gmail.com> wrote:

>Hi Ian -
>
>We've been struggling with this too, and I started developing a script to
>convert the MAKER GFF into NCBI's .tbl format.  However, we found that
>some of the gene models required manual editing, so we import the GFF
>into a commercial application called Geneious, where we do the
>edits.  From there we export the data in GenBank format and then convert
>it to .tbl format with a script. Our submission just passed the automated
>checks and we're waiting for the manual review. Probably none of my code
>will help you, and in any case it's kind of a mess.  The only advice I can
>offer is that you'll probably need some manual editing in your
>workflow, if not in Apollo, then in some other app.  In that case you'll
>need to convert the output of that app into .tbl format.
>
>> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics
>><bioinformatics.umd at gmail.com> wrote:
>> 
>> Hello Maker Developers,
>> 
>> I have used this software with great success and I continue to look to
>>it going forward. However, as I'm getting ready to submit my annotations
>>to NCBI along with the genomes, I haven't found a straightforward method
>>of turning the MAKER-produced GFF files into an NCBI feature table. What
>>is the process for creating this table? It seems that the format NCBI is
>>looking for is unique, and I haven't uncovered any scripts or tools to
>>assist in creating this table from my annotation files. If anyone
>>has any insight on this issue, it would be greatly appreciated.
>> 
>> Cheers
>> Ian
>> 
>> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff32table
Type: application/octet-stream
Size: 7511 bytes
Desc: gff32table
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/2e51f964/attachment-0001.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 11:31:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:31:17 +0000
Subject: [maker-devel] Testing MAKER After Installation
In-Reply-To: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
References: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
Message-ID: <CF1A7417.9A11%carson.holt@genetics.utah.edu>

That can happen on some systems with that very old version of MAKER.  Use MAKER 2.28 or 2.30 instead --> http://www.yandell-lab.org/software/maker.html

Thanks,
Carson



From bhall7 at hawaii.edu  Fri Feb  7 17:31:36 2014
From: bhall7 at hawaii.edu (Brian Hall)
Date: Fri, 07 Feb 2014 14:31:36 -1000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
References: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
Message-ID: <52F57AE8.5090002@hawaii.edu>

Hi Ian,

My colleagues are also working on preparing a genome for submission to 
the NCBI. The software we are developing for this task is still a work 
in progress, but you are welcome to give it a try:

https://github.com/tedsta/GAG

It's a console-based application and it requires Python 2.6. Its 
strength is in filtering and modifying large segments of the genome at 
once -- where Apollo is good for removing a few erroneous exons, we are 
dealing with lists of dozens or more. This program seeks to make such 
changes as painless as possible.

My advice is to try the simplest gff3-to-tbl script you can find and 
then run tbl2asn. If it works out okay, great! If you get a massive 
error report, get in touch and we'll help you out if we can :)
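
As a rough illustration of the "simplest gff3-to-tbl script" idea above, here is a minimal Python sketch. It is not the GAG tool or the gff32table script from this thread. It assumes MAKER-style gene/mRNA/exon/CDS rows with ID and Parent attributes, and it relies on two facts about NCBI's 5-column table layout: minus-strand features are written with start greater than stop, and extra intervals of a feature appear on lines without a feature key. The function names are hypothetical, reusing the MAKER gene ID as locus_tag is a placeholder, and "hypothetical protein" is a placeholder product; a real submission needs further qualifiers such as protein_id and transcript_id.

```python
from collections import defaultdict


def parse_attrs(field):
    """Turn a GFF3 attribute column ("ID=x;Parent=y") into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def gff3_to_tbl(gff_lines):
    """Return feature-table text for the gene, exon and CDS rows of a GFF3."""
    genes = defaultdict(list)   # seqid -> [(start, end, strand, gene ID)]
    mrnas = defaultdict(list)   # gene ID -> [mRNA ID, ...]
    segments = defaultdict(lambda: defaultdict(list))  # mRNA ID -> key -> spans

    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip directives, comments and malformed rows
        seqid, ftype, strand = cols[0], cols[2], cols[6]
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attrs(cols[8])
        if ftype == "gene":
            genes[seqid].append((start, end, strand, attrs["ID"]))
        elif ftype == "mRNA":
            mrnas[attrs["Parent"]].append(attrs["ID"])
        elif ftype in ("exon", "CDS"):
            # Exons become the intervals of the mRNA feature in the table.
            key = "mRNA" if ftype == "exon" else "CDS"
            for parent in attrs["Parent"].split(","):
                segments[parent][key].append((start, end))

    out = []
    for seqid, gene_rows in genes.items():
        out.append(f">Feature {seqid}")
        for start, end, strand, gene_id in sorted(gene_rows):
            minus = strand == "-"
            first, last = (end, start) if minus else (start, end)
            out.append(f"{first}\t{last}\tgene")
            out.append(f"\t\t\tlocus_tag\t{gene_id}")  # placeholder locus_tag
            for mrna_id in mrnas[gene_id]:
                for key in ("mRNA", "CDS"):
                    spans = sorted(segments[mrna_id][key], reverse=minus)
                    for i, (s, e) in enumerate(spans):
                        a, b = (e, s) if minus else (s, e)
                        out.append(f"{a}\t{b}\t{key}" if i == 0 else f"{a}\t{b}")
                    if spans:
                        out.append("\t\t\tproduct\thypothetical protein")
    return "\n".join(out) + "\n"
```

The text returned by gff3_to_tbl(open("genome.gff")) could be written to a .tbl file and handed to tbl2asn; the file name here is only an example, and the resulting error report will show which qualifiers still need to be added.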

--Brian

On 02/07/2014 05:16 AM, maker-devel-request at yandell-lab.org wrote:
> Date: Fri, 7 Feb 2014 08:29:27 -0500
> From: UMD Bioinformatics <bioinformatics.umd at gmail.com>
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] NCBI feature table
> Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8 at gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> Hello Maker Developers,
>
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI along with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in creating this table from my annotation files. If anyone has any insight on this issue, it would be greatly appreciated.
>
> Cheers
> Ian
>


From tmsmith23 at wisc.edu  Fri Feb  7 10:48:13 2014
From: tmsmith23 at wisc.edu (Tracy Smith)
Date: Fri, 7 Feb 2014 11:48:13 -0600
Subject: [maker-devel] Maker installation
Message-ID: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>

Hi,

I am trying to install Maker and am running into the same problem noted on
this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the
source -->
http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip
"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location
where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

-- 
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carsonhh at gmail.com  Mon Feb 10 08:34:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 08:34:58 -0700
Subject: [maker-devel] MAKER presentation at PAG
In-Reply-To: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
References: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
Message-ID: <CF1E3E65.9B13%carsonhh@gmail.com>

* maker_map_ids - Builds shorter IDs/Names for MAKER genes and transcripts
following the NCBI suggested naming format.
* map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER
fasta files.
* map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3
files; old IDs/Names are mapped to the Alias attribute.
* maker_functional_fasta - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER produced transcript and protein fasta
files.
* maker_functional_gff - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER produced GFF3 files in the Note
attribute.
* ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs
and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file.
This is metadata that shows up when you click on an annotation in
JBrowse/GBrowse.
* iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3
features representing domains. Interesting tier for GBrowse. These are
visible feature tracks that can be seen in JBrowse/GBrowse.

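
To make the ID-mapping step in the list above concrete, here is a minimal Python sketch of the idea behind renaming with an Alias. It is not the code of map_gff_ids, and the real script's input and output formats may differ. The function name remap_attributes and the example IDs are hypothetical; the only behavior taken from the description above is that new short IDs/Names replace the old ones and the old values are kept in the Alias attribute.

```python
def remap_attributes(attr_field, id_map):
    """Rewrite ID/Name/Parent in one GFF3 attribute column using id_map.

    id_map is a dict of old ID -> new short ID (an assumed input format).
    Old ID/Name values that get replaced are preserved under Alias.
    """
    pairs = [kv.split("=", 1) for kv in attr_field.strip().split(";") if "=" in kv]
    attrs = dict(pairs)  # dicts keep insertion order, so column order is stable
    aliases = attrs["Alias"].split(",") if "Alias" in attrs else []
    for key in ("ID", "Name"):
        old = attrs.get(key)
        if old in id_map:
            attrs[key] = id_map[old]
            if old not in aliases:
                aliases.append(old)
    if "Parent" in attrs:
        # A feature may list several parents separated by commas.
        parents = attrs["Parent"].split(",")
        attrs["Parent"] = ",".join(id_map.get(p, p) for p in parents)
    if aliases:
        attrs["Alias"] = ",".join(aliases)
    return ";".join(f"{k}={v}" for k, v in attrs.items())
```

Applied to the ninth column of every row, this keeps Parent links consistent as long as the same id_map is used for genes, transcripts and their child features.
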
Thanks,
Carson

From:  Kevin Dorn <dorn at umn.edu>
Date:  Sunday, February 9, 2014 at 9:23 PM
To:  <carson.holt at utah.edu>
Subject:  MAKER presentation at PAG

Hi Carson, 

I saw your MAKER presentation at PAG this year and have a quick question.
I've used MAKER to annotate the plant genome we're working on, and am mostly
done. I had to step out for a second during your talk, and when I came back,
you were talking about how you can transfer meaningful annotations (getting
rid of the 'ugly MAKER names' for genes). Is there an accessory script to do
this? 

Thanks, 
Kevin Dorn 



From amitha at ccmb.res.in  Mon Feb 10 00:04:37 2014
From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR)
Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST)
Subject: [maker-devel] Falied to create new account
In-Reply-To: <bea52988-c660-488d-aae4-196364348cea@node1>
Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hi,

I am interested in using the online version of MAKER, for which I tried to create a profile using the email id 'amitha at ccmb.res.in', but unfortunately I could not log in successfully.
The error appeared at this link: http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi

The error mentioned is:
Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
 at /var/www/cgi-bin/mwas/maker.cgi line 21.

Kindly help me through the registration asap.

Thanks
Amitha.


From listona at science.oregonstate.edu  Sat Feb  8 19:08:42 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Sat, 08 Feb 2014 18:08:42 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>

I am following the tutorial for training SNAP, and it works fine.
However, the tutorial instructions have MAKER redo the repeat
masking. To avoid this, I concatenated my GFF files from the first
round of annotation and used maker_gff=round1.gff and rm_pass=1, but
at the end of the process the repeat annotations were not there. Any
suggestions?  Thanks, Aaron
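
One way to check whether repeat evidence actually made it into a concatenated GFF3 file (before or after the rerun) is to tally its rows by source and type. The following is a minimal Python sketch, not part of MAKER. The function name count_features is hypothetical, and the idea that repeat evidence carries a source such as "repeatmasker" is an assumption to verify against your own file.

```python
from collections import Counter


def count_features(gff_lines):
    """Count (source, type) pairs in GFF3 lines, stopping at a ##FASTA section."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip directives, comments and malformed rows
        counts[(cols[1], cols[2])] += 1
    return counts


# Example use on a file (the file name is only illustrative):
#     with open("round1.gff") as handle:
#         for (source, ftype), n in sorted(count_features(handle).items()):
#             print(n, source, ftype)
```

If no rows with a repeat-related source show up in the input file, the pass-through options have nothing to carry forward, which would narrow the problem down to how the first-round files were merged.
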


From caigh02 at gmail.com  Sun Feb  9 20:26:57 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Sun, 9 Feb 2014 21:26:57 -0600
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
Message-ID: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>

I sent the following message to Carson but forgot to send to the
maker-devel list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or InterProScan. I dug up some old messages in
the maker-devel Google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the UniProt/Swiss-Prot database, or can I use something
else? For example, may I add a few high-quality genome annotations of
related species to the Swiss-Prot database? Or may I use the UniRef90 or nr
database instead of Swiss-Prot?

3. Do you have a script to integrate blast2go results into the MAKER
gff/fasta?

Thanks.

Guohong

Rutgers University

From carsonhh at gmail.com  Mon Feb 10 10:25:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:25:06 -0700
Subject: [maker-devel] Falied to create new account
In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
Message-ID: <CF1E5936.9B37%carsonhh@gmail.com>

The SMTP server that sends e-mails out is down at the moment, so when you
said you forgot your login, it couldn't e-mail you.  I switched to a
different server for the time being.

--Carson


On 2/10/14, 12:04 AM, "AMITHA SAMPATH KUMAR" <amitha at ccmb.res.in> wrote:

>Hi,
>
>I am interested in using the online version of MAKER, for which I tried to
>create a profile using the email id 'amitha at ccmb.res.in', but
>unfortunately I could not log in successfully.
>I am also pasting a link of the error here,
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.
>
>The error mentioned is:
>Error executing run mode 'forgot_login': Can't call method "MailMsg"
>without a package or object reference at
>/var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
> at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>Kindly help me through the registration asap.
>
>Thanks
>Amitha.


From carsonhh at gmail.com  Mon Feb 10 10:26:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:26:06 -0700
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
	<CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
Message-ID: <CF1E59B4.9B3B%carsonhh@gmail.com>

1. Yes. It should take NCBI BLAST+ results.
2. It has to be UniProt/Swiss-Prot, or you can modify the comments (FASTA
header lines) of another database to look like UniProt/Swiss-Prot.
3. ipr_update_gff can also take BLAST2GO results as an undocumented feature
(or at least it could last time I tested it, which was quite a long time
ago).
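
For point 2, rewriting another database's headers might look like the following minimal Python sketch. It follows the public UniProt FASTA header layout (">db|ACCESSION|ENTRY_NAME description OS=organism ... PE=n SV=n"), but exactly which fields maker_functional_gff and maker_functional_fasta read is not stated in this thread, so treat the field choice as an assumption and test on a small file first. The function names are hypothetical, and the entry name, PE=4 and SV=1 values are placeholders.

```python
def to_uniprot_style(header, organism, db="tr"):
    """Rewrite '>ID description' as a UniProt-like '>db|ID|ID desc OS=...' line."""
    seq_id, _, desc = header.lstrip(">").strip().partition(" ")
    desc = desc.strip() or "Uncharacterized protein"
    # The entry name simply reuses the ID; PE and SV are placeholder values.
    return f">{db}|{seq_id}|{seq_id} {desc} OS={organism} PE=4 SV=1"


def convert_fasta(lines, organism):
    """Yield FASTA lines with headers rewritten; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        yield to_uniprot_style(line, organism) if line.startswith(">") else line
```

The converted records could then be appended to the Swiss-Prot FASTA before building the BLAST database, so that hits to the added proteins carry a description in the same layout as the Swiss-Prot entries.
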

Thanks,
Carson

From:  Guohong Cai <caigh02 at gmail.com>
Date:  Sunday, February 9, 2014 at 8:26 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Fwd: Functional annotation of MAKER gene models

I sent the following message to Carson but forgot to send to the maker-devel
list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or InterProScan. I dug up some old messages in
the maker-devel Google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the UniProt/Swiss-Prot database, or can I use something
else? For example, may I add a few high-quality genome annotations of
related species to the Swiss-Prot database? Or may I use the UniRef90 or nr
database instead of Swiss-Prot?

3. Do you have a script to integrate blast2go results into the MAKER
gff/fasta?

Thanks.

Guohong

Rutgers University 


From barry.utah at gmail.com  Mon Feb 10 12:21:31 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:21:31 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>

Hi Aaron,

If you re-run MAKER and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking work; it should reuse the work it has already done.  Is this not what you are seeing?

Barry


On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:

> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the repeat annotations were not there. Any suggestions?  Thanks, Aaron
> 
> 

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140210/15a1305a/attachment-0001.html>

From listona at science.oregonstate.edu  Mon Feb 10 12:46:06 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Mon, 10 Feb 2014 11:46:06 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>

Hi Barry:   I changed the name of the genome file, so that I could see the
results at each step. However, it sounds like if I had kept the same name,
MAKER would use the info from the previous run.  Is that correct?  Aaron

 
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140210/6c808a76/attachment-0001.html>

From barry.utah at gmail.com  Mon Feb 10 12:56:26 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:56:26 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
	<02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com>

Yep.  If you want to keep the results from each step just copy the GFF3 file from your first run to a new name and then redo your run.
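
A minimal sketch of the copy step described above, assuming the first-round output is a single merged GFF3 file. The file name genome.all.gff and the helper snapshot_gff are hypothetical and not part of MAKER.

```python
import shutil
from pathlib import Path


def snapshot_gff(gff_path, label):
    """Copy a GFF3 file to a sibling file tagged with `label`; return the new path."""
    src = Path(gff_path)
    # e.g. genome.all.gff + "round1" -> genome.all.round1.gff
    dest = src.with_name(f"{src.stem}.{label}{src.suffix}")
    shutil.copy2(src, dest)  # the original file is left in place for the re-run
    return dest
```

Calling snapshot_gff("genome.all.gff", "round1") before the second run would keep the first-round results under a new name, while the genome file name (and so the reusable intermediate results) stays unchanged.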

B


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140210/344b73a2/attachment-0001.html>

From dence at genetics.utah.edu  Tue Feb 11 11:37:36 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 18:37:36 +0000
Subject: [maker-devel] Falied to create new account
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
	<CF1E5936.9B37%carsonhh@gmail.com>
	<E8EDFB90D92694478065C37017B3A3A6A8908910@SKREGIXES2.AGR.GC.CA>
	<CF1FA919.9BBB%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D445A3@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D445B3@mxb2.hg.genetics.utah.edu>

Hossein, 

Ok. Since this error came up on a local install, I'm going to need some more information to understand what went wrong. Is it always the same contig that causes this error? If it is, then is this the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, is it always the same error?

If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong.
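
One way to build such a minimal dataset is to pull the failing contig out of the genome FASTA and run MAKER on that single record. The sketch below assumes the contig ID is the first whitespace-delimited token on the header line; the function name extract_contig is hypothetical.

```python
def extract_contig(fasta_lines, contig_id):
    """Return the FASTA lines (header plus sequence) for one record, or [] if absent."""
    keep = False
    record = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == contig_id
        if keep and line:
            record.append(line)
    return record
```

For example, passing an open genome file handle and the ID from the error message (PbPT3Sc00006) gives the lines to write into a one-contig test FASTA.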

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 11:20 AM
To: Daniel Ence
Subject: Re: [maker-devel] Falied to create new account

Hi Daniel

I am running it through the local server at my work.


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>Did you encounter this error while you were running MAKER on your local
>machine or through the MAKER web annotation service?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Carson Holt [carsonhh at gmail.com]
>Sent: Tuesday, February 11, 2014 10:18 AM
>To: Daniel Ence
>Cc: Mark Yandell
>Subject: FW: [maker-devel] Falied to create new account
>
>Hey Daniel could you download his dataset, and see if you can replicate
>the error.  Also check if this was an MWAS job or a local maker run (his
>dataset will already be there for MWAS, you just need the job ID).
>
>Thanks,
>Carson
>
>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Hi Carson
>>
>>
>>I encountered this error while running maker
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00006
>>
>>
>>
>>
>>
>>HB


From darasappan at gmail.com  Tue Feb 11 11:48:23 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 11 Feb 2014 12:48:23 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19187C.994D%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>

With your suggested changes (using a protein file not derived from the  
RNA-seq data and fixing the gff file for training SNAP), I was able to  
increase the number of genes from 6000+ to 18116.
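
The GFF3 fix mentioned above (keeping the FASTA sequence at the end of the merged file, as discussed in the quoted thread) can be checked before running maker2zff with a sketch like the following. It assumes the sequence is introduced by the ##FASTA directive from the GFF3 specification; the function name is hypothetical.

```python
def fasta_section_summary(gff_lines):
    """Return (has_fasta_directive, n_sequences, n_bases) for a GFF3 file."""
    in_fasta = False
    n_seqs = 0
    n_bases = 0
    for line in gff_lines:
        line = line.strip()
        if not in_fasta:
            # Everything before the directive is annotation, not sequence.
            in_fasta = line == "##FASTA"
            continue
        if line.startswith(">"):
            n_seqs += 1
        else:
            n_bases += len(line)
    return in_fasta, n_seqs, n_bases
```

A result of (False, 0, 0) would suggest the sequence was stripped from the file, which is consistent with the empty genome.dna and the out_of_bounds errors reported earlier in the thread.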

I'm now trying to evaluate the quality of the annotation.  I have a  
question about the usage for mpi_evaluator.

In the maker tutorial, the usage is given as:

  mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>

What files are being referred to in the input parameters: eval_opts,
eval_bopts and eval_exe?

Thanks
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join
> the GFF3's without stripping out the fasta sequence at the end when
> training SNAP.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 10:29 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Sorry I was just trying to make it small enough to be approved by  
> the mailing list.
>
> Here is the whole file:
>
>
>  cat.formatted.gff.tgz
> <https://docs.google.com/file/d/0B3fACsJDXQi6VEE1VG5tWEh5M1U/edit?usp=drive_web>
>
>
>
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com>  
> wrote:
>> Could you give me the file without using 'head' to trim it? It's
>> cutting it off before it reaches the part I'm interested in.
>>
>> --Carson
>>
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Thursday, February 6, 2014 at 10:01 AM
>>
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org 
>> " <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Oh yes I did- I took just the non sequence entries in the gff file  
>> and used that as my input.  I will rerun snap with the gff file  
>> containing the sequences as well.
>>
>> I'm attaching a snippet of the gff file that I used as input to  
>> maker2zff.
>>
>> Thanks for your help
>> Dhivya
>>
>>
>>
>>
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>>
>>> Your genome.dna file has no sequence?  Did you by any chance strip  
>>> the fasta sequence from the GFF3 you are using as input to  
>>> maker2zff?  There should be fasta sequence at the end of that  
>>> file.  Also can I see the GFF3 file you are using as input to  
>>> maker2zff.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: Thursday, February 6, 2014 at 7:47 AM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org 
>>> " <maker-devel at yandell-lab.org>
>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> It does appear that my genome.ann file from the maker2zff script has
>>> data in it. However, the SNAP steps after that have created empty
>>> files.  The following are all empty:
>>>
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>>
>>> When I tried to get gene stats or validate genome.ann, I get  
>>> errors like this for all of them:
>>>
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
>>> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
>>> exon-2:out_of_bounds exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
>>> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
>>> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
>>> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
>>> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
>>> exon-20:out_of_bounds exon-21:out_of_bounds
>>>
>>> I'm not sure why the annotations I'm seeing in genome.ann are all
>>> showing up as errors. I realize this may be an issue with snap,
>>> but are you familiar with anything like this? My genome.ann file
>>> is attached for reference.
>>>
>>> Thanks
>>> Dhivya
>>>
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>>
>>>> Do you have any features of type snap in your results from step
>>>> 3?  We've had a couple of recent posts where after training snap
>>>> was giving no results, and as a result maker couldn't give any
>>>> genes.  One cause of something like that may be your step 2.
>>>> Make sure the ZFF you used to train with wasn't empty.  The
>>>> maker2zff script uses filters to only put the best genes in the
>>>> ZFF file, and if all your genes fail the filtering then you are
>>>> training with an empty ZFF.
>>>>
>>>> Also you should use proteins from a related species as your
>>>> protein file.  I see that your protein matches are varying wildly
>>>> from run to run, and so is your contig count.  Were the subset of
>>>> contigs you have results for long enough to contain genes?
>>>>
>>>> --Carson
>>>>
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: Monday, February 3, 2014 at 9:31 AM
>>>> To: Daniel Ence <dence at genetics.utah.edu>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>>
>>>> Hi Daniel,
>>>>
>>>> I was able to check on some of those questions.
>>>>
>>>> 1. From trinity assembly: I started with 102000 contigs. I used  
>>>> trinotate to annotate proteins in this.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this (most important parts on top):
>>>>
>>>>    6653 gene
>>>>   46675 exon
>>>>  280534 protein_match
>>>>   59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>>
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>>>> which there are 29,000 transcripts).  I used the protein  
>>>> sequences from trinity assembly.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>>      67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>>
>>>> The number of genes annotated using the trinity assembly is lower
>>>> than expected, so I went the cufflinks route. I don't understand
>>>> why, when using the cufflinks transcripts, even fewer genes are
>>>> being found.
>>>>
>>>> 3. Training SNAP:  I used the results of maker from 1 to train  
>>>> SNAP.  I then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>>>> est2genome=0
>>>>
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>>     957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>>
>>>> As I mentioned in another email, cegma results indicated that the  
>>>> genome was more than 90% complete. Any suggestions would be  
>>>> helpful.
>>>>
>>>> Thank you
>>>> Dhivya
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>>
>>>>> Hi Dhivya,
>>>>>
>>>>> I think there are a few numbers that could be helpful to understand
>>>>> what's happening here.
>>>>>
>>>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>>>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>>>>> MAKER when you gave it the cufflinks data. How many transcripts
>>>>> did MAKER identify with the cufflinks data? Did you still get
>>>>> more than the 10,000 transcripts that you found with just the
>>>>> Trinity data?
>>>>>
>>>>> A key part of MAKER's approach to genome annotation that might
>>>>> be affecting its performance is that it only annotates a gene
>>>>> where there is both evidence (like your RNA-seq data) and an
>>>>> ab-initio prediction. If a prediction is unsupported by the
>>>>> evidence, then MAKER won't annotate a gene, and if evidence
>>>>> aligns where there's no prediction, MAKER won't annotate a gene
>>>>> either. What ab-initio predictors are you using, and have they
>>>>> been trained for your specific genome?
>>>>>
>>>>> You can force MAKER to automatically promote evidence alignments  
>>>>> to a gene model by setting the est2genome option to 1, but that  
>>>>> will usually give you many false positives.
>>>>>
>>>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>>>> data and with est2genome set to 1, and let us know how that  
>>>>> affects the MAKER results.
>>>>>
>>>>> Thanks,
>>>>> Daniel
>>>>>
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on  
>>>>> behalf of dhivya arasappan [darasappan at gmail.com]
>>>>> Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>>>> very good assembly.
>>>>>
>>>>> I tried to de novo assemble RNA-seq data using trinity and ran
>>>>> maker using my genome assembly and the trinity results.  I did not
>>>>> get as many transcripts as expected, around 10,000 transcripts.
>>>>>
>>>>> So, I decided to try a different approach.  I did a genome-assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using
>>>>> my genome assembly and the cufflinks result.  I get far fewer
>>>>> transcripts as a result.
>>>>>
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>>
>>>>> Any suggestions would be appreciated.
>>>>>
>>>>> Thanks
>>>>> Dhivya
>>>>>
>>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140211/bf1fae70/attachment-0001.html>

From carsonhh at gmail.com  Tue Feb 11 11:55:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Feb 2014 11:55:38 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
	<0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
Message-ID: <CF1FBEEF.9BF5%carsonhh@gmail.com>

I wouldn't use mpi_evaluator.  It is buggy and has virtually no
documentation.  The AED values are the best way to identify which genes are
higher and lower quality.  You can also run interproscan to identify protein
domain content as an independent evaluation. Look at this paper here ->
http://www.biomedcentral.com/1471-2105/12/491

Figure 4 has a nice example of how AED, domain content, and gene orthology
correlate to show the quality of different subsets of genes in seven ant
genomes.
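
As a rough illustration of the AED-based evaluation, the sketch below collects AED values from a MAKER GFF3 and reports the fraction of transcripts at or below a cutoff. It assumes each mRNA feature carries an _AED=<value> attribute in column 9 (lower values mean better agreement with the evidence); the function names and the example cutoff are hypothetical.

```python
import re

# Matches the _AED attribute only, not _eAED.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines):
    """Collect the _AED attribute from every mRNA feature in a GFF3 file."""
    values = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # annotations end where the embedded sequence begins
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_PATTERN.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def fraction_at_or_below(values, threshold):
    """Fraction of transcripts whose AED is <= threshold (0.0 if there are none)."""
    if not values:
        return 0.0
    return sum(v <= threshold for v in values) / len(values)
```

Evaluating fraction_at_or_below at a series of cutoffs gives a cumulative AED curve, which makes it easy to compare annotation runs against each other.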

If you choose to try mpi_evaluator it uses the -CTL option to generate empty
files that you then fill in.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, February 11, 2014 at 11:48 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

With your suggested changes (using a protein file not derived from the
RNA-seq data and fixing the gff file for training SNAP), I was able to
increase the number of genes from 6000+ to 18116.

I'm now trying to evaluate the quality of the annotation.  I have a question
about the usage for mpi_evaluator.

In the maker tutorial,  the usage is given as:

 mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>
What files are being referred to in the input parameters: eval_opts,
eval_bopts and eval_exe?

Thanks 
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join the GFF3?s
> without stripping out the fasta sequence at the end when training SNAP.
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 10:29 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Sorry I was just trying to make it small enough to be approved by the mailing
> list.
> 
> Here is the whole file:
> 
> 
>  cat.formatted.gff.tgz
> <https://docs.google.com/file/d/0B3fACsJDXQi6VEE1VG5tWEh5M1U/edit?usp=drive_we
> b> 
> 
> 
> 
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Could you give me the file without using 'head? to trim it, its cutting it
>> before it reaches the part I?m interested in.
>> 
>> ?Carson
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, February 6, 2014 at 10:01 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>> <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Oh yes I did- I took just the non sequence entries in the gff file and used
>> that as my input.  I will rerun snap with the gff file containing the
>> sequences as well.
>> 
>> I'm attaching a snippet of the gff file that I used as input to maker2zff.
>> 
>> Thanks for your help
>> Dhivya
>> 
>> 
>> 
>> 
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>> 
>>> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
>>> sequence from the GFF3 you are using as input to maker2zff?  There should be
>>> fasta sequence at the end of that file.  Also can I see the GFF3 file you
>>> are using as input to maker2zff.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Thursday, February 6, 2014 at 7:47 AM
>>> To:  Carson Holt <carsonhh at gmail.com>
>>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> I does appear than my genome.ann file from maker2zff script has data in it.
>>> However, the SNAP steps after that have created empty files.  The following
>>> are all empty:
>>> 
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>> 
>>> When I tried to get gene stats or validate genome.ann, I get errors like
>>> this for all of them:
>>> 
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
>>> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
>>> exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
>>> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
>>> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
>>> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
>>> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
>>> exon-21:out_of_bounds
>>> 
>>> I'm not sure why the annotation I'm seeing in genome.ann are all showing up
>>> as errors. I realize this may be an issue with snap, but are you familiar
>>> with anything like this? My genome.ann file is attached for reference.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>> 
>>>> Do you have any features of type snap in your results from step 3?  We?ve
>>>> had a couple of recent posts where after training snap was giving no
>>>> results, and as a result maker couldn?t give any genes.  One cause of
>>>> something like that may be your step 2.  Make sure the ZFF wasn?t empty you
>>>> used to train with.  The maker2zff script uses filters to only put the best
>>>> genes in the off file, and if all your genes fail the filtering then you
>>>> are training with an empty ZFF.
>>>> 
>>>> Also you should use proteins from a related species as your protein file.
>>>> I see that you protein marches are varying wildly from run to run? So is
>>>> your contig count?  Were the subset of contigs you have results for long
>>>> enough to contain genes?
>>>> 
>>>> ?Carson
>>>> 
>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>> Date:  Monday, February 3, 2014 at 9:31 AM
>>>> To:  Daniel Ence <dence at genetics.utah.edu>
>>>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>>> 
>>>> Hi Daniel,
>>>> 
>>>> I was able to check on some of those questions.
>>>> 
>>>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate
>>>> to annotate proteins in this.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this (most important parts on top):
>>>> 
>>>>     6653 gene
>>>>    46675 exon
>>>>  280534 protein_match
>>>> 59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>> 
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of which
>>>> there are 29,000 transcripts).  I used the protein sequences from trinity
>>>> assembly.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>> 67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>> 
>>>> The genes annotated using the trinity assembly is lower than expected, so I
>>>> went the cufflinks route. I dont understand why when using the cufflinks
>>>> transcripts, even less genes are being found.
>>>> 
>>>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I
>>>> then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/s
>>>> nap/RHA.hmm
>>>> est2genome=0
>>>> 
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>> 957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>> 
>>>> As I mentioned in another email, cegma results indicated that the genome
>>>> was more than 90% complete. Any suggestions would be helpful.
>>>> 
>>>> Thank you
>>>> Dhivya
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>> 
>>>>> Hi Dhivya, 
>>>>> 
>>>>> I think there a few numbers that could be helpful to understand what's
>>>>> happening here.
>>>>> 
>>>>> How many transcripts did Trinity assembly the RNA-seq data into? Also, you
>>>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave
>>>>> it the cufflinks data. How many transcripts did MAKER identify with the
>>>>> cufflinks data? Did you still get more than the 10,000 transcripts that
>>>>> you found with just the Trinity data?
>>>>> 
>>>>> A key part of MAKER's approach to genome annotation that might be
>>>>> affecting it's performance is that it only annotates a gene where there is
>>>>> both evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>>>> prediction is unsupported by the evidence, then MAKER won't annotate a
>>>>> gene and if evidence aligns where there's no prediction, MAKER won't
>>>>> annotate a gene either. What ab-initio predictors are you using and have
>>>>> they been trained specific genome?
>>>>> 
>>>>> You can force MAKER to automatically promote evidence alignments to a gene
>>>>> model by setting the est2genome option to 1, but that will usually give
>>>>> you many false positives.
>>>>> 
>>>>> Try rerunning it with either the Trinity data or the Cufflinks data and
>>>>> with est2genome set to 1, and let us know how that affects the MAKER
>>>>> results. 
>>>>> 
>>>>> Thanks,
>>>>> Daniel
>>>>> 
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>>  From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>>>> dhivya arasappan [darasappan at gmail.com]
>>>>>  Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am trying to annotate a 200 mb plant genome for which I have a very
>>>>> good assembly.
>>>>> 
>>>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>>>> using my genome assembly and the trinity results. I did not get as
>>>>> many transcripts as expected, only around 10,000 transcripts.
>>>>> 
>>>>> So, I decided to try a different approach.  I did a genome assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>>>> genome assembly and the cufflinks result. I get far fewer
>>>>> transcripts as a result.
>>>>> 
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>> 
>>>>> Any suggestions would be appreciated.
>>>>> 
>>>>> Thanks
>>>>> Dhivya
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>> 
>>> 
>> 
> 



From carson.holt at genetics.utah.edu  Tue Feb 11 13:52:05 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 20:52:05 +0000
Subject: [maker-devel] New MAKER release
Message-ID: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>

Hello all,

MAKER has been updated to 2.31.

There are no major new features over 2.30.  It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support.  I also was able to remove the seg faults that sometimes happened on exit under OpenMPI.

Thanks,
Carson


From carson.holt at genetics.utah.edu  Tue Feb 11 14:19:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 21:19:17 +0000
Subject: [maker-devel] New MAKER release
In-Reply-To: <CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
References: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>
	<CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
Message-ID: <CF1FDDCC.9C1B%carson.holt@genetics.utah.edu>

URLs can be manually edited in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL.

Thanks,
Carson

From: Joanna Kelley <jokelley at stanford.edu>
Date: Tuesday, February 11, 2014 at 2:00 PM
To: Carson Holt <carson.holt at genetics.utah.edu>
Subject: Re: [maker-devel] New MAKER release

Hi Carson,

The RepBase step is failing; it seems to be looking for the wrong version. Where do I change the code to fix that?

Thanks,
Joanna

 Downloading RepBase...
--2014-02-11 12:59:38--  http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz
Resolving www.girinst.org... 66.201.49.247
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-02-11 12:59:38 ERROR 404: Not Found.


ERROR: Failed installing RepBase, now cleaning installation path...
You may need to install RepBase manually.




--
Please update your address book, my new email address is joanna.l.kelley at wsu.edu


From dence at genetics.utah.edu  Tue Feb 11 15:59:57 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 22:59:57 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>

Hi Hossein,

I think what would help most right now is if you ran MAKER on just one of the failing contigs and sent me the entire error output along with the MAKER control files that you are using. It looks like the error is coming from the GFF3 files that you are using as input.

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 3:51 PM
To: Daniel Ence
Subject: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I restarted MAKER and it is still running. In the error output file that has
been generated so far, it seems that mainly smaller contigs are affected. There
are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb
with this error.

I was wondering whether I need to change these settings in the maker_opts.ctl file:

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)


If I understand correctly, max_dna_len divides contigs of over 100 kb into
smaller chunks. However, for the min_contig option, it is not clear to me why
I get error messages for 30 kb contigs if the default minimum contig length
is 10 kb or less. Should I change this to 0?
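
To make the two options concrete, here is a minimal Python sketch of the behaviour that the option comments describe. It is not MAKER's code: the function name plan_chunks and the contig names are made up for illustration, and the exact chunk boundaries MAKER uses may differ. Under these assumptions, min_contig=1 skips nothing, and a 30 kb contig is shorter than max_dna_len, so it would be handled as a single chunk.

```python
def plan_chunks(contig_lengths, max_dna_len=100000, min_contig=1):
    """Return {contig_name: [(start, end), ...]} using 1-based inclusive coordinates.

    Illustrative only: mirrors the option comments, not MAKER's implementation.
    """
    plan = {}
    for name, length in contig_lengths.items():
        if length < min_contig:
            # Contigs shorter than min_contig are skipped entirely.
            continue
        chunks = []
        start = 1
        while start <= length:
            # Each chunk covers at most max_dna_len bases.
            end = min(start + max_dna_len - 1, length)
            chunks.append((start, end))
            start = end + 1
        plan[name] = chunks
    return plan


if __name__ == "__main__":
    # Hypothetical contig names and lengths.
    lengths = {"contig_250kb": 250000, "contig_30kb": 30000, "contig_3kb": 3000}
    for name, chunks in plan_chunks(lengths).items():
        print(name, chunks)
```

Raising min_contig to 10000 in this sketch would drop the 3 kb contig but would still keep the 30 kb one, so in this model neither option explains an error on a 30 kb contig.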

Here is an example of the error message for one of the contigs


#--------- command -------------#
Widget::exonerate::est2genome:
/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
#-------------------------------#
cleaning blastn...
cleaning tblastx...
cleaning blastx...
ERROR: Failed on
PbPT3Sc00001_S_0.8_1-mRNA-1
Check your input GFF3 file for errors!
(from GFFDB)

FATAL ERROR
ERROR: Failed while processing the chunk divide!!

ERROR: Chunk failed at level 17
!!
FAILED CONTIG:PbPT3Sc00001


--Next Contig--


Regards


HB


On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hossein,
>
>Ok. So since this error came up on a local install, I'm going to need
>some more information to understand what went wrong. Is it the same
>contig that always causes this error? If it is, then is this the only
>error or warning that MAKER encounters while running on this contig? Or,
>if multiple contigs fail, then is it always the same error?
>
>If you can narrow it down to the smallest possible dataset that
>consistently gives the same error, then we can begin to understand what's
>wrong.
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 11:20 AM
>To: Daniel Ence
>Subject: Re: [maker-devel] Falied to create new account
>
>Hi Daniel
>
>I am running it through the local server at my work.
>
>
>
>
>
>
>M. Hossein Borhan, Ph.D.
>Research Scientist/ Chercheur Scientifique
>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>107 Science Place, Saskatoon, SK.,S7N 0X2
>Telephone/Téléphone: (306) 385-9441
>Facsimile/Télécopieur: (306) 385-9482
>Hossein.borhan at agr.gc.ca
>
>
>
>
>
>
>
>
>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>Did you encounter this error while you were running MAKER on your local
>>machine or through the MAKER web annotation service?
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Carson Holt [carsonhh at gmail.com]
>>Sent: Tuesday, February 11, 2014 10:18 AM
>>To: Daniel Ence
>>Cc: Mark Yandell
>>Subject: FW: [maker-devel] Falied to create new account
>>
>>Hey Daniel could you download his dataset, and see if you can replicate
>>the error.  Also check if this was an MWAS job or a local maker run (his
>>dataset will already be there for MWAS, you just need the job ID).
>>
>>Thanks,
>>Carson
>>
>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>>
>>>Hi Carson
>>>
>>>
>>>I encountered this error while running maker
>>>
>>>FATAL ERROR
>>>ERROR: Failed while processing the chunk divide!!
>>>
>>>ERROR: Chunk failed at level 17
>>>!!
>>>FAILED CONTIG:PbPT3Sc00006
>>>
>>>
>>>
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>
>>
>>
>


From marc.hoeppner at imbim.uu.se  Wed Feb 12 01:34:12 2014
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Wed, 12 Feb 2014 09:34:12 +0100
Subject: [maker-devel] Annotations from protein alignments
Message-ID: <52FB3204.60606@imbim.uu.se>

Dear list,

I have an annotation project with both protein data (it's a bird, so
I've been using both vertebrates in general and chicken in particular)
and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq
data. The chicken Augustus model seems to do a decent job of seeding
gene loci, but it's not quite perfect. I want to use protein alignments
to create a high-confidence set of exons and subsequently a set of gene
loci (to train e.g. SNAP), but when I test with protein2genome=1 I
never get any annotations. This is also true for the test data set that
is delivered together with MAKER (hsap_). Is there anything I should know
about using proteins to generate annotations? I left all settings in the
config file at their defaults (except protein2genome=1). I've tried this
with both MAKER 2.30 and 2.31.

All the best,

Marc

-- 
-----------
Marc P. Hoeppner, PhD
Group leader
BILS Genome annotation platform

Department of Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoepner at imbim.uu.se


From carsonhh at gmail.com  Wed Feb 12 08:42:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 12 Feb 2014 08:42:36 -0700
Subject: [maker-devel] Annotations from protein alignments
In-Reply-To: <52FB3204.60606@imbim.uu.se>
References: <52FB3204.60606@imbim.uu.se>
Message-ID: <CF20E42A.9C8C%carsonhh@gmail.com>

I updated the 2.31 tarball. Go ahead and download it again.
protein2genome had been turned off for eukaryotes and was only working for
prokaryotic genomes.

--Carson




From dence at genetics.utah.edu  Wed Feb 12 11:59:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 18:59:11 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

So, after looking at the GFF3 and your control files, I had an idea. There is a section of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a GFF3 file using the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines.

Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the GFF3 file into the different feature sources and passing them through the "est_gff" etc. options instead of the MAKER passthrough? That will tell us whether the problem is with the GFF3 file or with how MAKER is processing it.
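
As a rough sketch of the splitting step, assuming a standard tab-delimited GFF3 file whose second column holds the feature source, something like the following Python could be used. The function names and the output file naming are made up for illustration. Which resulting file belongs under which *_gff option still has to be decided by looking at the sources actually present in the file.

```python
import os
import sys
from collections import defaultdict


def group_gff3_by_source(lines):
    """Group GFF3 feature lines by the source column (column 2)."""
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        groups[fields[1]].append(line.rstrip("\n"))
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one GFF3 file per source; returns {source: path}."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for index, source in enumerate(sorted(groups)):
        # The numeric prefix keeps file names unique after sanitising.
        safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in source)
        path = os.path.join(out_dir, f"{index:02d}_{safe}.gff3")
        with open(path, "w") as out:
            out.write("##gff-version 3\n")
            for feature in groups[source]:
                out.write(feature + "\n")
        paths[source] = path
    return paths


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as handle:
            grouped = group_gff3_by_source(handle)
        for source, path in write_groups(grouped, sys.argv[2]).items():
            print(source, "->", path)
    else:
        print("usage: split_gff3.py input.gff3 output_dir")
```

Note that this splits line by line and does not check that Parent references stay within one output file, so the per-source files are worth a quick look before they are passed to MAKER.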

Another thing to check is that the contig names in the GFF3 file match the contig names in the FASTA file that you're annotating.
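
A minimal sketch of that name check in Python, assuming the FASTA identifier is the text up to the first whitespace in each header line and the GFF3 sequence name is the first tab-delimited column. The function names and the example identifiers in the comments are hypothetical.

```python
import sys


def fasta_ids(lines):
    """Collect sequence identifiers from FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                # Identifier = text up to the first whitespace.
                ids.add(header.split()[0])
    return ids


def gff3_seqids(lines):
    """Collect the sequence names used in column 1 of a GFF3 file."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_fasta(fasta_lines, gff3_lines):
    """Names used in the GFF3 that have no matching FASTA record."""
    return sorted(gff3_seqids(gff3_lines) - fasta_ids(fasta_lines))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gff3:
            for name in missing_from_fasta(fasta, gff3):
                print("in GFF3 but not in FASTA:", name)
    else:
        print("usage: check_names.py genome.fasta features.gff3")
```

The comparison is exact, so names that differ only in case (for example a hypothetical "Sc0001" versus "SC0001") are reported as mismatches.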

Thanks,
Daniel


Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 8:49 AM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I have generated the files that you requested. I chose Sc00009 from my
genome, which is 30 kb and was one of the scaffolds that produced the error.
In addition to the ctl files and the error output file, I have also attached
the part of the GFF file for Sc00009 that is indicated in the error message.


Thanks for helping with this


Regards


HB




From dence at genetics.utah.edu  Wed Feb 12 12:15:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 19:15:59 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

One more question. How did you make the gff3 that you're passing through here? 

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From dence at genetics.utah.edu  Wed Feb 12 13:42:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 20:42:03 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908DE5@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4498A@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44A3B@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

Those problems with passing through MAKER-derived GFF3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download now on our website. Try installing that version and rerunning with the same control files you started out with, and let me know if that fixes the problem.

Thanks,
Daniel

 
Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 12:55 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Hi Daniel

I am using maker 2.10.
I also checked the naming of the scaffold in the genome file and the gff
file for the failed example. The naming is the same.

Thanks

Hossein


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-12 1:30 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>And which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 12:25 PM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Hi Daniel
>
>Gff file was generated by the 1st run of maker
>
>
>
>HB
>
>
>
>
>
>
>
>On 14-02-12 1:15 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>One more question. How did you make the gff3 that you're passing through
>>here?
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>Daniel Ence [dence at genetics.utah.edu]
>>Sent: Wednesday, February 12, 2014 11:59 AM
>>To: Borhan, Hossein
>>Cc: maker-devel at yandell-lab.org
>>Subject: Re: [maker-devel] ERROR: Failed while processing the chunk
>>divide!!
>>
>>Hi Hossein,
>>
>>So, after looking at the gff3 and your control files, I had an idea.
>>There's the part of the control file called "Re-annotation Using MAKER
>>Derived GFF3", but you can also passthrough features from a gff3 using
>>the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines.
>>
>>Sometimes we encounter problems with the MAKER passthrough. Could you try
>>dividing the gff3 file into the different feature sources and passing it
>>through the "est_gff" etc options and not with the MAKER passthrough?
>>That will tell us if the problem is with the gff3 file or with how MAKER
>>is processing it.
>>
>>Another thing to check is that the contig names in the gff3
>>file match the contig names in the fasta file that you're annotating.
>>
>>Thanks,
>>Daniel
>>
>>
>>
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Wednesday, February 12, 2014 8:49 AM
>>To: Daniel Ence
>>Subject: Re: ERROR: Failed while processing the chunk divide!!
>>
>>Dear Daniel
>>
>>
>>I have generated the files that you requested. I chose Sc00009 from my
>>genome which is 30 kb and was one of the scaffolds coming up with error.
>>In addition to Ctl files and error output file I also attached a part of
>>the gff file related to SC00009 that is indicated in the error message.
>>
>>
>>Thanks for helping with this
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hi Hossein,
>>>
>>>I think that what would be the most help right now is if you ran MAKER
>>>on
>>>only one of those contigs that are failing and send me the entire error
>>>output along with the maker control files that you are using. It looks
>>>like the error is coming from the gff3 files that you are using as
>>>input.
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>Sent: Tuesday, February 11, 2014 3:51 PM
>>>To: Daniel Ence
>>>Subject: ERROR: Failed while processing the chunk divide!!
>>>
>>>Dear Daniel
>>>
>>>I re-started maker and it is still running. But in the error output file
>>>that has been generated so far, it seems that smaller contigs are affected.
>>>There are contigs of 2-4 kb with this error, but I also noticed a contig of
>>>30kb length having this error.
>>>
>>>I was wondering if I need to change the setting in the maker_opt file
>>>
>>>#-----MAKER Behavior Options
>>>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases  memory usage)
>>>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>>>
>>>
>>>If I understand correctly, max_dna_len divides contigs of over 100kb into
>>>smaller chunks. However, it is not clear to me why, if the default for the
>>>min_contig option is 10kb or less, I get the error message for 30kb-long
>>>contigs. Should I change this to 0?
>>>
>>>Here is an example of the error message for one of the contigs
>>>
>>>
>>>#--------- command -------------#
>>>Widget::exonerate::est2genome:
>>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate  -q
>>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
>>>-t
>>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta
>>>-Q dna -T dna --model est2genome
>>>--minintron 20 --showcigar --percent 20 >
>>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>>>#-------------------------------#
>>>cleaning blastn...
>>>cleaning tblastx...
>>>cleaning blastx...
>>>ERROR: Failed on
>>>PbPT3Sc00001_S_0.8_1-mRNA-1
>>>Check your input GFF3 file for errors!
>>>(from GFFDB)
>>>
>>>FATAL ERROR
>>>ERROR: Failed while processing the chunk
>>>divide!!
>>>
>>>ERROR: Chunk failed at level 17
>>>!!
>>>FAILED CONTIG:PbPT3Sc00001
>>>
>>>
>>>
>>>
>>>--Next Contig--
>>>
>>>
>>>
>>>
>>>
>>>
>>>Regards
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hossein,
>>>>
>>>>Ok. So since this error came up on a local install, I'm going to need
>>>>some more information to understand what went wrong. Is it the same
>>>>contig that always causes this error? If it is, then is that the only
>>>>error or warning that MAKER encounters while running on this contig?
>>>>Or, if multiple contigs fail, then is it always the same error?
>>>>
>>>>If you can narrow it down to the smallest possible dataset that
>>>>consistently gives the same error, then we can begin to understand
>>>>what's wrong.
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>Daniel Ence
>>>>Graduate Student
>>>>Eccles Institute of Human Genetics
>>>>University of Utah
>>>>15 North 2030 East, Room 2100
>>>>Salt Lake City, UT 84112-5330
>>>>________________________________________
>>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>>Sent: Tuesday, February 11, 2014 11:20 AM
>>>>To: Daniel Ence
>>>>Subject: Re: [maker-devel] Falied to create new account
>>>>
>>>>Hi Daniel
>>>>
>>>>I am running it through the local server at my work
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>M. Hossein Borhan, Ph.D.
>>>>Research Scientist/ Chercheur Scientifique
>>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>>>Telephone/Téléphone: (306) 385-9441
>>>>Facsimile/Télécopieur: (306) 385-9482
>>>>Hossein.borhan at agr.gc.ca
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>>
>>>>>Hi Hossein,
>>>>>
>>>>>Did you encounter this error while you were running MAKER on your
>>>>>local
>>>>>machine or through the MAKER web annotation service?
>>>>>
>>>>>Thanks,
>>>>>Daniel
>>>>>
>>>>>
>>>>>Daniel Ence
>>>>>Graduate Student
>>>>>Eccles Institute of Human Genetics
>>>>>University of Utah
>>>>>15 North 2030 East, Room 2100
>>>>>Salt Lake City, UT 84112-5330
>>>>>________________________________________
>>>>>From: Carson Holt [carsonhh at gmail.com]
>>>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>>>To: Daniel Ence
>>>>>Cc: Mark Yandell
>>>>>Subject: FW: [maker-devel] Falied to create new account
>>>>>
>>>>>Hey Daniel could you download his dataset, and see if you can
>>>>>replicate
>>>>>the error.  Also check if this was an MWAS job or a local maker run
>>>>>(his
>>>>>dataset will already be there for MWAS, you just need the job ID).
>>>>>
>>>>>Thanks,
>>>>>Carson
>>>>>
>>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>>wrote:
>>>>>
>>>>>>Hi Carson
>>>>>>
>>>>>>
>>>>>>I encountered this error while running maker
>>>>>>
>>>>>>FATAL ERROR
>>>>>>ERROR: Failed while processing the chunk divide!!
>>>>>>
>>>>>>ERROR: Chunk failed at level 17
>>>>>>!!
>>>>>>FAILED CONTIG:PbPT3Sc00006
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>HB
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>_______________________________________________
>>maker-devel mailing list
>>maker-devel at box290.bluehost.com
>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>


From masa at bioinfo.hr  Thu Feb 13 03:17:11 2014
From: masa at bioinfo.hr (Masa Roller)
Date: Thu, 13 Feb 2014 11:17:11 +0100
Subject: [maker-devel] SNAP scores and AED scores
Message-ID: <52FC9BA7.6060505@bioinfo.hr>

Dear all,

I ran snap2 based gene prediction through maker.

In the resulting gff file, in the source "snap_masked" I can find the 
score in the score column of every snap prediction that did not get 
promoted to a maker gene. This would be the score of how well the 
prediction matches the HMM?

It seems to me that those snap models that are given gene status no 
longer appear as snap_masked source but only as source "maker". Maker 
then removes the score column, instead giving AED and eAED scores (which 
are more about how the model corresponds to the evidence). When viewing 
the maker transcripts and SNAP predictions in a browser, they do not 
match (mostly, maker predictions are longer).

I am interested in the score of the individual gene predictions that 
underlie maker gene models. Where could I find that information?

Many thanks!


From carsonhh at gmail.com  Thu Feb 13 13:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:11:22 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <52FC9BA7.6060505@bioinfo.hr>
References: <52FC9BA7.6060505@bioinfo.hr>
Message-ID: <CF227374.9D6F%carsonhh@gmail.com>

No.  SNAP genes do not disappear. All SNAP ab initio calls will always be
kept as reference features marked snap_masked (for repeat masked genome)
and snap (for unmasked genome).  MAKER then runs SNAP another time where
it feeds hints to SNAP based on EST and protein alignment evidence.  These
hint based models can then compete against the ab initio SNAP models to be
promoted to genes if their AED scores are better.  Final models can also
get UTR added based on EST evidence.  That is why you can get models from
MAKER that do not match the original SNAP ab initio calls.

So in summary, all SNAP ab initio models will be in snap_masked.  The
MAKER models will consist of hint based SNAP rerun plus SNAP ab initio
models processed to add UTR.
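
To compare the two sets side by side, something like the following Python
sketch could pull the SNAP scores and the MAKER AED values out of the
output GFF3. It is only an illustration, not part of MAKER: the function
names are made up here, and it assumes that ab initio calls appear as
"match" features with source snap_masked and that mRNA features with
source maker carry _AED and _eAED attributes. Check a few lines of your
own file before relying on it.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def collect_scores(gff_lines):
    """Return (snap_scores, maker_aed) keyed by feature ID.

    Assumption: snap_masked 'match' rows hold the SNAP score in column 6,
    and maker 'mRNA' rows hold _AED/_eAED in column 9.
    """
    snap_scores = {}
    maker_aed = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[1] == "snap_masked" and cols[2] == "match":
            score = float(cols[5]) if cols[5] != "." else None
            snap_scores[attrs.get("ID", "")] = score
        elif cols[1] == "maker" and cols[2] == "mRNA":
            maker_aed[attrs.get("ID", "")] = (attrs.get("_AED"), attrs.get("_eAED"))
    return snap_scores, maker_aed
```

Calling collect_scores(open("your_contig.gff")) returns two dictionaries
that can be printed or joined by coordinates in whatever way suits you.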

Thanks,
Carson


On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:

>Dear all,
>
>I ran snap2 based gene prediction through maker.
>
>In the resulting gff file, in the source "snap_masked" I can find the
>score in the score column of every snap prediction that did not get
>promoted to a maker gene. This would be the score of how well the
>prediction matches the HMM?
>
>It seems to me that those snap models that are given gene status no
>longer appear as snap_masked source but only as source "maker". Maker
>then removes the score column, instead giving AED and eAED scores (which
>are more about how the model corresponds to the evidence). When viewing
>the maker transcripts and SNAP predictions in a browser, they do not
>match (mostly, maker predictions are longer).
>
>I am interested in the score of the individual gene predictions that
>underlie maker gene models. Where could I find that information?
>
>Many thanks!
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Thu Feb 13 13:23:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:23:07 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr>
 <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <CF227602.9D7E%carsonhh@gmail.com>

On a side note.  Because the MAKER models involve modifying either the ab
initio SNAP model or manipulating the underlying scoring scheme using
hints, the SNAP score on those is virtually meaningless.  However Ian Korf
has developed a tool that can take any gene structure and reverse generate
a score (i.e. what the score of this gene would have been if SNAP had
called it that way in the first place).

I believe the tool is called fathom and is part of the SNAP package.  It
is not well documented, so you might have to contact Ian Korf directly for
that.  You can use the maker2zff tool to generate the input to fathom.

Thanks,
Carson


On 2/13/14, 1:11 PM, "Carson Holt" <carsonhh at gmail.com> wrote:

>No.  SNAP genes do not disappear. All SNAP ab initio calls will always be
>kept as reference features marked snap_masked (for repeat masked genome)
>and snap (for unmasked genome).  MAKER then runs SNAP another time where
>it feeds hints to SNAP based on EST and protein alignment evidence.  These
>hint based models can then compete against the ab initio SNAP models to be
>promoted to genes if their AED scores are better.  Final models can also
>get UTR added based on EST evidence.  That is why you can get models from
>MAKER that do not match the original SNAP ab initio calls.
>
>So in summary, all SNAP ab initio models will be in snap_masked.  The
>MAKER models will consist of hint based SNAP rerun plus SNAP ab initio
>models processed to add UTR.
>
>Thanks,
>Carson
>
>
>
>On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:
>
>>Dear all,
>>
>>I ran snap2 based gene prediction through maker.
>>
>>In the resulting gff file, in the source "snap_masked" I can find the
>>score in the score column of every snap prediction that did not get
>>promoted to a maker gene. This would be the score of how well the
>>prediction matches the HMM?
>>
>>It seems to me that those snap models that are given gene status no
>>longer appear as snap_masked source but only as source "maker". Maker
>>then removes the score column, instead giving AED and eAED scores (which
>>are more about how the model corresponds to the evidence). When viewing
>>the maker transcripts and SNAP predictions in a browser, they do not
>>match (mostly, maker predictions are longer).
>>
>>I am interested in the score of the individual gene predictions that
>>underlie maker gene models. Where could I find that information?
>>
>>Many thanks!
>>
>>_______________________________________________
>>maker-devel mailing list
>>maker-devel at box290.bluehost.com
>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>


From barry.utah at gmail.com  Thu Feb 13 13:27:17 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Thu, 13 Feb 2014 13:27:17 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr> <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu>

Hi Masa,

Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore.  This is a directory structure created by MAKER to store contig-specific data.  There is a datastore directory (and a corresponding index file) in the MAKER output directory.  The index file will provide the path to individual contigs, and in that contig-specific directory there is a directory called theVoid.  This contains all of the output of each program that MAKER runs.
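
If it helps, here is a rough Python sketch of that lookup. It is not a
MAKER tool, and the function names are invented for illustration. It
assumes the index file is tab-separated with the contig name in the first
column and the contig directory (relative to the index file's location) in
the second, and that the per-contig directory is named theVoid.<contig>, as
in the paths MAKER prints in its logs. Matching file names on "snap" is a
guess at the naming, so adjust the keyword after looking at your own
datastore.

```python
import csv
import os


def contig_dirs(index_path):
    """Map contig name -> datastore directory, read from the index file.

    Assumption: tab-separated rows of (contig, relative path, status).
    A contig may be listed several times; the path is the same each time.
    """
    base = os.path.dirname(os.path.abspath(index_path))
    dirs = {}
    with open(index_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                dirs[row[0]] = os.path.join(base, row[1])
    return dirs


def void_files(contig, contig_dir, keyword="snap"):
    """List files in theVoid.<contig> whose names contain the keyword."""
    void = os.path.join(contig_dir, "theVoid." + contig)
    if not os.path.isdir(void):
        return []
    return sorted(os.path.join(void, name)
                  for name in os.listdir(void) if keyword in name)
```

Usage would be along the lines of looping over contig_dirs(index).items()
and printing void_files(name, path) for the contigs you care about.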

B

On Feb 13, 2014, at 1:11 PM, Carson Holt wrote:

> No.  SNAP genes do not disappear. All SNAP ab initio calls will always be
> kept as reference features marked snap_masked (for repeat masked genome)
> and snap (for unmasked genome).  MAKER then runs SNAP another time where
> it feeds hints to SNAP based on EST and protein alignment evidence.  These
> hint based models can then compete against the ab initio SNAP models to be
> promoted to genes if their AED scores are better.  Final models can also
> get UTR added based on EST evidence.  That is why you can get models from
> MAKER that do not match the original SNAP ab initio calls.
> 
> So in summary, all SNAP ab initio models will be in snap_masked.  The
> MAKER models will consist of hint based SNAP rerun plus SNAP ab initio
> models processed to add UTR.
> 
> Thanks,
> Carson
> 
> 
> 
> On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:
> 
>> Dear all,
>> 
>> I ran snap2 based gene prediction through maker.
>> 
>> In the resulting gff file, in the source "snap_masked" I can find the
>> score in the score column of every snap prediction that did not get
>> promoted to a maker gene. This would be the score of how well the
>> prediction matches the HMM?
>> 
>> It seems to me that those snap models that are given gene status no
>> longer appear as snap_masked source but only as source "maker". Maker
>> then removes the score column, instead giving AED and eAED scores (which
>> are more about how the model corresponds to the evidence). When viewing
>> the maker transcripts and SNAP predictions in a browser, they do not
>> match (mostly, maker predictions are longer).
>> 
>> I am interested in the score of the individual gene predictions that
>> underlie maker gene models. Where could I find that information?
>> 
>> Many thanks!
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From mptrsen at uni-bonn.de  Thu Feb 13 20:00:24 2014
From: mptrsen at uni-bonn.de (Malte Petersen)
Date: Fri, 14 Feb 2014 04:00:24 +0100
Subject: [maker-devel] BLAST options error / should Maker check for file
	format?
Message-ID: <52FD86C8.6040007@uni-bonn.de>

Dear MAKER devs,

I was running Maker version 2.30p-beta on an insect genome, and it
didn't produce any output. I got these error messages:


Widget::formater:
/path/to/makeblastdb -dbtype nucl -in
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
#-------------------------------#
BLAST options error: File
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
is empty
ERROR: /path/to/makeblastdb failed in Widget::formater
--> rank=NA, hostname=Jeanne-GBR
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:scf7180005143343

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf7180005143343


I figured out that this error is due to a non-Fasta file format being
fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
I got the pipeline running now with the correct file, I think that it
should be complaining (a lot earlier) if any of the input files are of
the wrong format.  More people might run into this problem and have no
idea where to look for a solution.

What do you think?

Best,
Malte


From carsonhh at gmail.com  Thu Feb 13 20:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 20:11:22 -0700
Subject: [maker-devel] BLAST options error / should Maker check for file
 format?
In-Reply-To: <52FD86C8.6040007@uni-bonn.de>
References: <52FD86C8.6040007@uni-bonn.de>
Message-ID: <CF22D59B.9DEB%carsonhh@gmail.com>

Hi Malte,

Actually, there already is such a check.  I'm very surprised your file made
it that far.  Normally it fails right away.

Example -->

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1
appears to be empty.


Another test file -->


STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The nucleotide sequence file
'/Users/cholt/Developer/maker/trunk/data/test2'
appears to contain protein sequence or unrecognized characters. Note
the following nucleotides may be valid but are unsupported [RYKMSWBDHV]
Please check/fix the file before continuing, or set -fix_nucleotides on
the command line to fix this automatically.
Invalid Character: 'M'


You seem to have found just the right formula of improper input to get
past the filters on your run :-)
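
In the meantime, a quick check outside MAKER can catch this kind of file
before a run is started. The Python sketch below is only an illustration
and is not MAKER's own validation code. The function name is made up, and
the default alphabet ACGTN is an assumption that follows the error message
above, which lists [RYKMSWBDHV] as unsupported.

```python
def check_nucleotide_fasta(path, allowed="ACGTN"):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    seen_header = False
    seen_sequence = False
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                seen_header = True
                continue
            if not seen_header:
                # e.g. a meta-info or tabular file passed by mistake
                problems.append(f"line {number}: sequence data before any '>' header")
                break
            seen_sequence = True
            bad = set(line.upper()) - set(allowed)
            if bad:
                problems.append(f"line {number}: unexpected characters {sorted(bad)}")
                break
    if not problems:
        if not seen_header:
            problems.append("no FASTA records found (file empty or not FASTA)")
        elif not seen_sequence:
            problems.append("headers found but no sequence")
    return problems
```

Running it over every file named in the control file and stopping when any
list is non-empty gives the early complaint that Malte asked for.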


Thanks,
Carson


On 2/13/14, 8:00 PM, "Malte Petersen" <mptrsen at uni-bonn.de> wrote:

>Dear MAKER devs,
>
>I was running Maker version 2.30p-beta on an insect genome, and it
>didn't produce any output. I got these error messages:
>
>
>Widget::formater:
>/path/to/makeblastdb -dbtype nucl -in
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-6
>2_e3%2Escaf.mpi.10.0
>#-------------------------------#
>BLAST options error: File
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-6
>2_e3%2Escaf.mpi.10.0
>is empty
>ERROR: /path/to/makeblastdb failed in Widget::formater
>--> rank=NA, hostname=Jeanne-GBR
>ERROR: Failed while doing blastn of ESTs
>ERROR: Chunk failed at level:0, tier_type:3
>FAILED CONTIG:scf7180005143343
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf7180005143343
>
>
>I figured out that this error is due to a non-Fasta file format being
>fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
>I got the pipeline running now with the correct file, I think that it
>should be complaining (a lot earlier) if any of the input files are of
>the wrong format.  More people might run into this problem and have no
>idea where to look for a solution.
>
>What do you think?
>
>Best,
>Malte
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Fri Feb 14 12:09:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Feb 2014 19:09:08 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D452AD@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

So, this is what is going on. The problem is with the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the Name of the mRNA became the same as the Name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features.

The solution is to make the Parent attributes of the exon and CDS features point to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK with just adding "-mRNA-1" to the end of all the exon and CDS lines. This will work unless there are some mRNAs with alternative splice forms, because then the mRNA names will end with something like "-mRNA-2". 

I've attached a script that should do this for you. 

Run it with this command

"perl fix_gff3_script.pl <your_gff3> > <fixed_gff3>"

And then run MAKER with the fixed gff3 file in place of the old gff3 file. 
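
For anyone reading this without the attachment, the same idea can be
sketched in Python. This is an illustration only and is not the attached
Perl script. The function name fix_parent is made up, and the sketch
assumes a standard nine-column GFF3 with a Parent= attribute on exon and
CDS lines. Like the simple fix described above, it assumes one transcript
per gene, so genes with alternative splice forms would need handling by
hand.

```python
import sys


def fix_parent(line, suffix="-mRNA-1"):
    """Point exon/CDS Parent attributes at the mRNA instead of the gene."""
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
        return line
    attrs = []
    for attr in cols[8].split(";"):
        if attr.startswith("Parent="):
            parents = attr[len("Parent="):].split(",")
            # Leave parents that already name an mRNA untouched.
            parents = [p if "-mRNA-" in p else p + suffix for p in parents]
            attr = "Parent=" + ",".join(parents)
        attrs.append(attr)
    cols[8] = ";".join(attrs)
    return "\t".join(cols) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fix_parent.py your.gff3 > fixed.gff3
    with open(sys.argv[1]) as handle:
        for gff_line in handle:
            sys.stdout.write(fix_parent(gff_line))
```

Only exon and CDS rows are rewritten; gene, mRNA, comment and evidence
lines pass through unchanged.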

Let me know if that works, 

Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Thursday, February 13, 2014 3:27 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I downloaded maker 2.31 and ran the same scaffold. Again it gave an error on
the gff file. I then removed the word mRNA-1 from my gff file and ran it
again. It seems to have worked this time. Attached are the std error files:
the first try, std-err (the one that failed), and the 2nd one, named
std-err-wo-mRNA (that apparently worked).  Since the gff file is used as
evidence only, I thought it should not matter to remove the mRNA-1 naming
from the gff file.


Cheers

HB


On 14-02-12 12:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>So, after looking at the gff3 and your control files, I had an idea.
>There's the part of the control file called "Re-annotation Using MAKER
>Derived GFF3", but you can also passthrough features from a gff3 using
>the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines.
>
>Sometimes we encounter problems with the MAKER passthrough. Could you try
>dividing the gff3 file into the different feature sources and passing it
>through the "est_gff" etc options and not with the MAKER passthrough?
>That will tell us if the problem is with the gff3 file or with how MAKER
>is processing it.
>
>Another thing to check is that the contig names in the gff3
>file match the contig names in the fasta file that you're annotating.
>
>Thanks,
>Daniel
>
>
>
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 8:49 AM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>
>I have generated the files that you requested. I chose Sc00009 from my
>genome which is 30 kb and was one of the scaffolds coming up with error.
>In addition to Ctl files and error output file I also attached a part of
>the gff file related to SC00009 that is indicated in the error message.
>
>
>Thanks for helping with this
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>
>
>
>
>On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>I think that what would be the most help right now is if you ran MAKER on
>>only one of those contigs that are failing and send me the entire error
>>output along with the maker control files that you are using. It looks
>>like the error is coming from the gff3 files that you are using as input.
>>
>>Thanks,
>>Daniel
>>
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Tuesday, February 11, 2014 3:51 PM
>>To: Daniel Ence
>>Subject: ERROR: Failed while processing the chunk divide!!
>>
>>Dear Daniel
>>
>>I re-started maker and it is still running. But in the error output file
>>that has been generated so far, it seems that smaller contigs are affected.
>>There are contigs of 2-4 kb with this error, but I also noticed a contig of
>>30kb length having this error.
>>
>>I was wondering if I need to change the setting in the maker_opt file
>>
>>#-----MAKER Behavior Options
>>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases  memory usage)
>>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>>
>>
>>If I understand correctly, max_dna_len divides contigs of over 100kb into
>>smaller chunks. However, it is not clear to me why, if the default for the
>>min_contig option is 10kb or less, I get the error message for 30kb-long
>>contigs. Should I change this to 0?
>>
>>Here is an example of the error message for one of the contigs
>>
>>
>>#--------- command -------------#
>>Widget::exonerate::est2genome:
>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate  -q
>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
>>-t
>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta
>>-Q dna -T dna --model est2genome
>>--minintron 20 --showcigar --percent 20 >
>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>>#-------------------------------#
>>cleaning blastn...
>>cleaning tblastx...
>>cleaning blastx...
>>ERROR: Failed on
>>PbPT3Sc00001_S_0.8_1-mRNA-1
>>Check your input GFF3 file for errors!
>>(from GFFDB)
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk
>>divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00001
>>
>>
>>
>>
>>--Next Contig--
>>
>>
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hossein,
>>>
>>>Ok. So since this error came up on a local install, I'm going to need
>>>some more information to understand what went wrong. Is it the same
>>>contig that always causes this error? If it is, then is that the only
>>>error or warning that MAKER encounters while running on this contig? Or,
>>>if multiple contigs fail, then is it always the same error?
>>>
>>>If you can narrow it down to the smallest possible dataset that
>>>consistently gives the same error, then we can begin to understand
>>>what's wrong.
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>Sent: Tuesday, February 11, 2014 11:20 AM
>>>To: Daniel Ence
>>>Subject: Re: [maker-devel] Falied to create new account
>>>
>>>Hi Daniel
>>>
>>>I am running it through the local server at my work.
>>>
>>>
>>>
>>>
>>>
>>>
>>>M. Hossein Borhan, Ph.D.
>>>Research Scientist/ Chercheur Scientifique
>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>>Telephone/Téléphone: (306) 385-9441
>>>Facsimile/Télécopieur: (306) 385-9482
>>>Hossein.borhan at agr.gc.ca
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hi Hossein,
>>>>
>>>>Did you encounter this error while you were running MAKER on your local
>>>>machine or through the MAKER web annotation service?
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>Daniel Ence
>>>>Graduate Student
>>>>Eccles Institute of Human Genetics
>>>>University of Utah
>>>>15 North 2030 East, Room 2100
>>>>Salt Lake City, UT 84112-5330
>>>>________________________________________
>>>>From: Carson Holt [carsonhh at gmail.com]
>>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>>To: Daniel Ence
>>>>Cc: Mark Yandell
>>>>Subject: FW: [maker-devel] Falied to create new account
>>>>
>>>>Hey Daniel, could you download his dataset and see if you can replicate
>>>>the error? Also check if this was an MWAS job or a local MAKER run
>>>>(his dataset will already be there for MWAS; you just need the job ID).
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Hi Carson
>>>>>
>>>>>
>>>>>I encountered this error while running maker
>>>>>
>>>>>FATAL ERROR
>>>>>ERROR: Failed while processing the chunk divide!!
>>>>>
>>>>>ERROR: Chunk failed at level 17
>>>>>!!
>>>>>FAILED CONTIG:PbPT3Sc00006
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>HB

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_gff3_script.pl
Type: application/octet-stream
Size: 349 bytes
Desc: fix_gff3_script.pl
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140214/364c961e/attachment-0001.obj>

From claudio.valero at wur.nl  Mon Feb 17 02:23:21 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Mon, 17 Feb 2014 09:23:21 +0000
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that MAKER does not predict many genes (3697). I have trained SNAP and followed all the available tutorials. Ab initio predictors are able to predict between 8000 and 10000 genes. Is something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.log
Type: application/octet-stream
Size: 4776 bytes
Desc: maker_opts.log
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOBA.pdf
Type: application/pdf
Size: 210262 bytes
Desc: SOBA.pdf
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment-0001.pdf>

From carson.holt at genetics.utah.edu  Mon Feb 17 12:22:13 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Feb 2014 19:22:13 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
References: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
Message-ID: <CF27AB29.9F59%carson.holt@genetics.utah.edu>

You also need to look at the contigs in a browser like Apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungal genes are so close together that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts.  The result is really long UTRs on some of your gene models, which force other models to be excluded.  If this is the case, rerun something like Trinity with the --jaccard_clip option set to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option.  At least two proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species).  Comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data.  Are you also providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because both the quality and the comprehensiveness of EST/mRNA-seq databases vary widely, and they may capture as little as 30% of the genes.

Another thing that comes into play is single-exon evidence.  In anything but fungi, single-exon evidence is mostly caused by spurious alignments.  But fungi have so many single-exon genes that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single-exon evidence to keep to something like 250 bp.
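
Taken together, the settings discussed above might look roughly like the following fragment of maker_opts.ctl. This is only a sketch: the file names given to est= and protein= are hypothetical placeholders, the comma-separated list for protein= is assumed to be how more than one proteome is supplied, and single_length is assumed to be the option that holds the minimum length for single-exon evidence.

```
#-----illustrative maker_opts.ctl fragment (all file names are hypothetical)
est=trinity_transcripts.fasta      #assembled mRNA-seq transcripts
protein=related_species_1.fasta,related_species_2.fasta  #complete proteomes from related species
single_exon=1                      #keep single-exon evidence (important for fungi)
single_length=250                  #assumed option name: min length of single-exon evidence
correct_est_fusion=1               #clip long false UTRs caused by fused transcripts
```

All other options would stay as they are in the existing control file.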

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl<mailto:claudio.valero at wur.nl>>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org<mailto:'maker-devel at yandell-lab.org>'" <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Maker not predicting many genes

Dear list,

I?m trying to annotate a fungal genome, and I?m surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000-10000 genes. It is something that I have in the configuration file that is wrong?? I attach the ops file and the SOBA summary of the annotation.

Regards,

Claudio


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/d8a9d19c/attachment-0001.html>

From carsonhh at gmail.com  Mon Feb 17 12:26:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Feb 2014 12:26:05 -0700
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <CF27AFF8.9F83%carsonhh@gmail.com>

From your control file, it looks like not setting single_exon=1, and only
using UniProt rather than supplying complete proteomes of a related species
are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

-Carson



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/6c29cf24/attachment-0001.html>

From claudio.valero at wur.nl  Wed Feb 19 01:20:04 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 08:20:04 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <CF27AFF8.9F83%carsonhh@gmail.com>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>

Hi Carson,

Thank you for your suggestions. I ran MAKER again and it was able to predict many more genes. However, I have a different problem now. When I try to run gff3_merge, I get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

A similar thing happens when I try fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem before with these commands.


Regards,

Claudio

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140219/ac13ef29/attachment-0001.html>

From carsonhh at gmail.com  Wed Feb 19 08:34:33 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Feb 2014 08:34:33 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <CF2A1C44.A02B%carsonhh@gmail.com>

You provided a directory rather than a file to the -d option ('d' stands for
datastore log).
You must provide the location of the datastore index log file and not the
datastore directory.

Example -> ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log
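
As a rough illustration of the point above, a small shell helper could look up the index log inside an output directory so that the file, not the directory, is passed to -d. The function name find_index_log is hypothetical, and the "*_master_datastore_index.log" pattern is an assumption generalized from the single example path given here.

```shell
#!/bin/sh
# Sketch: print the path of the master datastore index log found inside a
# MAKER output directory. The file-name pattern is an assumption based on
# the example path in this thread; find_index_log is a hypothetical helper.
find_index_log() {
    outdir=$1
    for f in "$outdir"/*_master_datastore_index.log; do
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no *_master_datastore_index.log found in '$outdir'" >&2
    return 1
}
```

It could then be used as `gff3_merge -d "$(find_index_log dpp_contig.maker.output)"`, and in the same way with fasta_merge.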

Thanks,
Carson




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140219/a158d5b1/attachment-0001.html>

From dence at genetics.utah.edu  Wed Feb 19 09:04:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 19 Feb 2014 16:04:08 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>

Hi Claudio,

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140219/4b409201/attachment-0001.html>

From claudio.valero at wur.nl  Wed Feb 19 09:33:36 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 16:33:36 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D695A@SCOMP0936.wurnet.nl>

Hi,

Thanks, I had a mistake in the command line!!!

Regards,

Claudio

From: Daniel Ence [mailto:dence at genetics.utah.edu]
Sent: woensdag 19 februari 2014 17:04
To: Valero Jimenez, Claudio; 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org'
Subject: RE: [maker-devel] Maker not predicting many genes

Hi Claudio,

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Valero Jimenez, Claudio [claudio.valero at wur.nl]
Sent: Wednesday, February 19, 2014 1:20 AM
To: 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes
Hi Carson,

Thank you for your suggestions. I ran again Maker and it was able to predict many more genes. Although I have a different problem now. I try to run gff3_merge and get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

Similar thing happens when I try fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem before with these commands.


Regards,

Claudio

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: maandag 17 februari 2014 20:26
To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes

>From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

-Carson


From: Carson Holt <carson.holt at genetics.utah.edu<mailto:carson.holt at genetics.utah.edu>>
Date: Monday, February 17, 2014 at 12:22 PM
To: "Valero Jimenez, Claudio" <claudio.valero at wur.nl<mailto:claudio.valero at wur.nl>>, "'maker-devel at yandell-lab.org<mailto:'maker-devel at yandell-lab.org>'" <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Re: [maker-devel] Maker not predicting many genes

You also need to look at the contigs in a browser like apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see if genes are being dropped because they are only being supported by single exon evidence, they have no evidence support whatsoever, or if they are being excluded because of UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungi genes are so close that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely asseble neighboring genes into single transcripts.  The result is really long UTR on some of your gene models that force other models to be excluded.  If this is the case, rerun something like trinity with the jacquard clip option set  to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTR's clipped off.

If it is a lack of evidence overlap, make sure you provided minimum 1 proteome from a related species to the protein= option.  At least 2 proteomes are recommended though (these are not proteins from the same species but rather complete proteomes from related species).  Also comprehensive databases like UniProt/Swiss-prot are not sufficient on their own, but can supplement the other proteome data.  Also are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not siufficient (because both quality and how comprehensive EST/mRNA-seq databsases are can vary so widely, and may only capture as little as 30% of the genes).

Another thing that comes into play are single exon evidence.  In anything but fungi, single exon evidence is mostly caused by spurious alignments.  But fungi have so many single exon genes, that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single exon evidence to keep to something like 250 bp.

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl<mailto:claudio.valero at wur.nl>>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org<mailto:'maker-devel at yandell-lab.org>'" <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Maker not predicting many genes

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000-10000 genes. It is something that I have in the configuration file that is wrong?? I attach the ops file and the SOBA summary of the annotation.

Regards,

Claudio


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From barry.utah at gmail.com  Wed Feb 19 11:03:47 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Wed, 19 Feb 2014 11:03:47 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and that suggests a solution to the user?

Thanks,

B

On Feb 19, 2014, at 9:04 AM, Daniel Ence wrote:

> Hi Claudio, 
> 
> What was the command line you used for gff3_merge?
> 
> Thanks,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Valero Jimenez, Claudio [claudio.valero at wur.nl]
> Sent: Wednesday, February 19, 2014 1:20 AM
> To: 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org'
> Subject: Re: [maker-devel] Maker not predicting many genes
> 
> Hi Carson,
>  
> Thank you for your suggestions. I ran Maker again and it was able to predict many more genes, although I have a different problem now. When I try to run gff3_merge I get the following error:
>  
> Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.
>  
> A similar thing happens when I try fasta_merge:
>  
> Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.
>  
> I never had this problem before with these commands.
>  
>  
> Regards,
>  
> Claudio
>  
> From: Carson Holt [mailto:carsonhh at gmail.com] 
> Sent: maandag 17 februari 2014 20:26
> To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
> Subject: Re: [maker-devel] Maker not predicting many genes
>  
> From your control file, it looks like not setting single_exon=1 and using only UniProt rather than supplying complete proteomes of related species are your primary shortcomings.  I'd set correct_est_fusion=1 as well.
>  
> --Carson
>  
>  
> From: Carson Holt <carson.holt at genetics.utah.edu>
> Date: Monday, February 17, 2014 at 12:22 PM
> To: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>, "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Maker not predicting many genes
>  
> You also need to look at the contigs in a browser like Apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see whether genes are being dropped because they are supported only by single exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungal genes are so close together that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts.  The result is a really long UTR on some of your gene models, which forces other models to be excluded.  If this is the case, rerun something like Trinity with the jaccard clip option set to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.
>  
> If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option.  At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species).  Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data.  Also, are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the comprehensiveness of EST/mRNA-seq databases can vary so widely, and they may capture as little as 30% of the genes).
>  
> Another thing that comes into play is single exon evidence.  In anything but fungi, single exon evidence is mostly caused by spurious alignments.  But fungi have so many single exon genes that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single exon evidence to keep to something like 250 bp.
>  
> Thanks,
> Carson
>  
>  
>  
>  
>  
>  
> From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
> Date: Monday, February 17, 2014 at 2:23 AM
> To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
> Subject: Maker not predicting many genes
>  
> Dear list,
>  
> I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something in my configuration file that is wrong? I attach the opts file and the SOBA summary of the annotation.
>  
> Regards,
>  
> Claudio
>  
>  

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From carson.holt at genetics.utah.edu  Wed Feb 19 11:06:52 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 19 Feb 2014 18:06:52 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
	<0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
Message-ID: <CF2A4058.A064%carson.holt@genetics.utah.edu>

You only need to swap a single character in the script.  Just change the -e (exists) test to a -f (is a regular file) test.
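
To illustrate why that one-character change matters, here is a minimal shell sketch rather than the Perl from gff3_merge or fasta_merge themselves. The shell test operators -e and -f have the same meaning as the Perl file tests of the same name: a directory passes -e but fails -f. The function name classify_input and the file names are hypothetical, chosen only for this demonstration.

```shell
# classify_input PATH
# Prints "missing" if PATH does not exist, "not-a-file" if it exists but is
# not a regular file (for example a directory), and "ok" otherwise.
classify_input() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -f "$1" ]; then
        echo "not-a-file"
    else
        echo "ok"
    fi
}

# Demonstration with throwaway paths (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/genome_master_datastore_index.log"

classify_input "$demo_dir"                                    # directory: passes -e, fails -f
classify_input "$demo_dir/genome_master_datastore_index.log"  # regular file: passes both
classify_input "$demo_dir/no_such_file.log"                   # does not exist: fails -e

rm -rf "$demo_dir"
```

With only an -e check, the directory case would slip through, which is consistent with the "uninitialized value $outfile" warning reported earlier if a datastore directory was passed where the index log was expected (that cause is an assumption; the command line actually used was not posted).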

Thanks,
Carson




From gtaylor at bcgsc.ca  Fri Feb 21 11:48:42 2014
From: gtaylor at bcgsc.ca (Greg Taylor)
Date: Fri, 21 Feb 2014 10:48:42 -0800
Subject: [maker-devel] Maker jobs hanging
Message-ID: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>

Hello,
I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3 Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled RNA-seq data, on 480 processors of our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file. The standard error log for the whole job looks normal, as do the run.log and run.log.child.0 files for the daughter processes. This seems to be sequence dependent: re-running contigs that hang doesn't help, and the same contigs always hang. I'm still looking into this myself, but it seems most if not all of the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated.

sincerely,
Greg Taylor


From dence at genetics.utah.edu  Fri Feb 21 11:54:17 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 18:54:17 +0000
Subject: [maker-devel] Maker jobs hanging
In-Reply-To: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
References: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66CD0@mxb2.hg.genetics.utah.edu>

Hi Greg, 

Since this is probably going to be a more complicated situation, would you upload your data and control file at this URL so that we can try to replicate the error on our machines?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166

Also, which version of MPI are you using? And you might want to try updating MAKER. I think version 2.31 was just updated a few weeks ago. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Greg Taylor [gtaylor at bcgsc.ca]
Sent: Friday, February 21, 2014 11:48 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] Maker jobs hanging

Hello,
I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3 Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled RNA-seq data, on 480 processors of our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file. The standard error log for the whole job looks normal, as do the run.log and run.log.child.0 files for the daughter processes. This seems to be sequence dependent: re-running contigs that hang doesn't help, and the same contigs always hang. I'm still looking into this myself, but it seems most if not all of the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated.

sincerely,
Greg Taylor


From carsonhh at gmail.com  Fri Feb 21 11:56:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Feb 2014 11:56:50 -0700
Subject: [maker-devel] Maker jobs hanging
Message-ID: <CF2CEDC6.A15D%carsonhh@gmail.com>

Use 2.31.  It has been tested to work without issue on several thousand
CPUs.  Also use OpenMPI for any jobs larger than 100 CPUs.  In addition,
OpenMPI can freeze on some systems when running Perl-based MPI programs
unless the following flag is given --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with Perl, and it
freezes whenever Perl-based MPI jobs extend across nodes (they run fine
within a single node, though).
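
Before launching, it can help to confirm which MPI implementation the mpiexec on your PATH actually belongs to. The sketch below classifies a launcher from its version banner. It rests on assumptions about the banner text: that Open MPI launchers mention "Open MPI" or "OpenRTE", and that MPICH-derived launchers (MVAPICH2 included) print "HYDRA build details". The function name mpi_flavor is hypothetical, and the banners on your cluster may differ.

```shell
# mpi_flavor: read a launcher's version banner on stdin and print a guess
# at the MPI family. The matched strings are assumptions about typical
# banners, not a guaranteed interface.
mpi_flavor() {
    banner=$(cat)
    case "$banner" in
        *"Open MPI"*|*OpenRTE*) echo "openmpi" ;;
        *HYDRA*|*MVAPICH*)      echo "mpich-like" ;;
        *)                      echo "unknown" ;;
    esac
}

# Usage on a cluster (not run here):
#   mpiexec --version 2>&1 | mpi_flavor
#
# If that reports "openmpi", the launch line from this message is:
#   mpiexec -mca btl '^openib' -n 200 maker
# The quotes around ^openib are only a precaution for shells that treat ^
# as a special character; the argument passed to mpiexec is unchanged.
```

A result of "mpich-like" does not prove the launcher is MVAPICH2, only that it is worth checking with your system administrators before running across nodes.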

--Carson


On 2/21/14, 11:48 AM, "Greg Taylor" <gtaylor at bcgsc.ca> wrote:

>Hello,
>I'm having a problem with Maker_2.28 jobs hanging. I am annotating a
>3 Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled
>RNA-seq data, on 480 processors of our local cluster. Once a run begins,
>479 contigs are started, as noted in the *_master_datastore_index.log
>file. The standard error log for the whole job looks normal, as do the
>run.log and run.log.child.0 files for the daughter processes. This seems
>to be sequence dependent: re-running contigs that hang doesn't help, and
>the same contigs always hang. I'm still looking into this myself, but it
>seems most if not all of the jobs are stuck at the Blastx stage. If you
>have any suggestions, your help would be greatly appreciated.
>
>sincerely,
>Greg Taylor


From dence at genetics.utah.edu  Fri Feb 21 15:04:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 22:04:34 +0000
Subject: [maker-devel] FW:  Maker jobs hanging
In-Reply-To: <CF2CEDC6.A15D%carsonhh@gmail.com>
References: <CF2CEDC6.A15D%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66D7E@mxb2.hg.genetics.utah.edu>

Hi Greg, 

You should be able to have the new MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Friday, February 21, 2014 11:56 AM
To: Greg Taylor; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Maker jobs hanging

Use 2.31.  It has been tested to work without issue on several thousand
CPUs.  Also use OpenMPI for any jobs larger than 100 CPUs.  In addition,
OpenMPI can freeze on some systems when running Perl-based MPI programs
unless the following flag is given --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with Perl, and it
freezes whenever Perl-based MPI jobs extend across nodes (they run fine
within a single node, though).

--Carson


On 2/21/14, 11:48 AM, "Greg Taylor" <gtaylor at bcgsc.ca> wrote:

>Hello,
>I'm having a problem with Maker_2.28 jobs hanging. I am annotating a
>3 Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled
>RNA-seq data, on 480 processors of our local cluster. Once a run begins,
>479 contigs are started, as noted in the *_master_datastore_index.log
>file. The standard error log for the whole job looks normal, as do the
>run.log and run.log.child.0 files for the daughter processes. This seems
>to be sequence dependent: re-running contigs that hang doesn't help, and
>the same contigs always hang. I'm still looking into this myself, but it
>seems most if not all of the jobs are stuck at the Blastx stage. If you
>have any suggestions, your help would be greatly appreciated.
>
>sincerely,
>Greg Taylor


From dence at genetics.utah.edu  Fri Feb 21 19:38:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 02:38:59 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>

Hi Joe, 

Will you upload your control files and data at this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169

Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene?

I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Mark Yandell
Sent: Friday, February 21, 2014 7:32 PM
To: Daniel Ence
Subject: FW: I am a PhD candidate at NMSU and have a question about maker2

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 5:18 PM
To: Mark Yandell
Subject: I am a PhD candidate at NMSU and have a question about maker2

Dear Dr. Yandell,

I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me.

Thank you,
Joe

Sent from my iPad


From dence at genetics.utah.edu  Fri Feb 21 21:27:10 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 04:27:10 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>,
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>

Hi Joe, 

MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. 
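
One quick way to check the first of those files is to print the evidence-related entries and see whether any are empty. The sketch below assumes the control file uses the usual key=value layout with optional trailing # comments; the function name check_evidence is hypothetical, and it is only a convenience for inspection, not part of MAKER.

```shell
# check_evidence FILE
# Print the genome, est and protein entries from a maker_opts.ctl-style
# file, flagging any entry whose value is empty.
# Note: whitespace is stripped from values, so paths containing spaces
# would be shown without them.
check_evidence() {
    grep -E '^(genome|est|protein)=' "$1" | while IFS= read -r line; do
        key=${line%%=*}        # text before the first '='
        value=${line#*=}       # text after the first '='
        value=${value%%#*}     # drop any trailing comment
        value=$(printf '%s' "$value" | tr -d '[:space:]')
        if [ -z "$value" ]; then
            echo "$key: EMPTY"
        else
            echo "$key: $value"
        fi
    done
}

# Usage (not run here): check_evidence maker_opts.ctl
```

An EMPTY est and protein entry would mean MAKER has no evidence to align, which would be one possible explanation for zero BLAST results.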

Will you attach those files to your reply, so we can make sure that the settings are set up correctly?

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 7:44 PM
To: Daniel Ence
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2

Hi Daniel,

Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe Maker2 just calls BLAST from NCBI's page. When I search the cotton genome it returns zero hits. I then used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems like the step that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which then reports 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene.

Thanks,
Joe
________________________________________
From: Daniel Ence <dence at genetics.utah.edu>
Sent: Friday, February 21, 2014 7:38 PM
To: Joseph Said
Cc: maker-devel at yandell-lab.org
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2

Hi Joe,

Will you upload your control files and data at this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169

Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene?

I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues.

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Mark Yandell
Sent: Friday, February 21, 2014 7:32 PM
To: Daniel Ence
Subject: FW: I am a PhD candidate at NMSU and have a question about maker2

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 5:18 PM
To: Mark Yandell
Subject: I am a PhD candidate at NMSU and have a question about maker2

Dear Dr. Yandell,

I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me.

Thank you,
Joe

Sent from my iPad


From dence at genetics.utah.edu  Sat Feb 22 15:51:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 22:51:48 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>,
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>

Hi,

Will you send me the long file that you were trying to blast against?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 10:46 AM
To: Daniel Ence
Cc: Joe Song; Joseph Said
Subject: Re: I am a PhD candidate at NMSU and have a question about maker2

hi all,
Attached are the three configuration files and the two input files, which are used to run a prediction with the genome and the protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but got nothing returned. We also tested a long genome sequence as the genome input, but still got nothing. I am not sure what causes this result.
Thanks a lot for help.

Hua


On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu> wrote:
Hi Daniel,

I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night.

Thanks,
Joe

Sent from my iPad

> On Feb 21, 2014, at 9:27 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
> Hi Joe,
>
> MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file.
>
> Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
>
> Thanks,
> Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: Joseph Said [joesaid at nmsu.edu]
> Sent: Friday, February 21, 2014 7:44 PM
> To: Daniel Ence
> Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>
> Hi Daniel,
>
> Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe Maker2 just calls BLAST from NCBI's page. When I search the cotton genome it returns zero hits. I then used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems like the step that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which then reports 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene.
>
> Thanks,
> Joe
> ________________________________________
> From: Daniel Ence <dence at genetics.utah.edu>
> Sent: Friday, February 21, 2014 7:38 PM
> To: Joseph Said
> Cc: maker-devel at yandell-lab.org
> Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>
> Hi Joe,
>
> Will you upload your control files and data at this URL?
> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169
>
> Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene?
>
> I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues.
>
> Thanks,
> Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: Mark Yandell
> Sent: Friday, February 21, 2014 7:32 PM
> To: Daniel Ence
> Subject: FW: I am a PhD candidate at NMSU and have a question about maker2
>
> Mark Yandell
> Professor of Human Genetics
> H.A. & Edna Benning Presidential Endowed Chair
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:801-587-7707
>
> ________________________________________
> From: Joseph Said [joesaid at nmsu.edu]
> Sent: Friday, February 21, 2014 5:18 PM
> To: Mark Yandell
> Subject: I am a PhD candidate at NMSU and have a question about maker2
>
> Dear Dr. Yandell,
>
> I am a molecular biologist at NMSU. I am trying to use Maker2 with the cotton genome to search an Arabidopsis gene against it. I think there is a problem with the BLAST component, because zero results are returned. I tried troubleshooting by searching a known gene, and that also returned zero results. Is this a common problem with the pipeline? I would appreciate any ideas you might have.
>
> Thank you,
> Joe
>
> Sent from my iPad

From dence at genetics.utah.edu  Sat Feb 22 16:21:51 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 23:21:51 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>,
	<CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66FAB@mxb2.hg.genetics.utah.edu>

Hi Hua, will you upload the genome file to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170
I am more concerned that MAKER didn't find the gene in the whole genome than in the 60bp substring. I think that MAKER needs more sequence than that to annotate a gene model.

Will you also upload the MAKER output and datastore from the MAKER run?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 4:00 PM
To: Daniel Ence
Cc: maker-devel at yandell-lab.org; Joseph Said; Joe Song
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2


The long file we used is a whole genome, which is too large for me to send. Sorry. In the simple test I described, the nucleotide sequence I sent you serves as the genome file, and the protein sequence is the other input. These two are what we want to BLAST against each other to see whether Maker2 works.
Thanks.

On Feb 22, 2014 3:51 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
Hi,

Will you send me the long file that you were trying to blast against?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 10:46 AM
To: Daniel Ence
Cc: Joe Song; Joseph Said
Subject: Re: I am a PhD candidate at NMSU and have a question about maker2

Hi all,
Attached are the three configuration files and two input files, which we use to find matches between the genome and the protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but nothing was returned. We also tested a long genome sequence as the input, and still got nothing. I am not sure what causes this result.
Thanks a lot for your help.

Hua


On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu> wrote:
Hi Daniel,

I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night.

Thanks,
Joe

Sent from my iPad

> On Feb 21, 2014, at 9:27 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
> Hi Joe,
>
> MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file.
>
> Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
>
> Thanks,
> Daniel

From mikael.durling at slu.se  Sun Feb 23 09:57:09 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 16:57:09 +0000
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se>

Dear list and maker developers,

I was browsing the results of a recent maker run, focusing on differences between this run, made with a recent maker (svn r1067), and a previous run with svn revision 1022 (as I recall). One of the differences I found was a gene that is lost in the new prediction set and replaced by an extended version of its former neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300).  As you can see, there is no support for the join in the evidence. Do you have any idea what might cause this?

Best regards,
Mikael Durling


From carsonhh at gmail.com  Sun Feb 23 13:00:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 23 Feb 2014 13:00:50 -0700
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <CF2FA087.A21D%carsonhh@gmail.com>

The image doesn't show all evidence sources, but the short answer is that
one of your evidence sources (est2genome, protein2genome, or blastx)
bridges the two regions, and when given the bridged hint, one of the
gene predictors thinks it makes sense to create a single model instead.
My guess is that it's the blastx evidence.
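
One way to look for such a bridging alignment is to scan the MAKER GFF3 for evidence features that overlap both gene regions. The following is only a rough sketch and not part of MAKER: the function name, the default source names, and the coordinates in the usage note are illustrative assumptions, and it assumes the evidence source is recorded in column 2 of the GFF3.

```python
# Hypothetical helper, not part of MAKER: list evidence features that
# overlap two separate loci on the same contig.
DEFAULT_SOURCES = {"est2genome", "protein2genome", "blastx"}

def bridging_features(gff_lines, seqid, left, right, sources=DEFAULT_SOURCES):
    """Return (source, type, start, end, attributes) for bridging evidence.

    left and right are (start, end) tuples in 1-based inclusive
    GFF3 coordinates, one per gene region.
    """
    hits = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[1] not in sources:
            continue
        start, end = int(cols[3]), int(cols[4])
        overlaps_left = start <= left[1] and end >= left[0]
        overlaps_right = start <= right[1] and end >= right[0]
        if overlaps_left and overlaps_right:
            hits.append((cols[1], cols[2], start, end, cols[8]))
    return hits
```

Calling it with the open GFF3 file handle, the contig name, and the coordinates of the two original genes returns the candidate features; an empty list would suggest the bridge does not come from those sources as reported in the file.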

--Carson


From mikael.durling at slu.se  Sun Feb 23 14:14:00 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 21:14:00 +0000
Subject: [maker-devel] Maker predicting fusion genes?
In-Reply-To: <CF2FA087.A21D%carsonhh@gmail.com>
References: <CF2FA087.A21D%carsonhh@gmail.com>
Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se>

OK, do you mean by that that the predictions from the ab initio predictors that end up in the gff3 output (snap_masked, augustus_masked, and genemark) are not the final hinted predictions? If not, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence there to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301, where all evidence is plotted).

Mikael


-------------- next part --------------
A non-text attachment was scrubbed...
Name: region.gff3
Type: application/octet-stream
Size: 19612 bytes
Desc: region.gff3
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140223/240ecba4/attachment-0001.obj>

From hedgyx at yahoo.com  Mon Feb 24 00:02:41 2014
From: hedgyx at yahoo.com (Megan)
Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

Maker folks,
I am re-annotating a single contig and I am having a few problems.

First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added).  The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first.  So it appears to be a systematic error across the entire file.  The Maker error is "Check your input GFF3 file for errors! (from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff.  Attached is a gff with the first 2 genes.  

Second, when I updated to Maker 2.31, Maker began complaining that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV].  It suggests "set -fix_nucleotides on the command line to fix this automatically".  Is -fix_nucleotides a Maker flag?  What exactly does it do?  Does it remove the entire sequence or replace ambiguous bases with a randomly selected one?  Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.

Also, just curious, has Maker never supported these characters but just never complained?  I used this EST data set with Maker 2.09.  I did note poor EST coverage, but thought it was an issue with the EST data itself.

I appreciate any suggestions.
Thanks,
Megan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part_passthru.gff
Type: application/octet-stream
Size: 4363 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140223/3950a0b4/attachment-0001.obj>

From zh9118 at gmail.com  Sat Feb 22 16:00:28 2014
From: zh9118 at gmail.com (Hua Zhong)
Date: Sat, 22 Feb 2014 16:00:28 -0700
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
Message-ID: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>

The long file we used is a whole genome, which is too large for me to
send. Sorry. In the simple test I described, the nucleotide sequence I
sent you serves as the genome file, and the protein sequence is the other
input. These two are what we want to BLAST against each other to see
whether Maker2 works.
Thanks.

From carsonhh at gmail.com  Mon Feb 24 11:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:18:18 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID: <CF30D6EC.A2CC%carsonhh@gmail.com>

The -fix_nucleotides flag is added to the command line (i.e. maker
-fix_nucleotides).  The warning is there so you are aware that there is an
issue with your fasta file that will cause things downstream to fail.
MAKER can fix the errors for you, but first it gives a warning designed to
make you look at the file and validate it.  Why would you want this?
For example, if you accidentally provided protein sequence to the EST
option, you wouldn't want MAKER to just proceed.  You want a warning
so you can check first.  If your file is in fact EST data, then set the
flag and those characters will be changed to N's in the fixed fasta
sequence.  Otherwise those characters will cause errors in downstream tools
like exonerate, and even in some downstream GMOD tools, so they can't be
allowed to remain as is.
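
To preview what that substitution would do to an EST file before setting the flag, something like the following could be used. It is a minimal sketch of the behaviour described above (ambiguity codes become N), not MAKER's own code; the function names are made up for illustration, and preserving lower case as "n" is an assumption.

```python
import re

# IUPAC ambiguity codes named in the MAKER warning, in either case.
_AMBIGUOUS = re.compile(r"[RYKMSWBDHVrykmswbdhv]")

def mask_ambiguous(fasta_lines):
    """Yield FASTA lines with ambiguity codes in sequence lines set to N/n."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield line  # header lines are passed through untouched
        else:
            yield _AMBIGUOUS.sub(
                lambda m: "N" if m.group().isupper() else "n", line
            )

def count_ambiguous(fasta_lines):
    """Count ambiguity codes in sequence lines, ignoring headers."""
    return sum(
        len(_AMBIGUOUS.findall(line))
        for line in fasta_lines
        if not line.startswith(">")
    )
```

Running count_ambiguous on the file first gives a sense of how much sequence would be masked, which also helps confirm that the file really is nucleotide data rather than protein.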

For the GFF3 file, there is almost definitely a logic issue in the file
(the modENCODE validator won't check for those).  This can come from prior
manipulation of the GFF3 file.  For example, IDs for a gene that are the
same across two contigs (technically valid but a logic error).  The GFF3
error message will normally give the ID of the feature causing the issue.
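
The specific example given here, an ID reused on more than one contig, can be checked with a short script. This is a sketch under the assumption that the file is standard nine-column GFF3; the function name is hypothetical, and it covers only this one kind of logic error, not everything MAKER's GFF3 loader checks.

```python
from collections import defaultdict

def ids_on_multiple_seqids(gff_lines):
    """Return {feature_id: [seqids]} for IDs that occur on more than one contig."""
    seen = defaultdict(set)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID" and value:
                seen[value].add(cols[0])
    return {fid: sorted(seqids) for fid, seqids in seen.items() if len(seqids) > 1}
```

An empty result only rules out this one problem; the ID printed in MAKER's error message remains the best pointer to the offending feature.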

I could also take a look for you.  You can upload the GFF3 file here ->
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID so I know
which files to review.

Thanks,
Carson



From dence at genetics.utah.edu  Mon Feb 24 11:31:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 24 Feb 2014 18:31:47 +0000
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30D6EC.A2CC%carsonhh@gmail.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>,
	<CF30D6EC.A2CC%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>

Hi Megan, 

One problem with the GFF3 that you attached is that the IDs for the CDS features are constructed incorrectly. All of the CDS features for a given mRNA or transcript should have the same ID, but the CDS features in your GFF3 have IDs that use the exon name.

You can fix it with this command-line perl:
perl -ne 'if (/\tCDS\t/ && /Parent=([^;\s]+)/) { my $parent = $1; s/ID=[^;\n]+/ID=$parent-cds/ } print' part_passthru.gff > fixed.gff3

It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating. 

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com  Mon Feb 24 11:34:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:34:28 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
	<CF30D6EC.A2CC%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
Message-ID: <CF30DE6B.A2F6%carsonhh@gmail.com>

Actually that is not true.  CDS IDs can be the same or different.  MAKER
doesn't care either way.  Both are valid in GFF3.  Having the same ID just
allows them to be put together by some GMOD viewers without having to go
through a container feature.
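
To illustrate what going through a container feature means in practice, the sketch below groups CDS segments by their Parent attribute and ignores the CDS ID entirely, so it gives the same result whether the segments share an ID or not. It is an illustrative assumption about how a reader might assemble the segments, not how MAKER or any GMOD viewer is implemented, and the function name is hypothetical.

```python
from collections import defaultdict

def cds_by_parent(gff_lines):
    """Map each Parent ID to its CDS segments as sorted (start, end) tuples."""
    groups = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Parse key=value attributes; the CDS ID is deliberately not used.
        attrs = dict(
            a.strip().partition("=")[::2] for a in cols[8].split(";") if "=" in a
        )
        # Parent may list several transcripts separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                groups[parent].append((int(cols[3]), int(cols[4])))
    return {parent: sorted(segs) for parent, segs in groups.items()}
```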

--Carson

On 2/24/14, 11:31 AM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Megan, 
>
>One problem with the GFF3 that you attached is that the IDs for the CDS
>features are constructed incorrectly. All of the CDS features for a given
>mRNA or transcript should have the same ID. The CDS features in your GFF3
>have IDs that use the exon name.
>
>You can fix it with this command-line perl:
>perl -ne 'if(/\tCDS\t/ && /Parent=([^;\s]+)/){ my $parent=$1;
>s/ID=[^;\n]+/ID=$parent-cds/ } print' part_passthru.gff > fixed.gff3
>
>It just fixes the ID attributes in all of the CDS features. Try it on the
>test gff3 you sent and let me know if it works. I can't test it myself
>without the fasta file that you are annotating.
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>Carson Holt [carsonhh at gmail.com]
>Sent: Monday, February 24, 2014 11:18 AM
>To: Megan; maker-devel at yandell-lab.org
>Subject: Re: [maker-devel] gff pass thru problem and unsupported EST
>nucleotides
>
>The -fix_nucleotides flag is added to the command line (i.e. maker
>-fix_nucleotides).  The warning is there so you are aware that there is an
>issue with your fasta file that will cause things downstream to fail.
>MAKER can fix the errors for you, but first it gives a warning designed to
>make you look at the file and validate it.  Why would you want this?
>For example, if you accidentally provided protein sequence to the EST
>option, you wouldn't want MAKER to just proceed.  You want a warning
>so you can check first.  If your file is in fact EST data, then set the
>flag and those characters will be changed to N's in the fixed fasta
>sequence.  Otherwise those characters will cause errors in downstream tools
>like exonerate, and even some downstream GMOD tools, so they can't be
>allowed to remain as is.
>
>For the GFF3 file, there is almost definitely a logic issue in the file
>(the modENCODE validator won't check for those).  This can be from prior
>manipulation of the GFF3 file.  For example, IDs for a gene that are the
>same across two contigs (technically valid but a logic error).  The GFF3
>error message will normally give the ID of the feature causing the issue.
>
>I could also take a look for you.  You can upload the GFF3 file here:
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>Click on 'new guest account', then e-mail me back your guest ID so I know
>which files to review.
>
>Thanks,
>Carson
>
>
>
>On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
>
>>Maker folks,
>>I am re-annotating a single contig and I am having a few problems.
>>
>>First, I am having trouble passing through a Maker derived gff (from
>>Maker 2.09, with some modifications to gene names and functional
>>information added).  The gff file passes the modencode validator but
>>Maker always fails on the first gene in the file, regardless of which
>>gene comes first.  So it appears to be a systematic error across the
>>entire file.  The Maker error is "Check your input GFF3 file for errors!
>>(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
>>with model_pass=1 and pred_gff.  Attached is a gff with the first 2
>>genes.
>>
>>Second, when I updated to Maker 2.31, Maker now complains that my EST
>>fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
>>suggests "set -fix_nucleotides on the command line to fix this
>>automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
>>it do?  Does it remove the entire sequence or replace ambiguous bases
>>with a randomly selected one?  Half of my 20k ESTs contain these
>>characters, so I don't want to throw them out entirely.
>>
>>Also, just curious, has Maker never supported these characters but just
>>never complained?  I used this EST data set with Maker 2.09.  I did note
>>poor EST coverage, but thought it was an issue with the EST data itself.
>>
>>I appreciate any suggestions.
>>Thanks,
>>Megan
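The quoted reply above says that with -fix_nucleotides the unsupported characters [RYKMSWBDHV] are changed to N's rather than the sequences being dropped. A minimal sketch of that kind of substitution, for previewing the effect on an EST file, might look like the following. This is not MAKER's own code; the function name and the choice to preserve lowercase are assumptions made for illustration.

```python
import re

# The ambiguity codes that MAKER 2.31 reported as unsupported in this thread.
AMBIGUOUS = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def mask_ambiguous(fasta_text: str) -> str:
    """Replace ambiguity codes with N on sequence lines; header lines are untouched."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            # Assumption: keep case, so a lowercase code becomes a lowercase n.
            out.append(AMBIGUOUS.sub(
                lambda m: "n" if m.group().islower() else "N", line))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">est1 RYK in the header stays\nACGTRYACGT\nacgtkmacgt\n"
    print(mask_ambiguous(example), end="")
```

Because only individual bases are replaced, every EST record is retained, which matches the concern about not throwing out half of the 20k ESTs.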


From carsonhh at gmail.com  Mon Feb 24 13:59:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 13:59:12 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID: <CF30FEE0.A32D%carsonhh@gmail.com>

I found the issue.  You have non-printing characters (carriage returns) at
the end of almost every line.  Because they fall within the Parent= tag,
they become part of the Parent ID when the file is read.

So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
ID.

"\cM" is a carriage return (Ctrl-M, often displayed as ^M).

I ran the attached script to remove these characters (perl purify
<gff3_file>), and then it works.  Make sure to remove the
.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
to force the GFF3 database to be rebuilt after fixing the file when you
rerun MAKER.
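
The attached purify script is not reproduced in the archive. As a rough stand-in, and assuming the only offending characters are the trailing carriage returns described above, a cleanup could be sketched as follows. The function names are hypothetical and this is not the purify script itself.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Remove every carriage return (\\r, displayed as ^M or \\cM) from raw bytes."""
    return data.replace(b"\r", b"")


def clean_file(src: str, dst: str) -> int:
    """Write a cleaned copy of src to dst; return how many \\r bytes were dropped."""
    with open(src, "rb") as handle:
        raw = handle.read()
    cleaned = strip_carriage_returns(raw)
    with open(dst, "wb") as handle:
        handle.write(cleaned)
    return len(raw) - len(cleaned)


if __name__ == "__main__":
    line = b"chr1\tmaker\texon\t1\t10\t.\t+\t.\tID=e1;Parent=HERA000031-RA\r\n"
    print(strip_carriage_returns(line))
```

Working on bytes avoids any newline translation by the file layer, so the only change to the file is the removal of the carriage returns.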

Thanks,
Carson


On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions.  I have looked at the gff file, but I do not
>see any obvious errors.  I have uploaded the files to your website.  The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error.  If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff.  The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information.  I checked with cat -te, but don't see any obvious
>formatting errors.
>
>Thanks!
>Megan
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1965 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140224/a1582e7d/attachment-0001.obj>

From carsonhh at gmail.com  Mon Feb 24 14:03:00 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 14:03:00 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
	<CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <CF310121.A33F%carsonhh@gmail.com>

One more thing.  You must give the file to pred_gff or model_gff.  It is
no longer strictly a MAKER file, as many of the source columns read ".",
meaning it has been edited by Apollo or another editor.  So it is not
guaranteed to be recognized by genome_gff, because many of the source tags
have changed.
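
One way to see how many features have lost their original source tag is to tally column 2 of the GFF3 file. The sketch below is only an illustration of that check; the function name is hypothetical, and it says nothing about which source values genome_gff actually accepts.

```python
from collections import Counter


def source_counts(gff3_text: str) -> Counter:
    """Count the values of column 2 (source) over the feature lines of a GFF3 file."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) == 9:
            counts[columns[1]] += 1
    return counts


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for source, n in source_counts(handle.read()).most_common():
            print(source, n, sep="\t")
```

A large count for "." would be consistent with the file having been rewritten by an external editor.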

Thanks,
Carson




From rbharris at uw.edu  Tue Feb 25 14:49:57 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Tue, 25 Feb 2014 13:49:57 -0800
Subject: [maker-devel] error in snap training
Message-ID: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>

Hey -

I'm trying to train SNAP and am running into errors. I don't have any EST
evidence, just protein. My .gff file reports 10865 genes but when I run
maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n,
a ton of overlap_prev_exon errors get written to the screen, and then when I
get to the forge step I get an "impossible error5". Any help would be
greatly appreciated.

Thanks!
Rebecca

From carsonhh at gmail.com  Tue Feb 25 15:12:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 15:12:14 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>

Make sure you are using 2.31,  and then try the maker2zff filters individually.  If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP.  Use cegma2zff for the conversion.

--Carson



From sjackman at gmail.com  Tue Feb 25 17:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names
Message-ID: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>

Hi,

I'm annotating a genome using a closely related genome from Genbank, using
the .frn (RNA) and .faa (protein) files from Genbank as evidence to
annotate my genome. I've run Maker, and the annotation seems to have worked
well. Is it possible to map the names of the genes from the related species
to my annotation? I see the *map_forward* option, which applies to the
*model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From hedgyx at yahoo.com  Tue Feb 25 17:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST
	nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms.  Thanks for the help.

Megan
 
From carsonhh at gmail.com  Tue Feb 25 17:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
Message-ID: <CF32868D.A42A%carsonhh@gmail.com>

There is a way.  It's not a standard option and it's undocumented, but if
you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags
to your fasta headers, those can be used to guide the mapping and naming.
For example, gene_id=<some_gene>  will ensure different isoforms that share
a common gene_id get clustered into the same gene, and
maker_coor=chr1:1-10000 in the fasta header will force a particular sequence
to only be mapped against chr1 within the range of 1-10000 bp  and just
using maker_coor=chr1 will force it to only be mapped against chr1.
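
As an illustration of the header tags described above, the following sketch appends gene_id= and maker_coor= to a FASTA header line. Where exactly MAKER expects the tags within the header is not stated in this thread; appending them after the sequence name, separated by spaces, is an assumption, and the function name and example record names are hypothetical.

```python
from typing import Optional


def tag_header(header: str, gene_id: Optional[str] = None,
               maker_coor: Optional[str] = None) -> str:
    """Append gene_id= and/or maker_coor= tags to one FASTA header line."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: " + header)
    parts = [header.rstrip()]
    if gene_id:
        parts.append("gene_id=" + gene_id)
    if maker_coor:
        parts.append("maker_coor=" + maker_coor)
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical record names and coordinates, for illustration only.
    print(tag_header(">transcript1_isoformA", gene_id="geneX",
                     maker_coor="chr1:1-10000"))
    print(tag_header(">transcript1_isoformB", gene_id="geneX",
                     maker_coor="chr1"))
```

In this example both isoforms carry the same gene_id, which is the case described above where isoforms should be clustered into one gene.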

This is an undocumented way to remap genes onto new assemblies using blast
alignments of earlier transcript or protein annotations as a guide.

--Carson



From carsonhh at gmail.com  Tue Feb 25 18:04:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 18:04:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <CF328AAA.A44D%carsonhh@gmail.com>

One more note.  When using this option, the score column of mRNA features
will represent how completely this gene matches the source EST/protein
(fraction coverage multiplied by % identity).  So a value of 100 means there
is a perfect match.  This way if the same transcript maps to multiple
locations, then you can identify which location is the closest match (also
works for identifying likely orthologs vs. paralogs).

--Carson
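
To make the arithmetic concrete: a transcript aligned over half its length at 80% identity would score 0.5 x 80 = 40, and full coverage at 100% identity gives 100. The sketch below applies that formula and then picks the highest-scoring mRNA per source name from a GFF3 file. Grouping by the Name attribute is an assumption made for illustration, since the thread does not say which attribute carries the forwarded name; both function names are hypothetical.

```python
def forward_score(coverage_fraction: float, percent_identity: float) -> float:
    """Score as described in the thread: fraction coverage times % identity."""
    return coverage_fraction * percent_identity


def best_hits(gff3_text: str, key: str = "Name") -> dict:
    """For each value of attribute `key`, keep the mRNA with the highest score."""
    best = {}
    for line in gff3_text.splitlines():
        cols = line.split("\t")
        # Skip comments, non-mRNA features, and mRNAs without a score.
        if len(cols) != 9 or cols[2] != "mRNA" or cols[5] == ".":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        name = attrs.get(key)
        if name is None:
            continue
        score = float(cols[5])
        location = (cols[0], int(cols[3]), int(cols[4]))
        if name not in best or score > best[name][0]:
            best[name] = (score, location)
    return best
```

Calling best_hits on the MAKER output text would return one (score, location) pair per name, which is one way to separate the closest match from weaker secondary mappings.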



From weckalba at asu.edu  Tue Feb 25 18:36:21 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 17:36:21 -0800
Subject: [maker-devel] invalid gff3 format issues
Message-ID: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>

Hi all,

I am trying to update maker annotations with PASA and encountered errors
stemming from file format issues in the gff3 file.

I put a few lines from the gff3 to highlight the issue below.  Basically,
the problem is that there are non-unique IDs for a number of the
annotations.

Is there anything that can be done to right this problem?

Thanks,

Walter

Lines from GFF3 file, repeated IDs are highlighted:


chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
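
For readers who want to check their own file for this kind of collision, here is a minimal Python sketch. It is not part of MAKER or PASA; the function name find_conflicting_ids is made up for illustration, and it assumes a tab-delimited GFF3 with the usual nine columns. It only looks at gene and mRNA lines, since GFF3 allows some feature types to legitimately reuse an ID across several lines.

```python
from collections import defaultdict


def find_conflicting_ids(gff3_lines, feature_types=("gene", "mRNA")):
    """Map each ID used by more than one feature line to its occurrences.

    Hypothetical helper (not a MAKER script). Each occurrence is recorded
    as a (type, start, end, Parent) tuple.
    """
    seen = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            seen[attrs["ID"]].append(
                (cols[2], int(cols[3]), int(cols[4]), attrs.get("Parent"))
            )
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}


# The two mRNA lines from the message above, rebuilt with real tabs.
example = [
    "\t".join(["chr1", "maker", "mRNA", "9377440", "9432028", ".", "-", ".",
               "ID=maker-chr1-snap-gene-4.53-mRNA-1;"
               "Parent=maker-chr1-pred_gff_maker-gene-4.16"]),
    "\t".join(["chr1", "maker", "mRNA", "8894975", "9021577", ".", "+", ".",
               "ID=maker-chr1-snap-gene-4.53-mRNA-1;"
               "Parent=maker-chr1-snap-gene-4.53"]),
]
conflicts = find_conflicting_ids(example)
for feature_id, hits in conflicts.items():
    print(feature_id, hits)
```

On the two example lines this reports the single shared mRNA ID together with its two different parents, which is the collision described above.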

From dence at genetics.utah.edu  Tue Feb 25 19:02:04 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 26 Feb 2014 02:02:04 +0000
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
Message-ID: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel




From weckalba at asu.edu  Tue Feb 25 19:11:12 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 18:11:12 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
Message-ID: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>

Hi Daniel, those have been uploaded and I'm using version 2.28.

Walter



From carsonhh at gmail.com  Tue Feb 25 21:10:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 21:10:27 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
Message-ID: <CF32B115.A46C%carsonhh@gmail.com>

Could you try version 2.31 (the current version)?  I believe this is
happening because you are passing in MAKER genes as pred_gff; the transcripts
thus ended up with the same Names and IDs as the genes being generated by
the MAKER run via SNAP etc.  This shouldn't happen with model_gff, and
shouldn't happen in 2.31 (IDs and names are generated slightly differently
in 2.30+).

Thanks,
Carson



From marc.hoeppner at imbim.uu.se  Wed Feb 26 01:26:35 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 26 Feb 2014 08:26:35 +0000
Subject: [maker-devel] Functional annotation options
Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to move on to functional annotation. I understand that MAKER includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems MAKER includes an MPI wrapper for InterProScan, but it requires "iprscan" to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except as a very old, frozen binary bundle), I was wondering how to address this issue.

3) maker_functional
This just throws an error about a missing Job ID, so I have no clue what it would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with MAKER to achieve a reasonable functional annotation.

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From mikael.durling at slu.se  Wed Feb 26 02:43:43 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:43:43 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>


On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:

> Dear List,
> 
> I have finished a gene build now, and I would like to move on to functional annotation. I understand that MAKER includes a few scripts to facilitate such analyses. However, I have a few questions about this:
> 
> 1) iprscan
> It seems MAKER includes an MPI wrapper for InterProScan, but it requires "iprscan" to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with InterProScan 5. What I usually do is split the MAKER protein file into chunks, run these chunks as separate jobs on our cluster, and finally merge the results. The TSV file from iprscan5 can be input into the MAKER tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.
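
The chunking step described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the helper names (read_fasta, chunk_records, write_chunks) and the output naming scheme are made up here and are not MAKER or InterProScan tools, and the chunk size is whatever suits your cluster.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Group records into lists of at most chunk_size entries."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def write_chunks(fasta_path, chunk_size, prefix="chunk"):
    """Write <prefix>_0001.fasta, <prefix>_0002.fasta, ... and return the paths.

    The file naming is an arbitrary choice for this sketch.
    """
    paths = []
    with open(fasta_path) as handle:
        chunks = chunk_records(read_fasta(handle), chunk_size)
        for index, chunk in enumerate(chunks, start=1):
            path = f"{prefix}_{index:04d}.fasta"
            with open(path, "w") as out:
                for header, sequence in chunk:
                    out.write(f">{header}\n{sequence}\n")
            paths.append(path)
    return paths


# In-memory demonstration with three toy proteins, two per chunk.
demo = [">p1", "MKT", "AAA", ">p2", "MSS", ">p3", "MLL"]
chunks = list(chunk_records(read_fasta(demo), 2))
print([[header for header, _ in chunk] for chunk in chunks])
```

Each resulting file can then be submitted as its own InterProScan job, and the per-chunk TSV outputs concatenated afterwards.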


> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the ncbi blast tabular output. Since Wublast is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue. 

It works fine with ncbiblast+ and the blastp command with -outfmt 6. 
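
For reference, BLAST+ tabular output (-outfmt 6) has twelve tab-separated columns by default: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The Python sketch below parses such lines and keeps the top-scoring hit per query. It is only meant to show what the file contains; it makes no claim about how maker_functional_gff itself reads the file, and the sequence IDs in the example are invented.

```python
BLAST_TABULAR_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def parse_outfmt6(lines):
    """Yield one dict per hit from default BLAST+ -outfmt 6 lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        hit = dict(zip(BLAST_TABULAR_COLUMNS, fields))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        yield hit


def best_hits(hits):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Invented example rows; real input would come from a blastp -outfmt 6 run.
sample = [
    "prot1\tsp|P12345|HYPO_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410",
    "prot1\tsp|Q99999|HYPO_B\t60.0\t180\t70\t2\t5\t184\t10\t189\t2e-40\t150",
    "prot2\tsp|Q99999|HYPO_B\t75.2\t120\t30\t0\t1\t120\t1\t120\t3e-55\t201",
]
best = best_hits(parse_outfmt6(sample))
for query, hit in best.items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```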

cheers,
Mikael

P.S. You're welcome to visit me at SLU if you would like to discuss experiences with genome annotation.


> 
> 3) maker_functional
> This just throws an error about a missing Job ID, so no clue what this would be used for.
> 
> I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation. 
> 
> With kind regards,
> 
> Marc Hoeppner
> 
> Marc P. Hoeppner, PhD
> Team Leader
> BILS Genome Annotation Platform
> Department for Medical Biochemistry and Microbiology
> Uppsala University, Sweden
> marc.hoeppner at imbim.uu.se
> 
> 
> 
> 


From mikael.durling at slu.se  Wed Feb 26 02:55:56 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:55:56 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>


On 26 Feb 2014, at 10:43, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

> I don't believe it works with interproscan5. What I usually do is to split the maker protein file into chunks, and then run these chunks as separate jobs on our cluster, then finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried the iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

I should clarify this: it is mpi_iprscan that doesn't seem to work with iprscan5. ipr_update_gff3 does work with it, however.


Mikael


From mikael.durling at slu.se  Wed Feb 26 05:30:44 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 12:30:44 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>

Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without setting est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your FASTA headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure that different isoforms sharing a common gene_id get clustered into the same gene; maker_coor=chr1:1-10000 in the FASTA header will force a particular sequence to be mapped only against chr1 within the range of 1-10000 bp; and just using maker_coor=chr1 will force it to be mapped only against chr1.

This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.
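
As a rough illustration of the header tagging described above, the following Python sketch appends gene_id= and maker_coor= tags to FASTA header lines. Note the assumptions: the message only says the tags go "in the fasta header", so appending them space-separated after the existing header text is a guess at a workable layout, and the function names and example IDs are invented for this sketch.

```python
def tag_header(header_line, gene_id=None, maker_coor=None):
    """Append gene_id=/maker_coor= tags to a FASTA header line.

    Assumption: tags are added after the existing header text, separated
    by spaces. The exact placement MAKER expects is not stated in the thread.
    """
    tags = []
    if gene_id:
        tags.append(f"gene_id={gene_id}")
    if maker_coor:
        tags.append(f"maker_coor={maker_coor}")
    return " ".join([header_line.rstrip()] + tags)


def tag_fasta(lines, tags_by_id):
    """Yield FASTA lines, tagging headers whose sequence ID is in tags_by_id."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            yield tag_header(line, **tags_by_id.get(seq_id, {}))
        else:
            yield line


# Invented transcripts: two isoforms of one gene, one pinned to a region
# and one pinned only to the chromosome.
fasta_lines = [">tx1", "ATGC", ">tx2 note", "GGCC"]
tags = {
    "tx1": {"gene_id": "geneA", "maker_coor": "chr1:1-10000"},
    "tx2": {"gene_id": "geneA", "maker_coor": "chr1"},
}
tagged = list(tag_fasta(fasta_lines, tags))
print("\n".join(tagged))
```

The tagged file would then be supplied as est= (or protein=) evidence, with est_forward=1 typed into maker_opts.ctl as described above.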

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run MAKER, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun



From carsonhh at gmail.com  Wed Feb 26 06:22:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 06:22:34 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes.  That should work as well, as an accidental feature.

--Carson 


From mikael.durling at slu.se  Wed Feb 26 06:37:29 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 13:37:29 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor (but not gene_id) as well as the configuration option est_forward for this to work. Any occurrence of maker_coor in GI.pm seems to be conditioned on est_forward=1, right?

Mikael




From nextgen.usfs at gmail.com  Wed Feb 26 09:21:33 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:21:33 -0600
Subject: [maker-devel] change program locations in maker_exe
Message-ID: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>

Hello,
I was wondering if there is a way to make permanent changes to the maker_exe.ctl file. It seems that during the install MAKER didn't find the GeneMark or probuild locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information.  Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe.ctl file?  Thank you.

- Jon


From carsonhh at gmail.com  Wed Feb 26 08:38:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 08:38:47 -0700
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <CF33558F.A4C3%carsonhh@gmail.com>

maker_functional is a script that gets called by another script; it is not
meant to be called directly by the user.  So ignore that.

Just run iprscan directly; it already works pretty well.  The mpi_iprscan
and iprscan_wrap scripts just give some logging functionality by wrapping
the iprscan call.  In most cases there is no advantage over just running
iprscan directly.

--Carson


On 2/26/14, 2:43 AM, "Mikael Brandstr?m Durling" <mikael.durling at slu.se>
wrote:

>
>26 feb 2014 kl. 09:26 skrev Marc H?ppner <marc.hoeppner at imbim.uu.se>:
>
>> Dear List,
>> 
>> I have finished a gene build now, and I would like to move on to
>> functional annotation. I understand that MAKER includes a few scripts to
>> facilitate such analyses. However, I have a few questions about this:
>> 
>> 1) iprscan
>> It seems MAKER includes an MPI wrapper for InterProScan, but requests
>> "iprscan" to be in $PATH. The latest versions of InterProScan I have
>> worked with are Java applications, and even though I put their location in
>> $PATH, mpi_iprscan seems to want something else. But what?
>
>I don't believe it works with InterProScan 5. What I usually do is
>split the MAKER protein file into chunks, run these chunks as
>separate jobs on our cluster, and finally merge the results. The TSV
>file from iprscan5 can be input into the MAKER tool ipr_update_gff. I
>have not tried iprscan2gff3, as I haven't figured out how to get an
>iprscan4 raw file from iprscan5.
>
>
>> 2) maker_functional_gff
>> This script seems to be very useful, but the description suggests that
>> it requires WU-BLAST tabular output '2', which I think looks quite
>> different from the NCBI BLAST tabular output. Since WU-BLAST is not
>> really available anymore (except this very old, frozen binary bundle), I
>> was wondering how to address this issue.
>
>It works fine with ncbiblast+ and the blastp command with -outfmt 6.
>
>cheers,
>Mikael
>
>P.S. You're welcome to visit me at SLU if you would like to discuss
>experiences of genome annotation.
>
>
>> 
>> 3) maker_functional
>> This just throws an error about a missing Job ID, so no clue what this
>>would be used for.
>> 
>> I guess what I am after is some suggestion as to how to use the scripts
>> included with MAKER to achieve a reasonable functional annotation.
>> 
>> With kind regards,
>> 
>> Marc Hoeppner
>> 
>> Marc P. Hoeppner, PhD
>> Team Leader
>> BILS Genome Annotation Platform
>> Department for Medical Biochemistry and Microbiology
>> Uppsala University, Sweden
>> marc.hoeppner at imbim.uu.se
>> 
>> 
>> 
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Wed Feb 26 09:09:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 09:09:14 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: <CF335A95.A4DE%carsonhh@gmail.com>

It will still work without est_forward.  It just works a little differently.
Keep in mind this was a hidden feature I used to find stubborn or
hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the
maker_coor tags early in the pipeline.  Then it will create a list of
locations to search, and it will search them even if there are no BLAST
results to seed the search (normally MAKER gets a BLAST result first and
then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
look for a match using all of chr1 as the input to exonerate even when BLAST
finds nothing (this is a very, very slow search, but it can help pick up one
or two stubborn genes that don't remap well).  To allow this, MAKER gives
exonerate looser matching parameters (i.e. it allows for single-base-pair
introns, perhaps caused by assembly errors).  The logic here is that, since
I have already told MAKER that with some degree of confidence I expect
sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at
line 1563, but only after a BLAST alignment has already seeded it to the
region (that BLAST result has the information in its description parameter).
MAKER will then ignore seeds completely outside of maker_coor.  In addition,
any BLAST seeds that overlap maker_coor will get the search space for
alignment polishing adjusted to match maker_coor exactly.  Also, match
parameters for exonerate will not be relaxed as they are with est_forward.

As you can see, the behavior is slightly different (because it's an
accidental feature).
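
For anyone wanting to try this, a rough Python sketch of adding the tags to
FASTA headers is given here.  The tag names maker_coor= and gene_id= come
from this thread; the helper names, the choice to append the tags to the end
of the description, and the space separator are assumptions for illustration,
not something confirmed against the MAKER source.

```python
import re

# Matches "chr1" or "chr1:1-10000" (sequence name, optional start-end range).
COOR_RE = re.compile(r"^[^\s:]+(:\d+-\d+)?$")


def tag_header(header, maker_coor=None, gene_id=None):
    """Append gene_id= / maker_coor= tags to a FASTA header (without '>')."""
    tags = []
    if gene_id is not None:
        tags.append(f"gene_id={gene_id}")
    if maker_coor is not None:
        if not COOR_RE.match(maker_coor):
            raise ValueError(f"unexpected maker_coor value: {maker_coor!r}")
        tags.append(f"maker_coor={maker_coor}")
    return " ".join([header] + tags)


def tag_fasta(lines, placements):
    """Yield FASTA lines, tagging headers whose ID is a key in `placements`.

    `placements` maps sequence ID -> (maker_coor, gene_id); either may be None.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            fields = header.split()
            seq_id = fields[0] if fields else ""
            if seq_id in placements:
                coor, gene = placements[seq_id]
                header = tag_header(header, maker_coor=coor, gene_id=gene)
            yield ">" + header + "\n"
        else:
            yield line
```

For example, tag_header("est1", maker_coor="chr1:1-10000") gives
"est1 maker_coor=chr1:1-10000", which restricts that sequence to the first
10000 bp of chr1 as described above.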

Thanks,
Carson


From:  Mikael Brandström Durling <mikael.durling at slu.se>
Date:  Wednesday, February 26, 2014 at 6:37 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the
code, it seems that I need to supply maker_coor but not gene_id, as well as
the configuration option est_forward for this to work. Any occurrences of
maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

> Yes.  That should work as well as an accidental feature.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se>
> wrote:
> 
>> Can maker_coor be used only to hint at the placement of the
>> ESTs, without affecting the naming of the final genes? I.e., if I have a
>> database of ESTs where I have a priori knowledge of their rough placement,
>> can this placement be given to MAKER without providing est_forward=1?
>> 
>> Thanks,
>> Mikael
>> 
>> On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:
>> 
>>> There is a way.  It's not a standard option and it's undocumented, but if
>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
>>> The option won't already be there, so you'll have to type it in.
>>> 
>>> There is also a feature designed to work with this option.  If you add tags
>>> to your fasta headers, those can be used to guide the mapping and naming.
>>> For example, gene_id=<some_gene> will ensure different isoforms that share
>>> a common gene_id get clustered into the same gene, and
>>> maker_coor=chr1:1-10000 in the fasta header will force a particular sequence
>>> to only be mapped against chr1 within the range of 1-10000 bp, while just
>>> using maker_coor=chr1 will force it to only be mapped against chr1.
>>> 
>>> This is an undocumented way to remap genes onto new assemblies using blast
>>> alignments of earlier transcript or protein annotations as a guide.
>>> 
>>> --Carson
>>> 
>>> 
>>> 
>>> 
>>> From: Shaun Jackman <sjackman at gmail.com>
>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To: <maker-devel at yandell-lab.org>
>>> Subject: [maker-devel] Mapping gene names
>>> 
>>> Hi,
>>> 
>>> I'm annotating a genome using a closely related genome from Genbank, using
>>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate
>>> my genome. I've run Maker, and the annotation seems to have worked well. Is
>>> it possible to map the names of the genes from the related species to my
>>> annotation? I see the map_forward option, which applies to the model_gff
>>> parameter. Is there a similar option for est and protein?
>>> 
>>> maker_opts.ctl
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>> Thanks,
>>> Shaun
>>> _______________________________________________ maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 



From carson.holt at genetics.utah.edu  Wed Feb 26 09:38:37 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 26 Feb 2014 16:38:37 +0000
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
Message-ID: <CF33655B.A514%carson.holt@genetics.utah.edu>

MAKER first looks inside of .../maker/exe/ for any executables.  Then it
uses the system's "which" command to identify executables in your PATH
environment variable.  If MAKER is not finding the one you want, then
you can either put the program in the .../maker/exe/ folder (i.e. create
.../maker/exe/bin/ and then put soft links to the executables you want to
be used first), or you can rearrange the order of entries in your PATH
environment variable so that "which <program_name>" returns the location
you want.  If MAKER is always leaving the locations of those programs
empty, it is because you need to add them to your PATH environment
variable.
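
To check which copy of a program would be picked up, the lookup order
described above can be mimicked with a short Python sketch.  This is not
MAKER's own code: the function name and the recursive walk of the exe folder
are assumptions, and shutil.which simply stands in for the "which" command.

```python
import os
import shutil


def find_executable(name, maker_exe_dir):
    """Return the first match for `name`: under maker_exe_dir, else on PATH."""
    # 1) Walk the MAKER exe directory (e.g. .../maker/exe/bin/ with soft links).
    for root, _dirs, files in os.walk(maker_exe_dir, followlinks=True):
        if name in files:
            candidate = os.path.join(root, name)
            if os.access(candidate, os.X_OK):
                return candidate
    # 2) Fall back to PATH, which is what `which <name>` consults.
    return shutil.which(name)
```

If this returns None for a program, it is on neither search route, which
would match the empty entries seen in maker_exe.ctl.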

Thanks,
Carson

On 2/26/14, 9:21 AM, "USFS Ion PGM" <nextgen.usfs at gmail.com> wrote:

>Hello,
>I was wondering if there is a way to make permanent changes to the
>maker_exe.ctl file, as it seems on the install that maker didn't find the
>gene mark or pro build locations correctly, which means that I have to
>manually edit the maker_exe.ctl file every time and add that information.
> Where can I modify this permanently so that the maker -CTL command
>creates the appropriate maker_exe file?  Thank you.
>
>- Jon
>
>


From nextgen.usfs at gmail.com  Wed Feb 26 09:58:11 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:58:11 -0600
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CF33655B.A514%carson.holt@genetics.utah.edu>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
	<CF33655B.A514%carson.holt@genetics.utah.edu>
Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com>

Hi Carson,

Thank you - that did it, I didn't have them in the PATH.  All working now.

Cheers,
Jon

On Feb 26, 2014, at 10:38 AM, Carson Holt <carson.holt at genetics.utah.edu> wrote:

> MAKER first looks inside of .../maker/exe/ for any executables.  Then it
> uses the system's "which" command to identify executables in your PATH
> environment variable.  If MAKER is not finding the one you want, then
> you can either put the program in the .../maker/exe/ folder (i.e. create
> .../maker/exe/bin/ and then put soft links to the executables you want to
> be used first), or you can rearrange the order of entries in your PATH
> environment variable so that "which <program_name>" returns the location
> you want.  If MAKER is always leaving the locations of those programs
> empty, it is because you need to add them to your PATH environment
> variable.
> 
> Thanks,
> Carson
> 
> On 2/26/14, 9:21 AM, "USFS Ion PGM" <nextgen.usfs at gmail.com> wrote:
> 
>> Hello,
>> I was wondering if there is a way to make permanent changes to the
>> maker_exe.ctl file, as it seems on the install that maker didn't find the
>> gene mark or pro build locations correctly, which means that I have to
>> manually edit the maker_exe.ctl file every time and add that information.
>> Where can I modify this permanently so that the maker -CTL command
>> creates the appropriate maker_exe file?  Thank you.
>> 
>> - Jon
>> 
>> 
> 


From weckalba at asu.edu  Wed Feb 26 13:05:05 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Wed, 26 Feb 2014 12:05:05 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CF32B115.A46C%carsonhh@gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
Message-ID: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:

> Could you try version 2.31 (the current version)?  I believe this is
> happening because you are passing in MAKER genes as pred_gff, so the
> transcripts ended up with the same Names and IDs as the genes being
> generated by the MAKER run via SNAP etc.  This shouldn't happen with
> model_gff, and shouldn't happen in 2.31 (IDs and names are generated
> slightly differently in 2.30+).
>
> Thanks,
> Carson
>
> From: Walter Eckalbar <weckalba at asu.edu>
> Date: Tuesday, February 25, 2014 at 7:11 PM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] invalid gff3 format issues
>
> Hi Daniel, those have been uploaded and I'm using version 2.28.
>
> Walter
>
>
> On 25 February 2014 18:02, Daniel Ence <dence at genetics.utah.edu> wrote:
>
>> Hi Walter,
>>
>> Will you upload the full GFF3 and the control files that you used to this
>> URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>>
>> Thanks,
>> Daniel
>>
>>
>>
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar <weckalba at asu.edu>
>>  wrote:
>>
>> Hi all,
>>
>> I am trying to update maker annotations with PASA and encountered errors
>> stemming from file format issues in the gff3 file.
>>
>> I put a few lines from the gff3 to highlight the issue below.  Basically,
>> the problem is that there are non-unique IDs for a number of the
>> annotations.
>>
>> Is there anything that can be done to right this problem?
>>
>> Thanks,
>>
>> Walter
>>
>> Lines from GFF3 file, repeated IDs are highlighted:
>>
>>
>> chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>> chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>> chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>> chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>
>> chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>> chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>> chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com  Wed Feb 26 14:12:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 14:12:23 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
	<CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
Message-ID: <CF33A669.A53C%carsonhh@gmail.com>

Could you put the file in this GFF3 validator to see if anything comes up?
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Maybe it's just PASA.  But I'd like to know there's no issue being caused by
something else.
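
A quick local check for the repeated IDs reported earlier in this thread
could look like the Python sketch below.  It is only an illustration (the
function name is made up) and it assumes a tab-delimited GFF3 file.  Note
that GFF3 allows one ID on several lines for a discontinuous feature such as
a CDS, so the feature type is reported with each hit to help judge whether a
repeat is a real problem.

```python
from collections import defaultdict


def duplicate_ids(gff3_lines):
    """Map each ID used by more than one feature line to (line, type) pairs."""
    seen = defaultdict(list)
    for lineno, line in enumerate(gff3_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        for attribute in columns[8].split(";"):
            key, _, value = attribute.strip().partition("=")
            if key == "ID" and value:
                seen[value].append((lineno, columns[2]))
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}
```

Calling duplicate_ids(open("merged.gff")) on the gff3_merge output (the file
name here is a placeholder) would list, for example, an mRNA ID that appears
under two different genes.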

Thanks,
Carson


From:  Walter Eckalbar <weckalba at asu.edu>
Date:  Wednesday, February 26, 2014 at 1:05 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "<maker-devel at yandell-lab.org>"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] invalid gff3 format issues

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:
> Could you try version 2.31 (the current version)?  I believe this is happening
> because you are passing in MAKER genes as pred_gff, so the transcripts ended
> up with the same Names and IDs as the genes being generated by the MAKER run
> via SNAP etc.  This shouldn't happen with model_gff, and shouldn't happen in
> 2.31 (IDs and names are generated slightly differently in 2.30+).
> 
> Thanks,
> Carson
> 
> From:  Walter Eckalbar <weckalba at asu.edu>
> Date:  Tuesday, February 25, 2014 at 7:11 PM
> To:  Daniel Ence <dence at genetics.utah.edu>
> Cc:  "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] invalid gff3 format issues
> 
> Hi Daniel, those have been uploaded and I'm using version 2.28.
> 
> Walter
> 
> 
> On 25 February 2014 18:02, Daniel Ence <dence at genetics.utah.edu> wrote:
>> Hi Walter, 
>> 
>> Will you upload the full GFF3 and the control files that you used to this
>> URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>> 
>> Thanks,
>> Daniel
>> 
>> 
>> 
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar <weckalba at asu.edu>
>>  wrote:
>> 
>>> Hi all,
>>> 
>>> I am trying to update maker annotations with PASA and encountered errors
>>> stemming from file format issues in the gff3 file.
>>> 
>>> I put a few lines from the gff3 to highlight the issue below.  Basically,
>>> the problem is that there are non-unique IDs for a number of the
>>> annotations.
>>> 
>>> Is there anything that can be done to right this problem?
>>> 
>>> Thanks,
>>> 
>>> Walter
>>> 
>>> Lines from GFF3 file, repeated IDs are highlighted:
>>> 
>>> 
>>> chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>>> chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>>> chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>> chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>> 
>>> chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>>> chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>>> chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>>> chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
> 
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From mikael.durling at slu.se  Wed Feb 26 15:04:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 22:04:37 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF335A95.A4DE%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
Message-ID: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think the maker_coor and est_forward feature set is worthy of being promoted into a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt <carsonhh at gmail.com> wrote:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows for single-base-pair introns, perhaps caused by assembly errors).  The logic here is that, since I have already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor.  In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also, match parameters for exonerate will not be relaxed as they are with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

Yes.  That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, while just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From carsonhh at gmail.com  Wed Feb 26 15:50:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 15:50:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
Message-ID: <CF33B334.A551%carsonhh@gmail.com>

What you can do is run it once with just est_forward=1 and
est2genome/protein2genome set to 1.  Then take those results, pass them in
as model_gff, and use the map_forward option; MAKER will then filter the
results based on mRNA score and copy the names onto the new genes under the
standard MAKER pipeline.  Eventually it's really supposed to go into a
separate tool that will map genes onto new assemblies (but under the hood
the tool will just be calling MAKER with certain parameters restricted).  I
keep it restricted this way because if people commonly use it mixed with
things like SNAP, I can start to get some very weird behaviors.
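
As a rough illustration of the two runs, the Python sketch below edits the
text of a maker_opts.ctl file for each pass.  The option names come from
this thread; the helper function, the PASS1/PASS2 dictionaries and the GFF
file name are hypothetical, and any other settings are left for you to
decide.

```python
def set_ctl_options(ctl_text, options):
    """Return ctl_text with each key in `options` set, appending missing keys."""
    remaining = dict(options)
    out = []
    for line in ctl_text.splitlines():
        setting, hash_mark, comment = line.partition("#")
        key, eq, _old = setting.partition("=")
        key = key.strip()
        if eq and key in remaining:
            new_line = f"{key}={remaining.pop(key)}"
            if hash_mark:
                new_line += " #" + comment  # keep the trailing comment
            out.append(new_line)
        else:
            out.append(line)
    # Options such as est_forward are not in the generated file, so add them.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Pass 1: remap the evidence with est_forward switched on.
PASS1 = {"est_forward": 1, "est2genome": 1, "protein2genome": 1}
# Pass 2: feed the pass 1 GFF3 back in (placeholder file name).
PASS2 = {"model_gff": "pass1.all.gff", "map_forward": 1}
```

Writing set_ctl_options(text, PASS1) back to maker_opts.ctl before the first
run, and the PASS2 version before the second, keeps the hand-typed
est_forward line from being forgotten.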

Thanks,
Carson

From:  Mikael Brandström Durling <mikael.durling at slu.se>
Date:  Wednesday, February 26, 2014 at 3:04 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you
have firm a priori knowledge of the placement of ESTs. However, while trying
it I note that est_forward implicitly turns on the est2genome predictor. Is
this necessary for this to work? I'm after the behavior you describe below,
where exonerate is made to try really hard within a limited region to align
an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think the maker_coor and est_forward feature set is worthy of
being promoted into a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt <carsonhh at gmail.com> wrote:

> It will still work without est_forward.  It just works a little differently.
> Keep in mind this was a hidden feature I used to find stubborn or hard to find
> missing genes after reassembly of a genome.
> 
> If est_forward is provided, MAKER will parse the database to look for the
> maker_coor tags early in the pipeline.  Then it will create a list of
> locations to search, and it will search them even if there are no BLAST
> results to seed the search (normally MAKER gets a BLAST result first and then
> polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for
> a match using all of chr1 as the input to exonerate even when BLAST finds
> nothing (this is a very very slow search, but can help pick up one or two
> stubborn genes that don't remap well).  To allow this, MAKER gives exonerate
> looser matching parameters (i.e. allows for single base pair introns perhaps
> caused by assembly errors).  The logic here is that given the fact that I
> already told MAKER that with some degree of confidence I expect sequence A to
> map to location X, it will try its hardest to make it match.
> 
> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line
> 1563, but only after a BLAST alignment has already seeded it to the region
> (that BLAST result has the information in its description parameter).  MAKER
> will then ignore seeds completely outside of maker_coor. In addition any BLAST
> seeds that overlap maker_coor will get the search space for alignment
> polishing adjusted to match maker_coor exactly.  Also match parameters for
> exonerate will not be relaxed as they were with est_forward.
> 
> As you can see, the behavior is slightly different (because it's an accidental
> feature).
> 
> Thanks,
> Carson
> 
> 
> 
> From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 6:37 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> That might be a useful and time saving accidental feature. But, reading the
> code, it seems that I need to supply maker_coor but not gene_id, as well as
> the configuration option est_forward for this to work. Any occurrences of
> maker_coor in GI.pm seem to be conditioned on est_forward=1, right?
> 
> Mikael
> 
> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
> 
>> Yes.  That should work as well as an accidental feature.
>> 
>> --Carson 
>> 
>> Sent from my iPhone
>> 
>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>> <mikael.durling at slu.se> wrote:
>> 
>>> Can this use of maker_coor be used only to hint about the placement of the
>>> ESTs, without affecting the naming of the final genes? I.e., if I have a
>>> database of ESTs where I have a priori knowledge of their rough placement,
>>> can this placement be given to maker without providing est_forward=1?
>>> 
>>> Thanks,
>>> Mikael
>>> 
>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>> 
>>>> There is a way.  It's not a standard option and it's undocumented, but if
>>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>>> that.  The option won't already be there so you'll have to type it in.
>>>> 
>>>> There is also a feature designed to work with this option.  If you add tags
>>>> to your fasta headers, those can be used to guide the mapping and naming.
>>>> For example, gene_id=<some_gene>  will ensure different isoforms that share
>>>> a common gene_id get clustered into the same gene, and
>>>> maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
>>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>> 
>>>> This is an undocumented way to remap genes onto new assemblies using blast
>>>> alignments of earlier transcript or protein annotations as a guide.
>>>> 
>>>> --Carson
>>>> 
>>>> 
>>>> 
>>>> 
>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>> To: <maker-devel at yandell-lab.org>
>>>> Subject: [maker-devel] Mapping gene names
>>>> 
>>>> Hi,
>>>> 
>>>> I'm annotating a genome using a closely related genome from GenBank, using
>>>> the .frn (RNA) and .faa (protein) files from GenBank as evidence to
>>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>>> well. Is it possible to map the names of the genes from the related species
>>>> to my annotation? I see the map_forward option, which applies to the
>>>> model_gff parameter. Is there a similar option for est and protein?
>>>> 
>>>> maker_opts.ctl
>>>> est=NC_123456.frn
>>>> protein=NC_123456.faa
>>>> est2genome=1
>>>> protein2genome=1
>>>> Thanks,
>>>> Shaun
>>> 
> 
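
The maker_coor handling described in the quoted thread above (the case without est_forward) can be sketched in Python as follows. This is an illustration under stated assumptions, not MAKER's code from GI.pm: it assumes the tags are whitespace-separated `key=value` tokens in the FASTA header, and Region, parse_maker_coor, parse_gene_id and restrict_seed are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    contig: str
    start: Optional[int]  # 1-based, inclusive; None means the whole contig
    end: Optional[int]


def parse_maker_coor(header: str) -> Optional[Region]:
    """Extract maker_coor=<contig>[:<start>-<end>] from a FASTA header."""
    m = re.search(r"\bmaker_coor=([^\s:;]+)(?::(\d+)-(\d+))?", header)
    if not m:
        return None
    contig, start, end = m.groups()
    if start is None:
        return Region(contig, None, None)
    return Region(contig, int(start), int(end))


def parse_gene_id(header: str) -> Optional[str]:
    """Extract gene_id=<name>; isoforms sharing it belong to one gene."""
    m = re.search(r"\bgene_id=([^\s;]+)", header)
    return m.group(1) if m else None


def restrict_seed(seed: Region, coor: Optional[Region]) -> Optional[Region]:
    """Apply maker_coor to a BLAST seed, as described for est_forward unset.

    - no maker_coor tag: the seed is used unchanged
    - seed entirely outside maker_coor: ignored (None)
    - seed overlapping maker_coor: the search space for alignment
      polishing becomes maker_coor exactly
    """
    if coor is None:
        return seed
    if seed.contig != coor.contig:
        return None
    if coor.start is None:  # maker_coor=chr1 covers the whole contig
        return coor
    if seed.end < coor.start or seed.start > coor.end:
        return None
    return coor
```

With est_forward=1 the thread describes a different path: the maker_coor regions are searched even when no BLAST seed exists, and exonerate gets looser matching parameters. The sketch does not model that.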


From carsonhh at gmail.com  Wed Feb 26 16:45:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 16:45:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF33B334.A551%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
Message-ID: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
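
A rough Python sketch of such a prefilter, for illustration only. It assumes the score in question is the GFF3 score field (column 6) of the mRNA rows; if MAKER stores it in an attribute instead, the comparison has to change. The function names and the cutoff value are hypothetical.

```python
def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def filter_gff3_by_mrna_score(lines, min_score):
    """Keep mRNAs with score >= min_score, their parent genes and children.

    Comment and directive lines are passed through. Features that are
    neither gene, mRNA, nor a child of a kept mRNA are dropped, and
    reading stops at an embedded ##FASTA section.
    """
    records = []
    kept_mrnas, kept_genes = set(), set()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            records.append((line, None, None))
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        attrs = parse_attributes(cols[8])
        records.append((line, cols, attrs))
        if cols[2] == "mRNA" and cols[5] != "." and float(cols[5]) >= min_score:
            kept_mrnas.add(attrs.get("ID"))
            kept_genes.update(attrs.get("Parent", "").split(","))

    out = []
    for line, cols, attrs in records:
        if cols is None:
            out.append(line)
        elif cols[2] == "gene":
            if attrs.get("ID") in kept_genes:
                out.append(line)
        elif cols[2] == "mRNA":
            if attrs.get("ID") in kept_mrnas:
                out.append(line)
        elif kept_mrnas.intersection(attrs.get("Parent", "").split(",")):
            out.append(line)
    return out
```

The returned lines can be written to a new file and that file given to model_gff.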

--Carson 


From bioinformatics.umd at gmail.com  Thu Feb 27 09:46:44 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 11:46:44 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

Hello,

I've had my IT folks install maker on our cluster at UMD. I'm having a SEGFAULT error when running maker on infiniband nodes vs gigE nodes. According to the logs this appears to be an issue with forks but I'm not sure how to fix this. I would simply use the gigE nodes but we are in the process of updating everything to infiniband so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly
warns of an issue with doing 'fork's within openmpi/openfabrics framework.

In particular, the use of the fork system call is only partially supported
in the OpenFabrics software (this is the drivers, etc for the infiniband
connections). See e.g. 
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
for more information. In particular the paragraphs starting with the
sentence with the red highlighted "it does not mean that your fork()-calling 
application is safe". (The kernel, openMPI version, and OFED version are 
sufficiently recent to mean that there is _some_ fork support).

The fact that the job runs over gigE but not IB, in conjunction with the
warning from openmpi, strongly suggests that this is the issue that you are 
encountering. I suspect that maker touches registered memory before the fork,
which would result in a segfault (matching what was observed).

You can try adding the arguments
--mca mpi_warn_on_fork 0 
to the mpirun command, just in case the crash was somehow caused by openmpi's
warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.


Basically, it looks like maker uses some system calls like fork in a manner
which is incompatible with the current OpenFabrics software, and thus will
not work with infiniband. This situation is likely to remain until either
maker changes to be compatible with OFED, or OFED's support for the fork
system call is broadened.


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: maker_error_openfabrics.txt
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140227/acd7e3ab/attachment-0001.txt>

From carson.holt at genetics.utah.edu  Thu Feb 27 11:09:21 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 27 Feb 2014 18:09:21 +0000
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: <CF34C944.A5B0%carson.holt@genetics.utah.edu>

It's a little more complicated than that.  MAKER is written in Perl, and Perl doesn't give me the low level access that a language like C would for controlling memory access (I don't control that).  All I get is Perl's standard implementation of forks.  So it's not really a matter of MAKER changing, it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon).

For now you just have to add this flag to OpenMPI when running MAKER with mpiexec -->  -mca btl ^openib

Example:
mpiexec -mca btl ^openib -n 20 maker
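
If the job is launched from a script, the same command can be built as an argument list, as in the Python sketch below. The function name maker_mpi_command is hypothetical. The `^openib` value asks Open MPI to exclude the openib (InfiniBand) transport, and passing it as its own list element or quoting it avoids any shell that treats `^` specially.

```python
import shlex


def maker_mpi_command(n_procs, maker_args=()):
    """Build the argv for running MAKER under Open MPI without openib."""
    return [
        "mpiexec",
        "-mca", "btl", "^openib",  # exclude the openib byte-transfer layer
        "-n", str(n_procs),
        "maker",
        *maker_args,
    ]


cmd = maker_mpi_command(20)
# shlex.join quotes the caret so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

The list form can be handed directly to subprocess.run(cmd, check=True) from a job script.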


Thanks,
Carson



From bioinformatics.umd at gmail.com  Thu Feb 27 11:55:34 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 13:55:34 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <CF34C944.A5B0%carson.holt@genetics.utah.edu>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
	<CF34C944.A5B0%carson.holt@genetics.utah.edu>
Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com>

Hi Carson,

Thanks, that fixed the issue.

Cheers
Ian


From sjackman at gmail.com  Thu Feb 27 16:17:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 15:17:22 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
Message-ID: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun


From sjackman at gmail.com  Thu Feb 27 17:27:30 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 16:27:30 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>

Sorry, ignore my previous question. est_forward also carries forward the
names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
(2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1
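
The two comparisons quoted above can be written out as a minimal sketch, assuming a hit passes when its bit score is at least bit_blastn and its E-value is at most eval_blastn. The function name and the dictionary layout are hypothetical and are not MAKER code; MAKER applies other filters as well, which this sketch does not model.

```python
# Hypothetical helper, not part of MAKER: compares one blastn hit
# against the two cutoffs quoted in the message above.
def passes_blastn_cutoffs(hit, bit_blastn=40.0, eval_blastn=1e-10):
    """Return True if the hit clears both the bit-score and E-value cutoffs."""
    return hit["bits"] >= bit_blastn and hit["evalue"] <= eval_blastn


# Values for the rrn5 hit as reported in the message.
rrn5 = {"name": "rrn5", "bits": 242.0, "evalue": 2e-66}

print(passes_blastn_cutoffs(rrn5))
```

Since the rrn5 hit clears both comparisons, whatever drops it is presumably a different filter.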

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:

> Is there a corresponding protein_forward=1 option to map forward protein
> names from protein2genome?
>
> Cheers,
> Shaun
>
> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
> wrote:
>
> Sorry I meant to say prefilter on the score in the mRNA column before
> passing the gff3 to model_gff.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>  What you can do is run it once with just est_forward=1 and
> est2genome/protein2genome set to 1.  Then take those results, pass them in
> as model_gff and use the map_forward option to then filter the results
> based on mRNA score and that would copy names onto new gene under the
> standard MAKER pipeline.  Eventually it's really supposed to go into a
> separate tool that will map genes onto new assemblies (but under the hood
> the tool will just be calling MAKER with certain parameters restricted).  I
> do this because if people commonly use it mixed with things like SNAP I can
> start to get some very weird behaviors.
>
> Thanks,
> Carson
>
>  From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 3:04 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
>
>  It seems that this could be a very useful option in those cases where
> you have firm a priori knowledge of the placement of ESTs. However, while
> trying it I note that est_forward implies that the est2genome predictor is
> turned on, implicitly. Is this necessary for this to work? I'm after the
> behavior you describe below where exonerate is made to try really hard
> within a limited region to align an est, but I would not like maker to
> produce est2genome predictions.
>
> In general, I think this maker_coor and est_forward is a feature set that
> is worthy to be promoted into a documented feature.
>
> Thanks,
> Mikael
>
>  26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>
>  It will still work without est_forward.  It just works a little
> differently.  Keep in mind this was a hidden feature I used to find
> stubborn or hard to find missing genes after reassembly of a genome.
>
> If est_forward is provided, MAKER will parse the database to look for the
> maker_coor tags early in the pipeline.  Then it will create a list of
> locations to search, and it will search them even if there are no BLAST
> results to seed the search (normally MAKER gets a BLAST result first and
> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
> look for a match using all of chr1 as the input to exonerate even when
> BLAST finds nothing (this is a very very slow search, but can help pick up
> one or two stubborn genes that don't remap well).  To allow this, MAKER
> gives exonerate looser matching parameters (i.e. allows for single base
> pair introns perhaps caused by assembly errors).  The logic here is that
> given the fact that I already told MAKER that with some degree of
> confidence I expect sequence A to map to location X, it will try its
> hardest to make it match.
>
> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
> line 1563, but only after a BLAST alignment has already seeded it to the
> region (that BLAST result has the information in its description
> parameter).  MAKER will then ignore seeds completely outside of maker_coor.
> In addition any BLAST seeds that overlap maker_coor will get the search
> space for alignment polishing adjusted to match maker_coor exactly.  Also
> match parameters for exonerate will not be relaxed as they were with
> est_forward.
>
> As you can see, the behavior is slightly different (because it's an
> accidental feature).
>
> Thanks,
> Carson
>
>
>
>  From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 6:37 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
>
>  That might be a useful and time saving accidental feature. But, reading
> the code, it seems that I need to supply maker_coor but not gene_id, as
> well as the configuration option est_forward for this to work. Any
> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
> right?
>
> Mikael
>
>  26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>
>  Yes.  That should work as well as an accidental feature.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
> mikael.durling at slu.se> wrote:
>
> Can this use of maker_coor be used only to hint about the placement of the
> ests, without affecting the naming of the final genes? Ie if I have a
> database of EST where I have a priori knowledge of their rough placement,
> can this placement be given to maker without providing est_forward=1?
>
> Thanks,
> Mikael
>
>  26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>
>  There is a way.  It's not a standard option and it's undocumented, but
> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
> that.  The option won't already be there so you'll have to type it in.
>
> There is also a feature designed to work with this option.  If you add
> tags to your fasta headers, those can be used to guide the mapping and
> naming.  For example, gene_id=<some_gene>  will ensure different isoforms
> that share a common gene_id get clustered into the same gene,
> and maker_coor=chr1:1-10000 in the fasta header will force a particular
> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
> just using maker_coor=chr1 will force it to only be mapped against chr1.
>
> This is an undocumented way to remap genes onto new assemblies using blast
> alignments of earlier transcript or protein annotations as a guide.
>
> --Carson
>
>
>
>
>  From: Shaun Jackman <sjackman at gmail.com>
> Reply-To: Shaun Jackman <sjackman at gmail.com>
> Date: Tuesday, February 25, 2014 at 5:06 PM
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Mapping gene names
>
>  Hi,
>
> I'm annotating a genome using a closely related genome from Genbank, using
> the .frn (RNA) and .faa (protein) files from Genbank as evidence to
> annotate my genome. I've run Maker, and the annotation seems to have worked
> well. Is it possible to map the names of the genes from the related species
> to my annotation? I see the *map_forward* option, which applies to the
> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>
> *maker_opts.ctl*
>
> est=NC_123456.frn
> protein=NC_123456.faa
> est2genome=1
> protein2genome=1
>
> Thanks,
> Shaun

From carsonhh at gmail.com  Thu Feb 27 18:13:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Feb 2014 18:13:06 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
Message-ID: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>

Set single_exon=1, and set the minimum size to a smaller value.  I think it's set to 250 right now.  Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.
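
As a rough sketch of that change, the snippet below edits control-file text in Python. It assumes the file uses `key=value #comment` lines. The helper name `set_ctl_option` is hypothetical, `single_length` is only my reading of which option "the minimum size" refers to (check your own maker_opts.ctl), the value 50 is an illustrative placeholder, and the sample comments are paraphrased rather than copied from MAKER.

```python
def set_ctl_option(text, key, value):
    """Set key=value in control-file text, keeping any trailing #comment.

    If the key is not present yet, the option is appended at the end.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if "=" in line and line.split("=", 1)[0].strip() == key:
            _, _, rest = line.partition("=")
            _, hash_mark, comment = rest.partition("#")
            lines[i] = f"{key}={value}" + (f" #{comment}" if hash_mark else "")
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Illustrative control-file fragment (comments paraphrased, values assumed).
ctl = (
    "single_exon=0 #consider single exon EST evidence\n"
    "single_length=250 #min length for single exon ESTs\n"
)
ctl = set_ctl_option(ctl, "single_exon", "1")
ctl = set_ctl_option(ctl, "single_length", "50")  # assumed option name
print(ctl, end="")
```

The same helper appends an option that is not in the file yet, which is the situation described earlier in the thread for est_forward.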

--Carson 

Sent from my iPhone


From mikael.durling at slu.se  Fri Feb 28 03:40:30 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Fri, 28 Feb 2014 10:40:30 +0000
Subject: [maker-devel] maker_coor behaviour
Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se>

Hi,

in a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out, without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good use of this a priori knowledge. The main problem I am trying to solve is that some ESTs, for which I know where they should be aligned, are not recruited to that position by maker's blastn->exonerate method (I find them on other scaffolds). So I thought maker_coor with the est_forward behavior (as described) would be a good option to force my evidence onto the correct position, instead of ending up supporting or breaking other models. However, as soon as I run with maker_coor tagged est sequences, no est2genome evidence appears in the final gff3 file. The blastn evidence is there when est_forward is disabled, but as expected, there is no blastn evidence when est_forward is turned on. It seems as though the evidence is used, as the QI lines indicate EST support for both splice sites as well as exon alignments, but I have no way to visualize and/or evaluate the congruence of evidence and models. Would it be possible to tweak Maker into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out myself where in the code this was handled.
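
For reference, tagging the EST headers could look roughly like the sketch below. The `maker_coor=chr1:1-10000` and `maker_coor=chr1` forms are the ones given in the earlier thread; where exactly in the header the tag must sit is not stated there, so appending it after a space is an assumption. The helper name and the sequence IDs are hypothetical.

```python
def add_maker_coor(header, seqid, start=None, end=None):
    """Append a maker_coor tag to a FASTA header line (hypothetical helper).

    With start and end the tag is seqid:start-end, otherwise just seqid.
    """
    coor = seqid if start is None or end is None else f"{seqid}:{start}-{end}"
    return f"{header.rstrip()} maker_coor={coor}"


print(add_maker_coor(">est_0001", "chr1", 1, 10000))
print(add_maker_coor(">est_0002", "chr1"))
```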

I could of course do my own exonerate alignments of these ESTs and feed them into maker as est_gff, but if maker already has the machinery to do this, I thought it would be a good idea to use it.

Thanks,
Mikael


From carsonhh at gmail.com  Fri Feb 28 07:09:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 07:09:09 -0700
Subject: [maker-devel] maker_coor behaviour
Message-ID: <CF35E345.A60A%carsonhh@gmail.com>

I wouldn't use those options for standard de novo annotation.  There are
really other more appropriate things that should be used instead.  Both
maker_coor and est_forward are destined to be part of a separate tool that
will secretly just be calling MAKER, but will allow me to control what
other parameters MAKER sees to avoid certain logic incompatibilities that
make sense when mapping entire genes onto a new assembly, but not really
for de novo annotation using ESTs.

You should instead try modifying these options in the maker_bopts.ctl file
-->

pcov_blastn= #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn= #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn= #Blastn eval cutoff
bit_blastn= #Blastn bit cutoff
depth_blastn= #Blastn depth cutoff (0 to disable cutoff). For trimming
high evidence overlap regions

en_score_limit= #Exonerate nucleotide percent of maximal score threshold
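
To see which values are actually in effect before changing anything, the options can be read back with a short sketch like the one below. It assumes `key=value #comment` lines as in the listing above. The function name is hypothetical, and the sample values are the two quoted earlier in this archive (bit_blastn=40, eval_blastn=1e-10) plus one option left empty as in the listing.

```python
def read_ctl_options(text):
    """Parse 'key=value #comment' lines into a dict of strings."""
    options = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" in body:
            key, value = body.split("=", 1)
            options[key.strip()] = value.strip()
    return options


sample = """\
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
en_score_limit= #Exonerate nucleotide percent of maximal score threshold
"""
opts = read_ctl_options(sample)
for key in ("eval_blastn", "bit_blastn", "en_score_limit"):
    print(key, "=", repr(opts[key]))
```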


If either blastn or est2genome results disappear, it is because they don't
meet one of these thresholds (blastn results that don't meet the
thresholds but are borderline are kept if exonerate does meet the
thresholds, but if exonerate misses a threshold they will be thrown out).
That is why the EST in question gets thrown out and it's why the blastn
result disappears when you try and anchor it with maker_coor.

You can visualize everything with a browser when you're done.  I still
recommend the old version of Apollo for this (it's just easier).  You can
try and install it using the './Build apollo' option from the
.../maker/src/ directory, and it will be installed in
.../maker/exe/apollo.  It requires that you have Apache Ant installed to
do this.  Otherwise just download it from the GMOD SourceForge page and
install it manually.

Thanks,
Carson




From rbharris at uw.edu  Fri Feb 28 13:14:55 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Fri, 28 Feb 2014 12:14:55 -0800
Subject: [maker-devel] error in snap training
In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
Message-ID: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>

Hi -

I tried this and ran cegma --genome on my original fasta file. I then tried
to use cegma2zff to convert, fathom, and forge. However, when I try to
generate new parameters with forge, I get the same error that I got when
trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible
error5 KOG1342.20". Any suggestions would be great,
thanks!
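
For readers following along, the sequence of steps described above can be laid out as the sketch below. Only the tool names (cegma, cegma2zff, fathom, forge) come from the message. The file names and the fathom flag values follow the commonly used SNAP training recipe and are assumptions here, not the exact commands that were run. The snippet only prints the commands; it does not execute them.

```python
def snap_training_commands(genome="genome.fa", cegma_gff="output.cegma.gff"):
    """Return the assumed CEGMA -> ZFF -> fathom -> forge command sequence."""
    return [
        ["cegma", "--genome", genome],                     # find core genes
        ["cegma2zff", cegma_gff, genome],                  # writes genome.ann / genome.dna
        ["fathom", "-categorize", "1000", "genome.ann", "genome.dna"],
        ["fathom", "-export", "1000", "-plus", "uni.ann", "uni.dna"],
        ["forge", "export.ann", "export.dna"],             # the step that fails above
    ]


for cmd in snap_training_commands():
    print(" ".join(cmd))
```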

Cheers,
Rebecca


On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Make sure you are using 2.31,  and then try the maker2zff filters
> individually.  If the protein models are not working well, use CEGMA to
> generate models. It's from the same group as SNAP.  Use cegma2zff for the
> conversion.
>
> --Carson
>
> Sent from my iPhone
>
> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
> >
> > Hey -
> >
> > I'm trying to train SNAP and am running into errors. I don't have any
> EST evidence, just protein. My .gff file reports 10865 genes but when I run
> maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n,
> a ton of overlap_prev_exon errors get written to the screen and then with I
> get to the forge step I get an "impossible error5". Any help would be
> greatly appreciated.
> >
> > Thanks!
> > Rebecca

From carsonhh at gmail.com  Fri Feb 28 13:22:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 13:22:12 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
	<CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
Message-ID: <CF363CE6.A6B6%carsonhh@gmail.com>

If it's failing both ways I'm thinking this may be SNAP itself. Try these
two different versions of SNAP.

--> http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz
and
--> http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz

If they both fail then contact the SNAP development group --> korflab AT

Thanks,
Carson



From darasappan at gmail.com  Mon Feb  3 09:31:16 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 3 Feb 2014 10:31:16 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
Message-ID: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used  
trinotate to annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks  
like this (most important parts on top):

     6653 gene
    46675 exon
   280534 protein_match
    59934 CDS
      969 contig
   105388 expressed_sequence_match
    12584 five_prime_UTR
    78565 match
  1401369 match_part
    10180 mRNA
    11545 three_prime_UTR

2. From cufflinks assembly: I started with 133380 entries (out of  
which there are 29,000 transcripts).  I used the protein sequences  
from trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks  
like this:
       29 gene
       75 exon
   573659 protein_match
       67 CDS
     1099 contig
   269298 expressed_sequence_match
       23 five_prime_UTR
   173844 match
  2221846 match_part
       29 mRNA
       23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than
expected, so I went the cufflinks route. I don't understand why, when using
the cufflinks transcripts, even fewer genes are being found.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.   
I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
      957 contig
    46555 expressed_sequence_match
    43651 match
   553633 match_part
   113738 protein_match
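
Tallies like the three above can be reproduced with a short sketch along these lines, assuming they are counts of the feature type in column 3 of the MAKER GFF3 output (how they were actually produced is not stated in the message). The function name and the example records are hypothetical.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally column 3 (feature type) of GFF3 lines, skipping comments."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


# Made-up records, only to show the expected input shape.
example = [
    "##gff-version 3\n",
    "scaffold1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "scaffold1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n",
    "scaffold1\tmaker\texon\t100\t400\t.\t+\t.\tParent=g1-mRNA-1\n",
    "scaffold1\tmaker\texon\t600\t900\t.\t+\t.\tParent=g1-mRNA-1\n",
]
for ftype, n in count_feature_types(example).most_common():
    print(f"{n:9d} {ftype}")
```

To run it on a real file, pass an open file handle instead of the example list.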

As I mentioned in another email, cegma results indicated that the  
genome was more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

> Hi Dhivya,
>
> I think there are a few numbers that could be helpful to understand
> what's happening here.
>
> How many transcripts did Trinity assemble the RNA-seq data into?
> Also, you had 29,000 transcripts from cufflinks, but fewer from  
> MAKER when you gave it the cufflinks data. How many transcripts did  
> MAKER identify with the cufflinks data? Did you still get more than  
> the 10,000 transcripts that you found with just the Trinity data?
>
> A key part of MAKER's approach to genome annotation that might be
> affecting its performance is that it only annotates a gene where
> there is both evidence (like your RNA-seq data) and an ab-initio
> prediction. If a prediction is unsupported by the evidence, then
> MAKER won't annotate a gene and if evidence aligns where there's no
> prediction, MAKER won't annotate a gene either. What ab-initio
> predictors are you using and have they been trained for your specific genome?
>
> You can force MAKER to automatically promote evidence alignments to  
> a gene model by setting the est2genome option to 1, but that will  
> usually give you many false positives.
>
> Try rerunning it with either the Trinity data or the Cufflinks data  
> and with est2genome set to 1, and let us know how that affects the  
> MAKER results.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of  
> dhivya arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> I am trying to annotate a 200 Mb plant genome for which I have a very
> good assembly.
>
> I tried to de novo assemble RNA-seq data using trinity and ran maker
> using my genome assembly and the trinity results.  I did not get as
> many transcripts as expected, around 10,000 transcripts.
>
> So, I decided to try a different approach.  I did a genome assisted
> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
> genome assembly and the cufflinks result.  I get far fewer
> transcripts as a result.
>
> If cufflinks found 29000 transcripts by mapping to the genome, I'm
> confused as to why maker is not finding the same.
>
> Any suggestions would be appreciated.
>
> Thanks
> Dhivya
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From rebzi87 at gmail.com  Tue Feb  4 15:29:41 2014
From: rebzi87 at gmail.com (Rebecca Harris)
Date: Tue, 4 Feb 2014 14:29:41 -0800
Subject: [maker-devel] maker output
Message-ID: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>

Hi,

I'm running maker on a cluster and am having some problems with the run
ending prematurely. I would like to know if there is a straightforward way
to figure out whether maker has completed. I've tried: 1) counting the
number of run.log files in the datastore directory, and 2) counting the
instances of "FINISHED" in the master_datastore_index.log. These numbers
are inconsistent. I have 200,000 contigs in my fasta file - do I expect
200,000 run.log files? I've had to restart maker a few times - it appears
that maker is appending to the master_datastore_index.log, as I find
multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From darasappan at gmail.com  Tue Feb  4 15:43:19 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 4 Feb 2014 16:43:19 -0600
Subject: [maker-devel] Fwd:  maker annotation with cufflinks output
References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <EAFE0808-FDA7-49E5-8FD6-9AFD570DF20C@gmail.com>

Resending this since it didn't make it to the mailing list before.


From dence at genetics.utah.edu  Tue Feb  4 15:42:52 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 4 Feb 2014 22:42:52 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43E51@mxb2.hg.genetics.utah.edu>

Hi Rebecca,

If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status. If you count those (with "grep -c", for example), that will tell you how many contigs have finished.

If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets the minimum length a contig must have before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run.
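
To get a feel for how much a min_contig cutoff changes the number of contigs MAKER has to process, one can count the FASTA records at or above a given length. The following is only a minimal Python sketch: the function names are made up for illustration, the 5000 bp default simply mirrors the value suggested above, and it is an assumption (not confirmed in this thread) that contigs below the cutoff never receive a FINISHED entry.

```python
def contig_lengths(fasta_lines):
    """Yield (contig_id, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the header text up to the first whitespace as the contig ID.
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def count_long_contigs(fasta_lines, min_contig=5000):
    """Count records whose sequence is at least min_contig bases long."""
    return sum(1 for _, n in contig_lengths(fasta_lines) if n >= min_contig)
```

Called as count_long_contigs(open("genome.fasta")), where the file name is a placeholder, the result is a rough upper bound on the number of contigs MAKER would attempt with that cutoff.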

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From mikael.durling at slu.se  Tue Feb  4 15:49:46 2014
From: mikael.durling at slu.se (=?iso-8859-1?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Tue, 4 Feb 2014 22:49:46 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <D36EEC49-FC5A-4DB8-BF08-795103F1B485@slu.se>

> 4 feb 2014 kl. 23:32 skrev "Rebecca Harris" <rebzi87 at gmail.com>:
> 
> Hi,
> 
> I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log.

This is usually what I do to check if maker has finished all scaffolds. There should be one FINISHED statement for each entry in the fasta file. (It might be one for every scaffold longer than the given minimum length.)

> These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. 

Run maker -dsindex to rebuild the file if you like. The number of FINISHED entries should not change, though.

Mikael



From carsonhh at gmail.com  Tue Feb  4 15:50:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Feb 2014 15:50:10 -0700
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <CF16BBC3.9807%carsonhh@gmail.com>

Clusters are notoriously flaky, so maker is restartable (hence the need for
the log file).  Also, since multiple nodes may write simultaneously to the
log, they can munge its contents.   You can rerun maker with the -dsindex
flag to regenerate the master_datastore_index.log without processing
anything else. You can even delete it before rebuilding it if you want to
ensure all entries are unique (run on a single CPU when you do this).

Then count the number of FINISHED entries in the log.
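
Because restarts append to the log, a plain count of FINISHED lines can count the same contig more than once. A minimal Python sketch of a de-duplicated count follows. It assumes each log line is tab-separated as contig ID, datastore path, status; that layout is an assumption here and should be checked against your own log. The function name is hypothetical.

```python
def finished_contigs(log_lines):
    """Return the set of contig IDs that have at least one FINISHED entry.

    Assumes tab-separated lines: contig ID, datastore path, status.
    Lines that do not fit that shape (e.g. munged writes) are skipped.
    """
    finished = set()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[-1].strip() == "FINISHED":
            finished.add(fields[0])
    return finished
```

len(finished_contigs(open(path_to_log))) then gives the number of distinct finished contigs, which is the figure to compare against the number of contigs in the input fasta file.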

Thanks,
Carson



From carsonhh at gmail.com  Wed Feb  5 11:38:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 11:38:50 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <CF17D1FC.987A%carsonhh@gmail.com>

Do you have any features of type snap in your results from step 3?  We've
had a couple of recent posts where, after training, snap was giving no
results, and as a result maker couldn't give any genes.  One cause of
something like that may be your step 2.  Make sure the ZFF you used to
train with wasn't empty.  The maker2zff script uses filters to only put the
best genes in the ZFF file, and if all your genes fail the filtering then you
are training with an empty ZFF.
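
A quick way to check for an empty training set is to count the gene models in the ZFF annotation file before training. The Python below is a sketch under assumptions: it expects ">" header lines naming each sequence, followed by whitespace-separated feature lines whose last column is the gene (group) name. The function name is hypothetical.

```python
def count_zff_genes(zff_lines):
    """Count distinct (sequence, gene group) pairs in ZFF-style lines."""
    genes = set()
    seq = None
    for line in zff_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq = line[1:]          # start of a new sequence block
            continue
        fields = line.split()
        if len(fields) >= 4:        # label, begin, end, ..., group
            genes.add((seq, fields[-1]))
    return len(genes)
```

A result of 0 means every gene was filtered out and SNAP would be trained on nothing.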

Also you should use proteins from a related species as your protein file.  I
see that your protein matches are varying wildly from run to run. So is your
contig count.  Were the contigs you have results for long enough
to contain genes?
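
To see whether SNAP produced anything at all, and to compare feature counts between runs, it can help to tally the merged GFF3 by source and type (columns 2 and 3) rather than by type alone. The following Python is a minimal sketch; the function name is hypothetical, and the exact source label SNAP predictions carry in your output is an assumption to verify by looking at the tally.

```python
from collections import Counter


def tally_gff3(gff_lines):
    """Count GFF3 features by (source, type), i.e. columns 2 and 3."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                   # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue                # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[(fields[1], fields[2])] += 1
    return counts
```

If no key in the result has a source beginning with "snap", the predictor most likely returned nothing for that run.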

--Carson


From dence at genetics.utah.edu  Wed Feb  5 12:28:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 19:28:48 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>

Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com  Wed Feb  5 13:13:57 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 14:13:57 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the
trinity assembly (using trinotate).  I'll try using protein sequences
from related species (though there aren't sequences from closely
related organisms).  Could you tell me a little about why protein data from
annotating my rnaseq data would not work best here?

Thanks
Dhivya

On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

> Hi Dhivya, Are the protein matches in your results coming from your  
> annotations of the transcriptome? You should really use amino-acid  
> sequences from related organisms and some kind of omnibus source  
> like SwissProt.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, February 05, 2014 11:38 AM
> To: dhivya arasappan; Daniel Ence
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Do you have any features of type snap in your results from step 3?
> We've had a couple of recent posts where, after training, snap was
> giving no results, and as a result maker couldn't give any genes.
> One cause of something like that may be your step 2.  Make sure the
> ZFF you used to train with wasn't empty.  The maker2zff script uses
> filters to only put the best genes in the ZFF file, and if all your
> genes fail the filtering then you are training with an empty ZFF.
>
> Also you should use proteins from a related species as your protein
> file.  I see that your protein matches are varying wildly from run to
> run. So is your contig count.  Were the contigs you have
> results for long enough to contain genes?
>
> --Carson

From dence at genetics.utah.edu  Wed Feb  5 13:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>,
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43FB4@mxb2.hg.genetics.utah.edu>

Hi Dhivya,

In genome annotation, you often want to use as many sources of evidence as is reasonable, but those sources should be independent of one another.  It will confuse downstream annotation efforts if your protein evidence is actually derived from the RNA-seq data.

Using the trinotate results as protein evidence restricts you first to the proteins encoded by the transcripts in the RNA-seq data, which may be incomplete, and second to the proteins that trinotate could annotate from among those transcripts.

The problem that Carson mentioned with the SNAP HMM file is a real possibility also.
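
A quick way to rule that out is to look at the files directly. What follows is only a minimal sketch, not part of MAKER or SNAP: the function names are made up for illustration, it assumes the training file is in the short ZFF layout (a ">" header per sequence, then one "label begin end model" line per exon), and it assumes SNAP predictions in the MAKER GFF3 carry a source (column 2) that starts with "snap".

```python
from collections import Counter
import io


def count_zff_models(handle):
    """Return (number of sequences, number of distinct gene models) in a ZFF stream."""
    sequences = 0
    models = set()
    current = ""
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New sequence header, e.g. ">scf_001"
            sequences += 1
            parts = line[1:].split()
            current = parts[0] if parts else ""
            continue
        fields = line.split()
        # Assumed short ZFF layout: label, begin, end, model name
        if len(fields) >= 4:
            models.add((current, fields[3]))
    return sequences, len(models)


def tally_gff3(handle):
    """Tally (source, type) pairs from columns 2 and 3 of a GFF3 stream."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[(fields[1], fields[2])] += 1
    return counts


def has_snap_features(counts):
    """True if any feature source starts with 'snap' (assumed naming)."""
    return any(source.lower().startswith("snap") for source, _ in counts)
```

Opening the ZFF from the training step and passing it to count_zff_models should give a non-zero model count; if it returns zero models, the HMM was trained on nothing. Likewise, if has_snap_features(tally_gff3(...)) is false for the merged GFF3 from step 3, SNAP produced no predictions for MAKER to work with.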

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: dhivya arasappan [darasappan at gmail.com]
Sent: Wednesday, February 05, 2014 1:13 PM
To: Daniel Ence
Cc: Carson Holt; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the trinity assembly (using trinotate).  I'll try using protein sequences from related species (though there aren't sequences from closely related organisms).  Could you tell me a little about why protein data from annotating my RNA-seq data would not work best here?

Thanks
Dhivya

From carsonhh at gmail.com  Wed Feb  5 13:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <CF17E9B9.9892%carsonhh@gmail.com>

Protein data doesn't have to be from a very closely related species.  This
is because genes maintain homology at the amino acid level across even very
large evolutionary distances.  Having a more closely related species just
ensures that genome contents are similar (fewer losses/gains relative to
each other). And use the entire proteome of at least one related species
(just using a database like Swiss-Prot is not sufficient).

Using translated mRNA-seq data will not give you any new information that
was not already available from the untranslated sequence.  Plus it will
introduce the complicating artifacts that mRNA-seq generates into the
protein part of the pipeline (gene merging, incorrect assembly, and false
calls caused by background transcription).  A big gotcha with mRNA-seq is
that all of your genome gets transcribed at a low level, not just the genes,
so you will always have contamination that does not represent real gene
models.  Also, in the end you really only expect to capture about 50% of the
genes with mRNA-seq (maybe 70% if you are fortunate, and most of those will
be partial). So using the proteins from another species is important to
improve sensitivity and to fix many of the issues that arise from the noisy
nature of mRNA-seq.  You get the best annotations when you use both, but if
you were forced to use only one (either protein evidence or mRNA-seq), you
would in most cases get better annotations from the proteins of another
species. Noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson


From darasappan at gmail.com  Wed Feb  5 22:16:43 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 23:16:43 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17E9B9.9892%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
	<CF17E9B9.9892%carsonhh@gmail.com>
Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com>

Thank you both for those explanations. I'll get back to you after I  
try rerunning maker.

Dhivya

On Feb 5, 2014, at 2:38 PM, Carson Holt wrote:

> Protein data doesn?t have to be from that closely a related  
> species.  This is because genes maintain homology at the amino acid  
> level across even very large evolutionary distances.  Having a  
> closer related species just ensures that genome contents are similar  
> (fewer losses/gains relative to each other). And use the entire  
> proteome of at least one related species (just using a database like  
> swiss-prot is not sufficient).
>
> Using translated mRNA-seq data will not give you any new information  
> that was not already available from the untranslated sequence.  Plus  
> it will introduce the complicating artifacts that mRNA-seq generates  
> into the protein part of the pipeline (gene merging, incorrect  
> assembly, and false calls caused by background transcription).  A  
> big gotcha with mRNA-seq is that all of your genome gets transcribed  
> at a low level, not just the genes, so you will always have  
> contamination that does not represent real gene models.  Also in the  
> end you really only expect to capture about 50% of the genes with  
> mRNA-seq (maybe 70% if you are fortunate - and most of those will be  
> partial). So using the proteins from another species, is important  
> to improve sensitivity, and fix many of the issues that arise from  
> the noisy nature of mRNA-seq.  In fact if you were forced to use  
> only one (either protein evidence or mRNA-seq) you will actually get  
> better annotations from the protein evidence in most cases. You get  
> better annotations when you use both, but if using only one of them,  
> the proteins from another species are better, and noisy mRNA-seq  
> will be the primary source of annotation error.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Wednesday, February 5, 2014 at 1:13 PM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: Carson Holt <carsonhh at gmail.com>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org 
> >
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello Daniel and Carson,
>
> Thanks for your replies.
>
> Yes I used the the protein sequences resulting from annotation of  
> trinity assembly (using trinotate).  I'll try using protein  
> sequences from related species (though there arent sequences from  
> closely related orgs).  Could you tell me a little about why protein  
> data from annotating my rnaseq data would not work best here?
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:
>
>> Hi Dhivya, Are the protein matches in your results coming from your  
>> annotations of the transcriptome? You should really use amino-acid  
>> sequences from related organisms and some kind of omnibus source  
>> like SwissProt.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> From: Carson Holt [carsonhh at gmail.com]
>> Sent: Wednesday, February 05, 2014 11:38 AM
>> To: dhivya arasappan; Daniel Ence
>> Cc: maker-devel at yandell-lab.org
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Do you have any features of type snap in your results from step 3?   
>> We?ve had a couple of recent posts where after training snap was  
>> giving no results, and as a result maker couldn?t give any genes.   
>> One cause of something like that may be your step 2.  Make sure the  
>> ZFF wasn?t empty you used to train with.  The maker2zff script uses  
>> filters to only put the best genes in the off file, and if all your  
>> genes fail the filtering then you are training with an empty ZFF.
>>
>> Also you should use proteins from a related species as your protein  
>> file.  I see that you protein marches are varying wildly from run  
>> to run? So is your contig count?  Were the subset of contigs you  
>> have results for long enough to contain genes?
>>
>> ?Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence <dence at genetics.utah.edu>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From trinity assembly: I started with 102000 contigs. I used  
>> trinotate to annotate proteins in this.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this (most important parts on top):
>>
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>>
>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>> which there are 29,000 transcripts).  I used the protein sequences  
>> from trinity assembly.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>>
>> The genes annotated using the trinity assembly is lower than  
>> expected, so I went the cufflinks route. I dont understand why when  
>> using the cufflinks transcripts, even less genes are being found.
>>
>> 3. Training SNAP:  I used the results of maker from 1 to train  
>> SNAP.  I then used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ 
>> maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>>
>> As I mentioned in another email, cegma results indicated that the  
>> genome was more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there a few numbers that could be helpful to understand  
>>> what's happening here.
>>>
>>> How many transcripts did Trinity assembly the RNA-seq data into?  
>>> Also, you had 29,000 transcripts from cufflinks, but fewer from  
>>> MAKER when you gave it the cufflinks data. How many transcripts  
>>> did MAKER identify with the cufflinks data? Did you still get more  
>>> than the 10,000 transcripts that you found with just the Trinity  
>>> data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be  
>>> affecting it's performance is that it only annotates a gene where  
>>> there is both evidence (like your RNA-seq data) and an ab-initio  
>>> prediction. If a prediction is unsupported by the evidence, then  
>>> MAKER won't annotate a gene and if evidence aligns where there's  
>>> no prediction, MAKER won't annotate a gene either. What ab-initio  
>>> predictors are you using and have they been trained specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments  
>>> to a gene model by setting the est2genome option to 1, but that  
>>> will usually give you many false positives.
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>> data and with est2genome set to 1, and let us know how that  
>>> affects the MAKER results.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>>> of dhivya arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>> very good assembly.
>>>
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach.  I did a genome-assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes and 29,000 transcripts.  I then ran maker using
>>> my genome assembly and the cufflinks result.  I get far fewer
>>> transcripts as a result.
>>>
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>>
>>> Any suggestions would be appreciated.
>>>
>>> Thanks
>>> Dhivya
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>


From mikael.durling at slu.se  Thu Feb  6 04:02:37 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Thu, 6 Feb 2014 11:02:37 +0000
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CEF5A8E0.88B4%carsonhh@gmail.com>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
Message-ID: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>

Hi Carson,

It's nice to see all these new features in maker.

I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed though, with this error message:

ERROR: You found a tRNA with an intron! This should not happen
--> rank=12, hostname=my-mgrid6
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scf_013

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf_013

I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me:

scf_013         1       189339  189410  Thr     AGT     0       0       82.83
scf_013         2       510381  510462  Ser     AGA     0       0       67.09
scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
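
For reference, columns 7 and 8 of this tabular output are the intron bounds, so rows 3 to 5 above are the ones reporting introns, which at least looks consistent with the error message. A minimal Python sketch that flags such rows, assuming whitespace-separated nine-column lines as pasted above (the helper name `trnas_with_introns` is hypothetical and is not part of MAKER or tRNAscan-SE):

```python
# Rows copied from the scf_013 output quoted above.
SAMPLE = """\
scf_013         1       189339  189410  Thr     AGT     0       0       82.83
scf_013         2       510381  510462  Ser     AGA     0       0       67.09
scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
"""


def trnas_with_introns(text):
    """Return (sequence, tRNA number, intron begin, intron end) for rows
    whose intron bounds (columns 7 and 8) are not both zero."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # blank or short lines
        try:
            number = int(fields[1])
            intron_begin = int(fields[6])
            intron_end = int(fields[7])
        except ValueError:
            continue  # header or separator lines are not numeric
        if intron_begin != 0 or intron_end != 0:
            hits.append((fields[0], number, intron_begin, intron_end))
    return hits


for name, number, begin, end in trnas_with_introns(SAMPLE):
    print(f"{name} tRNA #{number}: intron {begin}-{end}")
```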


Hope this can be of some help while debugging. I'll leave trnascan off for now.

thanks,

Mikael


On 10 Jan 2014, at 22:03, Carson Holt <carsonhh at gmail.com> wrote:

> Hi Mikael,
> 
> The options are part of the new MAKER-P integration
> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
> Additional documentation/tutorials will be forthcoming - probably in
> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
> or alternatively with the annual GMOD summer school. The tRNA option is
> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
> 
> Thanks,
> Carson
> 
> 
> 
> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
> wrote:
> 
>> Hi Carson and other maker developers,
>> 
>> I was reading the source code of the latest maker release and noted
>> several references to ncRNAs, snoscan and trnascan. Can these be
>> incorporated into the normal annotation workflow? If so, are there any
>> instructions available for that?
>> 
>> best regards,
>> Mikael Durling
> 
> 


From darasappan at gmail.com  Thu Feb  6 07:52:12 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 08:52:12 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com>

Hello,

It does appear that my genome.ann file from the maker2zff script has data
in it. However, the SNAP steps after that have created empty files.
The following are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors  
like this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds  
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds  
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds  
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
exon-20:out_of_bounds exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all
showing up as errors. I realize this may be an issue with SNAP, but
are you familiar with anything like this? A snippet of my genome.ann
file is attached for reference (the full file is too big for the list).
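
An out_of_bounds error is presumably what one gets whenever an exon coordinate in genome.ann exceeds the length of the matching sequence in genome.dna, so an empty or truncated genome.dna would flag every model. The Python sketch below illustrates that kind of check on in-memory strings. It assumes ZFF records of the form `>sequence` followed by feature lines whose second and third fields are coordinates and whose last field is the model name; the function names are hypothetical, and this is not fathom's actual implementation.

```python
def fasta_lengths(text):
    """Map each FASTA record name to its sequence length."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds_models(ann_text, lengths):
    """Return sorted model names with a feature past the end of its sequence."""
    bad = set()
    seq = None
    for line in ann_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq = parts[0] if parts else ""
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a feature line in the assumed layout
        begin, end, model = int(fields[1]), int(fields[2]), fields[-1]
        # A sequence missing from the FASTA counts as length 0.
        if max(begin, end) > lengths.get(seq, 0):
            bad.add(model)
    return sorted(bad)


# Toy data: the annotation refers to positions up to 150 on scf_1.
ann = ">scf_1\nEinit\t10\t50\tMODEL1\nEterm\t100\t150\tMODEL1\n"
empty_dna = ">scf_1\n"
full_dna = ">scf_1\n" + "A" * 200 + "\n"

print(out_of_bounds_models(ann, fasta_lengths(empty_dna)))
print(out_of_bounds_models(ann, fasta_lengths(full_dna)))
```

With the empty sequence every model is reported; with 200 bases of sequence none is.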

Thanks
Dhivya


On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:

> Do you have any features of type snap in your results from step 3?
> We've had a couple of recent posts where, after training, snap was
> giving no results, and as a result maker couldn't give any genes.
> One cause of something like that may be your step 2.  Make sure the
> ZFF you used to train with wasn't empty.  The maker2zff script uses
> filters to only put the best genes in the ZFF file, and if all your
> genes fail the filtering then you are training with an empty ZFF.
>
> Also you should use proteins from a related species as your protein
> file.  I see that your protein matches are varying wildly from run to
> run. So is your contig count.  Were the subset of contigs you have
> results for long enough to contain genes?
>
> -Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Monday, February 3, 2014 at 9:31 AM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hi Daniel,
>
> I was able to check on some of those questions.
>
> 1. From trinity assembly: I started with 102000 contigs. I used  
> trinotate to annotate proteins in this.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this (most important parts on top):
>
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
>
> 2. From cufflinks assembly: I started with 133380 entries (out of  
> which there are 29,000 transcripts).  I used the protein sequences  
> from trinity assembly.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
>
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why, when
> using the cufflinks transcripts, even fewer genes are being found.
>
> 3. Training SNAP:  I used the results of maker from 1 to train
> SNAP.  I then used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
>
> As I mentioned in another email, cegma results indicated that the  
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>
>
>
>
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> I think there are a few numbers that could be helpful to understand
>> what's happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into?
>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>> MAKER when you gave it the cufflinks data. How many transcripts did
>> MAKER identify with the cufflinks data? Did you still get more than
>> the 10,000 transcripts that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be
>> affecting its performance is that it only annotates a gene where
>> there is both evidence (like your RNA-seq data) and an ab-initio
>> prediction. If a prediction is unsupported by the evidence, then
>> MAKER won't annotate a gene, and if evidence aligns where there's no
>> prediction, MAKER won't annotate a gene either. What ab-initio
>> predictors are you using, and have they been trained for this
>> specific genome?
>>
>> You can force MAKER to automatically promote evidence alignments to  
>> a gene model by setting the est2genome option to 1, but that will  
>> usually give you many false positives.
>>
>> Try rerunning it with either the Trinity data or the Cufflinks data  
>> and with est2genome set to 1, and let us know how that affects the  
>> MAKER results.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>> of dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome-assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes and 29,000 transcripts.  I then ran maker using
>> my genome assembly and the cufflinks result.  I get far fewer
>> transcripts as a result.
>>
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.ann
Type: application/octet-stream
Size: 15761 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.dna
Type: application/octet-stream
Size: 3075 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment-0005.obj>

From carsonhh at gmail.com  Thu Feb  6 09:01:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:01:04 -0700
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
	<CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
Message-ID: <CF18FE86.9903%carsonhh@gmail.com>

I'm making a new release this weekend, but if you have access to the devel
version, you can test now.  All changes have been committed to the
subversion repository.

Thanks,
Carson


On 2/6/14, 4:02 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Hi Carson,
>
>It's nice to see all these new features in maker.
>
>I gave the trnascan option a try by enabling it in the config file for
>one of my fungal genomes. It failed though, with this error message:
>
>ERROR: You found a tRNA with an intron! This should not happen
>--> rank=12, hostname=my-mgrid6
>ERROR: Failed while gathering ab-init output files
>ERROR: Chunk failed at level:1, tier_type:2
>FAILED CONTIG:scf_013
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf_013
>
>I checked the trnascan output
>(scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig,
>and the output seems valid to me:
>
>scf_013         1       189339  189410  Thr     AGT     0       0       82.83
>scf_013         2       510381  510462  Ser     AGA     0       0       67.09
>scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
>scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
>scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
>
>
>Hope this can be of some help while debugging. I'll leave trnascan off
>for now.
>
>thanks,
>
>Mikael
>
>
>On 10 Jan 2014, at 22:03, Carson Holt <carsonhh at gmail.com> wrote:
>
>> Hi Mikael,
>> 
>> The options are part of the new MAKER-P integration
>> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
>> Additional documentation/tutorials will be forthcoming - probably in
>> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
>> or alternatively with the annual GMOD summer school. The tRNA option is
>> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
>> wrote:
>> 
>>> Hi Carson and other maker developers,
>>> 
>>> I was reading the source code of the latest maker release and noted
>>> several references to ncRNAs, snoscan and trnascan. Can these be
>>> incorporated into the normal annotation workflow? If so, are there any
>>> instructions available for that?
>>> 
>>> best regards,
>>> Mikael Durling
>> 
>> 
>


From carsonhh at gmail.com  Thu Feb  6 09:05:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:05:05 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
Message-ID: <CF19004A.9913%carsonhh@gmail.com>

Your genome.dna file has no sequence?  Did you by any chance strip the fasta
sequence from the GFF3 you are using as input to maker2zff?  There should be
fasta sequence at the end of that file.  Also, can I see the GFF3 file you
are using as input to maker2zff?
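
A quick way to check this is to look for the `##FASTA` directive that, per the GFF3 specification, introduces embedded sequence at the end of a file. A minimal Python sketch, assuming the sequence is embedded in the standard way (the function name `gff3_fasta_summary` is hypothetical and not part of MAKER):

```python
def gff3_fasta_summary(lines):
    """Return (has_fasta_section, residue_count) for an iterable of GFF3 lines.

    Everything after a line consisting of '##FASTA' is treated as FASTA;
    header lines starting with '>' are not counted as residues.
    """
    in_fasta = False
    residues = 0
    for line in lines:
        line = line.strip()
        if not in_fasta:
            if line == "##FASTA":
                in_fasta = True
            continue
        if line and not line.startswith(">"):
            residues += len(line)
    return in_fasta, residues


# Toy examples: one GFF3 with embedded sequence, one with it stripped.
with_seq = [
    "##gff-version 3\n",
    "scf_1\tmaker\tgene\t10\t150\t.\t+\t.\tID=gene1\n",
    "##FASTA\n",
    ">scf_1\n",
    "ACGTACGTAC\n",
]
stripped = with_seq[:2]

print(gff3_fasta_summary(with_seq))
print(gff3_fasta_summary(stripped))
```

On a real file the same function can be called on an open file handle, for example `gff3_fasta_summary(open("my_maker_output.gff"))`, where the file name is only a placeholder.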

Thanks,
Carson

From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 7:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hello,

It does appear that my genome.ann file from the maker2zff script has data in it.
However, the SNAP steps after that have created empty files.  The following
are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors like
this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all showing up
as errors. I realize this may be an issue with SNAP, but are you familiar
with anything like this? My genome.ann file is attached for reference.

Thanks
Dhivya

On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:

> Do you have any features of type snap in your results from step 3?  We've had
> a couple of recent posts where, after training, snap was giving no results, and
> as a result maker couldn't give any genes.  One cause of something like that
> may be your step 2.  Make sure the ZFF you used to train with wasn't empty.
> The maker2zff script uses filters to only put the best genes in the ZFF file,
> and if all your genes fail the filtering then you are training with an empty
> ZFF.
>
> Also you should use proteins from a related species as your protein file.  I
> see that your protein matches are varying wildly from run to run. So is your
> contig count.  Were the subset of contigs you have results for long enough to
> contain genes?
>
> -Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Monday, February 3, 2014 at 9:31 AM
> To:  Daniel Ence <dence at genetics.utah.edu>
> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Hi Daniel,
> 
> I was able to check on some of those questions.
> 
> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
> annotate proteins in this.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this
> (most important parts on top):
> 
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
> 
> 2. From cufflinks assembly: I started with 133380 entries (out of which there
> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
> 
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why, when using
> the cufflinks transcripts, even fewer genes are being found.
>
> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
> used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
> 
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
> 
> As I mentioned in another email, cegma results indicated that the genome was
> more than 90% complete. Any suggestions would be helpful.
> 
> Thank you
> Dhivya
> 
> 
> 
> 
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
> 
>> Hi Dhivya, 
>> 
>> I think there are a few numbers that could be helpful to understand what's
>> happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>> the cufflinks data. How many transcripts did MAKER identify with the
>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>> found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be affecting
>> its performance is that it only annotates a gene where there is both
>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>> gene either. What ab-initio predictors are you using, and have they been
>> trained for this specific genome?
>> 
>> You can force MAKER to automatically promote evidence alignments to a gene
>> model by setting the est2genome option to 1, but that will usually give you
>> many false positives.
>> 
>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>> est2genome set to 1, and let us know how that affects the MAKER results.
>> 
>> Thanks,
>> Daniel
>> 
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>> arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>> 
>> Hello,
>> 
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome-assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes and 29,000 transcripts.  I then ran maker using my
>> genome assembly and the cufflinks result.  I get far fewer
>> transcripts as a result.
>> 
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>> 
>> Any suggestions would be appreciated.
>> 
>> Thanks
>> Dhivya
>> 
>> 



From carsonhh at gmail.com  Thu Feb  6 10:04:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 10:04:25 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID: <CF190E83.9927%carsonhh@gmail.com>

Could you give me the file without using 'head' to trim it? It's cutting it
off before it reaches the part I'm interested in.

-Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 10:01 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Oh yes, I did. I took just the non-sequence entries in the gff file and used
that as my input.  I will rerun snap with the gff file containing the
sequences as well.

I'm attaching a snippet of the gff file that I used as input to maker2zff.

Thanks for your help
Dhivya


On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
> sequence from the GFF3 you are using as input to maker2zff?  There should be
> fasta sequence at the end of that file.  Also, can I see the GFF3 file you are
> using as input to maker2zff?
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 7:47 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Hello,
> 
> It does appear that my genome.ann file from the maker2zff script has data in it.
> However, the SNAP steps after that have created empty files.  The following
> are all empty:
> 
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
> 
> When I tried to get gene stats or validate genome.ann, I get errors like this
> for all of them:
> 
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
> exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
> exon-21:out_of_bounds
> 
> I'm not sure why the annotations I'm seeing in genome.ann are all showing up as
> errors. I realize this may be an issue with SNAP, but are you familiar with
> anything like this? My genome.ann file is attached for reference.
> 
> Thanks
> Dhivya
> 
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
> 
>> Do you have any features of type snap in your results from step 3?  We've had
>> a couple of recent posts where, after training, snap was giving no results, and
>> as a result maker couldn't give any genes.  One cause of something like that
>> may be your step 2.  Make sure the ZFF you used to train with wasn't empty.
>> The maker2zff script uses filters to only put the best genes in the ZFF file,
>> and if all your genes fail the filtering then you are training with an empty
>> ZFF.
>>
>> Also you should use proteins from a related species as your protein file.  I
>> see that your protein matches are varying wildly from run to run. So is your
>> contig count.  Were the subset of contigs you have results for long enough to
>> contain genes?
>>
>> -Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, February 3, 2014 at 9:31 AM
>> To:  Daniel Ence <dence at genetics.utah.edu>
>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Hi Daniel,
>> 
>> I was able to check on some of those questions.
>> 
>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
>> annotate proteins in this.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like this
>> (most important parts on top):
>> 
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>> 
>> 2. From cufflinks assembly: I started with 133380 entries (out of which there
>> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like
>> this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>> 
>> The number of genes annotated using the trinity assembly is lower than
>> expected, so I went the cufflinks route. I don't understand why, when using
>> the cufflinks transcripts, even fewer genes are being found.
>>
>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
>> used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>> 
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>> 
>> As I mentioned in another email, cegma results indicated that the genome was
>> more than 90% complete. Any suggestions would be helpful.
>> 
>> Thank you
>> Dhivya
>> 
>> 
>> 
>> 
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>> 
>>> Hi Dhivya, 
>>> 
>>> I think there are a few numbers that could be helpful to understand what's
>>> happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>>> the cufflinks data. How many transcripts did MAKER identify with the
>>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>>> found with just the Trinity data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be affecting
>>> its performance is that it only annotates a gene where there is both
>>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>>> gene either. What ab-initio predictors are you using, and have they been
>>> trained for this specific genome?
>>> 
>>> You can force MAKER to automatically promote evidence alignments to a gene
>>> model by setting the est2genome option to 1, but that will usually give you
>>> many false positives.
>>> 
>>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>>> est2genome set to 1, and let us know how that affects the MAKER results.
>>> 
>>> Thanks,
>>> Daniel
>>> 
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>>> arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> I am trying to annotate a 200 Mb plant genome for which I have a very
>>> good assembly.
>>>
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach.  I did a genome-assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>> genome assembly and the cufflinks result.  I get far fewer
>>> transcripts as a result.
>>> 
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>> 
>>> Any suggestions would be appreciated.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> 
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
> 



From darasappan at gmail.com  Thu Feb  6 10:01:44 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 11:01:44 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19004A.9913%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>

Oh yes, I did. I took just the non-sequence entries in the gff file and
used that as my input.  I will rerun snap with the gff file containing
the sequences as well.

I'm attaching a snippet of the gff file that I used as input to  
maker2zff.
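For anyone hitting the same out_of_bounds errors: a quick way to confirm whether a GFF3 file still carries its sequence is to look for the ##FASTA directive and count what follows it. The snippet below is only a minimal sketch in Python, not part of MAKER or SNAP; the function name gff3_fasta_summary is made up for illustration.

```python
import io


def gff3_fasta_summary(handle):
    """Return (n_sequences, n_bases) found after the ##FASTA directive.

    A result of (0, 0) means the file has no embedded sequence, which is
    the situation described above (FASTA stripped from the GFF3).
    """
    in_fasta = False
    n_seqs = 0
    n_bases = 0
    for line in handle:
        line = line.rstrip("\n")
        if not in_fasta:
            # Everything before ##FASTA is annotation, not sequence.
            if line.startswith("##FASTA"):
                in_fasta = True
            continue
        if line.startswith(">"):
            n_seqs += 1
        else:
            n_bases += len(line.strip())
    return n_seqs, n_bases


if __name__ == "__main__":
    # Tiny in-memory example; a real file would be opened with open(path).
    example = io.StringIO(
        "##gff-version 3\n"
        "ctg1\tmaker\tgene\t1\t10\t.\t+\t.\tID=g1\n"
        "##FASTA\n"
        ">ctg1\n"
        "ACGTACGTAC\n"
    )
    print(gff3_fasta_summary(example))
```

If the counts come back as zero, the file handed to maker2zff has no sequence in it, which would be consistent with an empty genome.dna.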

Thanks for your help
Dhivya


On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence?  Did you by any chance strip  
> the fasta sequence from the GFF3 you are using as input to  
> maker2zff?  There should be fasta sequence at the end of that file.   
> Also can I see the GFF3 file you are using as input to maker2zff?
>
> Thanks,
> Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 7:47 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org 
> " <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> It does appear that my genome.ann file from the maker2zff script has data
> in it. However, the SNAP steps after that have created empty files.   
> The following are all empty:
>
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>
> When I tried to get gene stats or validate genome.ann, I get errors  
> like this for all of them:
>
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
> exon-2:out_of_bounds exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
> exon-20:out_of_bounds exon-21:out_of_bounds
>
> I'm not sure why the annotations I'm seeing in genome.ann are all
> showing up as errors. I realize this may be an issue with snap, but  
> are you familiar with anything like this? My genome.ann file is  
> attached for reference.
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>
>> Do you have any features of type snap in your results from step 3?
>> We've had a couple of recent posts where, after training, snap was
>> giving no results, and as a result maker couldn't give any genes.
>> One cause of something like that may be your step 2.  Make sure the
>> ZFF you used to train with wasn't empty.  The maker2zff script uses
>> filters to only put the best genes in the ZFF file, and if all your
>> genes fail the filtering then you are training with an empty ZFF.
>>
>> Also you should use proteins from a related species as your protein
>> file.  I see that your protein matches are varying wildly from run
>> to run.  So is your contig count.  Was the subset of contigs you
>> have results for long enough to contain genes?
>>
>> --Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence <dence at genetics.utah.edu>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From trinity assembly: I started with 102000 contigs. I used  
>> trinotate to annotate proteins in this.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this (most important parts on top):
>>
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>>
>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>> which there are 29,000 transcripts).  I used the protein sequences  
>> from trinity assembly.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>>
>> The number of genes annotated using the trinity assembly is lower than
>> expected, so I went the cufflinks route. I don't understand why, when
>> using the cufflinks transcripts, even fewer genes are being found.
>>
>> 3. Training SNAP:  I used the results of maker from 1 to train  
>> SNAP.  I then used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>>
>> As I mentioned in another email, cegma results indicated that the  
>> genome was more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there are a few numbers that could be helpful to understand
>>> what's happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>>> MAKER when you gave it the cufflinks data. How many transcripts
>>> did MAKER identify with the cufflinks data? Did you still get more
>>> than the 10,000 transcripts that you found with just the Trinity
>>> data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be
>>> affecting its performance is that it only annotates a gene where
>>> there is both evidence (like your RNA-seq data) and an ab-initio
>>> prediction. If a prediction is unsupported by the evidence, then
>>> MAKER won't annotate a gene and if evidence aligns where there's
>>> no prediction, MAKER won't annotate a gene either. What ab-initio
>>> predictors are you using and have they been trained for your
>>> specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments  
>>> to a gene model by setting the est2genome option to 1, but that  
>>> will usually give you many false positives.
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>> data and with est2genome set to 1, and let us know how that  
>>> affects the MAKER results.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>>> of dhivya arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>> very good assembly.
>>>
>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach.  I did a genome-assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker
>>> using my genome assembly and the cufflinks result.  I get far fewer
>>> transcripts as a result.
>>>
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>>
>>> Any suggestions would be appreciated.
>>>
>>> Thanks
>>> Dhivya
>>>
>>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.cat.formatted.gff
Type: application/octet-stream
Size: 19905 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a662c5a7/attachment-0002.obj>

From sjackman at gmail.com  Thu Feb  6 17:22:57 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Feb 2014 16:22:57 -0800
Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation
Message-ID: <CADX6M3r=29brfAzzjPr22mAGW28VUb7np5MJz5bEjsAL-o2r-w@mail.gmail.com>

Hi MAKER developers,

I'd like to add MAKER to Homebrew <http://brew.sh> to make the installation
of MAKER and its dependencies as straightforward as "brew install maker".
Homebrew is a system for installing software, originally developed for Mac
OS, and now also available for Linux through
Linuxbrew <https://github.com/Homebrew/linuxbrew>.
Homebrew/science <https://github.com/Homebrew/homebrew-science> is a
collection of scientific software, which includes a lot of bioinformatics
software.

I've created a prototype for the MAKER installation
script <https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb>
(called a formula, in Homebrew parlance). Is there a static URL for the
source code of MAKER? The current formula won't work out of the box,
because part of the
URL <https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb#L7>
depends on the user's unique ID:
http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz.

Would you be interested in adding MAKER to Homebrew? I know MAKER must be
licensed for commercial use. It is possible for Homebrew to display a
notice of the MAKER license when it's installed.

MAKER is not available for commercial use without a license. Those wishing
to license MAKER for commercial use should contact Beth Drees at the
University of Utah TCO to discuss your needs.

Cheers,
Shaun

From bioinformatics.umd at gmail.com  Fri Feb  7 06:29:27 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 7 Feb 2014 08:29:27 -0500
Subject: [maker-devel] NCBI feature table
Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>

Hello Maker Developers,

I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.

Cheers
Ian


From mike.thon at gmail.com  Fri Feb  7 07:14:06 2014
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 7 Feb 2014 15:14:06 +0100
Subject: [maker-devel] NCBI feature table
In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>

Hi Ian -

We've been struggling with this too, and I started developing a script to convert the maker gff into NCBI's .tbl format.  However, we found that some of the gene models required manual editing, so what we do is import the gff into a commercial application called Geneious where we do the edits.  From there we export the data in GenBank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess.  The only advice I can offer is to say that you'll probably need some manual editing in your workflow, if not Apollo, then some other app.  In that case you'll need to convert the output of that app into .tbl format.
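To give a rough idea of what the .tbl side looks like, here is a minimal sketch in Python. It is not our script and not a complete converter: it only writes gene features, ignores mRNA/CDS, partial features and product names, and it reuses the GFF3 ID attribute as the locus_tag, which is purely an assumption for illustration (real submissions need a registered locus_tag prefix). The function name gff3_genes_to_tbl is made up.

```python
def gff3_genes_to_tbl(gff_lines):
    """Convert GFF3 'gene' rows into a bare-bones NCBI feature table.

    Each sequence gets a '>Feature <seqid>' header; each gene becomes a
    'start<TAB>stop<TAB>gene' line followed by a locus_tag qualifier line.
    Minus-strand genes are written with start > stop, as .tbl expects.
    """
    by_seq = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if strand == "-":
            start, end = end, start
        # Assumption: the GFF3 ID stands in for the locus_tag.
        by_seq.setdefault(seqid, []).append((start, end, attrs.get("ID", "")))

    out = []
    for seqid, genes in by_seq.items():
        out.append(">Feature " + seqid)
        for start, end, gene_id in genes:
            out.append("%d\t%d\tgene" % (start, end))
            if gene_id:
                out.append("\t\t\tlocus_tag\t" + gene_id)
    return "".join(row + "\n" for row in out)


if __name__ == "__main__":
    demo = [
        "##gff-version 3\n",
        "ctg1\tmaker\tgene\t100\t900\t.\t-\t.\tID=g1;Name=g1\n",
        "ctg1\tmaker\tmRNA\t100\t900\t.\t-\t.\tID=m1;Parent=g1\n",
    ]
    print(gff3_genes_to_tbl(demo), end="")
```

The output of something like this would still need to go through NCBI's validation, and the mRNA and CDS features are where most of the manual fixing ends up being needed.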

> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics <bioinformatics.umd at gmail.com> wrote:
> 
> Hello Maker Developers,
> 
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.
> 
> Cheers
> Ian
> 
> 


From cexzurjimenezjr at gmail.com  Thu Feb  6 22:27:13 2014
From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.)
Date: Fri, 7 Feb 2014 13:27:13 +0800
Subject: [maker-devel] Testing MAKER After Installation
Message-ID: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>

Hello,

I have finished installing MAKER; the installer reported "PERL Dependencies:
INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED,
MAKER: INSTALLED", so everything seems fine. I'm using MAKER 2.10 and
I have followed the installation instructions in both its "README" and
"INSTALL" files and in the 2012 GMOD MAKER Tutorial. After editing the three
configuration files and running "maker", I saw the following error in my
terminal. I have searched Google and tried the solutions offered there, but
the error still appears. Below is the error I got:


Can't locate package GDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased


Maker is now finished!!!


Can you tell me what the error means and which part of the installation I
got wrong? Your help will be very much appreciated. Thank you.

Attached are the configuration files I used for MAKER.


Sincerely,

CJ
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1502 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1320 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4541 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0008.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 11:11:44 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:11:44 +0000
Subject: [maker-devel] Maker installation
In-Reply-To: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
References: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
Message-ID: <CF1A6E45.99DF%carson.holt@genetics.utah.edu>

Hi Tracy,

The older apollo is pretty much deprecated.  There are still people who like to use it though (myself among them).  You can download and install it manually from here --> http://sourceforge.net/projects/gmod/files/Apollo/.

If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this --> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz

You can also use Web-Apollo for your data if you want, and that is what I would recommend.

On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that.  The GUI is really only for demonstration purposes or very small datasets.  It is not for production (that is why it is off by default).

Thanks,
Carson

From: Tracy Smith <tmsmith23 at wisc.edu<mailto:tmsmith23 at wisc.edu>>
Date: Friday, February 7, 2014 at 10:48 AM
To: Carson Holt <carsonhh at gmail.com<mailto:carsonhh at gmail.com>>
Cc: <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Maker installation

Hi,

I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

--
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carson.holt at genetics.utah.edu  Fri Feb  7 11:28:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:28:29 +0000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
	<7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
Message-ID: <CF1A7331.9A09%carson.holt@genetics.utah.edu>

Yes.  The non-web version of apollo can open GFF3 and then save to table
format --> http://sourceforge.net/projects/gmod/files/Apollo/

I've also attached a script made by a lab member that can convert MAKER
derived GFF3 gene entries into raw table format, and I've CC'd the script's
author (Michael Campbell) in case you have any questions.

Thanks,
Carson


On 2/7/14, 7:14 AM, "Michael Thon" <mike.thon at gmail.com> wrote:

>Hi Ian -
>
>We've been struggling with this too and I started developing a script to
>convert the maker gff into ncbi's .tbl format.  However we found that
>some of the gene models required manual editing so what we do is import
>the gff into a commercial application called Geneious where we do the
>edits.  From there we export the data in genbank format and then convert
>it to .tbl format with a script. Our submission just passed the automated
>checks and we're waiting for the manual review. Probably none of my code
>will help you, and in any case it's kind of a mess.  The only advice I can
>offer is to say that you'll probably need some manual editing in your
>workflow, if not Apollo, then some other app.  In that case you'll need
>to convert the output of that app into .tbl format.
>
>> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics
>><bioinformatics.umd at gmail.com> wrote:
>> 
>> Hello Maker Developers,
>> 
>> I have used this software with great success and I continue to look to
>>it going forward. However, as I'm getting ready to submit my annotations
>>to NCBI with the genomes, I haven't found a straightforward method of
>>turning the MAKER-produced GFF files into an NCBI feature table. What is
>>the process for creating this table? It seems that the format NCBI is
>>looking for is unique, and I haven't uncovered any scripts or tools to
>>assist in the creation of this table from my annotation files. If anyone
>>has any insight on this issue it would be greatly appreciated.
>> 
>> Cheers
>> Ian
>> 
>> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff32table
Type: application/octet-stream
Size: 7511 bytes
Desc: gff32table
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/2e51f964/attachment-0002.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 11:31:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:31:17 +0000
Subject: [maker-devel] Testing MAKER After Installation
In-Reply-To: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
References: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
Message-ID: <CF1A7417.9A11%carson.holt@genetics.utah.edu>

That can happen on some systems with that very old version of MAKER.  Use MAKER 2.28 or 2.30 instead --> http://www.yandell-lab.org/software/maker.html
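If the same "Error invoking ... makeblastdb" message still shows up after upgrading, one quick sanity check is whether the path printed in the log actually points at an executable file. The snippet below is only a small Python sketch of that check and is not part of MAKER or RepeatMasker; the function name is made up, and the path is simply the one from the log in this thread.

```python
import shutil


def resolve_executable(path_or_name):
    """Return the usable path for a program, or None if it cannot be run.

    shutil.which() accepts either a bare command name (searched on PATH)
    or a full path (checked directly for existence and execute permission).
    """
    return shutil.which(path_or_name)


if __name__ == "__main__":
    # Path taken from the error message quoted in this thread.
    candidate = "/usr/local/blast/bin/makeblastdb"
    found = resolve_executable(candidate)
    if found is None:
        print("not executable or missing:", candidate)
    else:
        print("found:", found)
```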

Thanks,
Carson


From: "Cexzur Jimenez Jr." <cexzurjimenezjr at gmail.com<mailto:cexzurjimenezjr at gmail.com>>
Date: Thursday, February 6, 2014 at 10:27 PM
To: <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: [maker-devel] Testing MAKER After Installation

Hello,

I have finished installing MAKER; the installer reported "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED,
MAKER: INSTALLED", so everything seems fine. I'm using MAKER 2.10 and I have followed the installation instructions in both its "README" and "INSTALL" files and in the 2012 GMOD MAKER Tutorial. After editing the three configuration files and running "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there, but the error still appears. Below is the error I got:


Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased


Maker is now finished!!!


Can you tell me what the error is and which part of the installation I got wrong? Your help would be very much appreciated. Thank you.

Attached are the configuration files I used for MAKER.
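
The log above shows RepeatMasker failing at the point where it invokes /usr/local/blast/bin/makeblastdb. One possible first check, sketched here in Python, is whether that file exists and is executable. This is only an assumption about where to look, not a confirmed diagnosis, and the helper name check_executable is made up for illustration.

```python
import os
import shutil


def check_executable(path):
    """Return a short diagnosis string for a program path."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Path copied from the RepeatMasker error message in the log.
    makeblastdb = "/usr/local/blast/bin/makeblastdb"
    print(makeblastdb, "->", check_executable(makeblastdb))
    # Also report whichever makeblastdb comes first on PATH, if any.
    print("on PATH:", shutil.which("makeblastdb"))
```

If the path turns out to be missing, the BLAST location configured for RepeatMasker would be the next thing to look at.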


Sincerely,

CJ

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/333ceab2/attachment-0002.html>

From bhall7 at hawaii.edu  Fri Feb  7 17:31:36 2014
From: bhall7 at hawaii.edu (Brian Hall)
Date: Fri, 07 Feb 2014 14:31:36 -1000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
References: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
Message-ID: <52F57AE8.5090002@hawaii.edu>

Hi Ian,

My colleagues are also working on preparing a genome for submission to 
the NCBI. The software we are developing for this task is still a work 
in progress, but you are welcome to give it a try:

https://github.com/tedsta/GAG

It's a console-based application and it requires Python 2.6. Its 
strength is in filtering and modifying large segments of the genome at 
once -- where Apollo is good for removing a few erroneous exons, we are 
dealing with lists of dozens or more. This program seeks to make such 
changes as painless as possible.

My advice is to try the simplest gff3-to-tbl script you can find and 
then run tbl2asn. If it works out okay, great! If you get a massive 
error report, get in touch and we'll help you out if we can :)
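
For readers who have not seen the feature table format that tbl2asn reads, here is a minimal Python sketch of the idea. It handles only GFF3 "gene" rows and writes the ID attribute as locus_tag, which is a placeholder assumption (a real submission also needs mRNA/CDS features and a registered locus_tag prefix). The function name gff3_genes_to_tbl is hypothetical and is not part of GAG or MAKER.

```python
def gff3_genes_to_tbl(gff3_lines):
    """Convert GFF3 'gene' rows into 5-column feature table text."""
    by_seq = {}
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section: no more feature rows
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Minus-strand intervals are written with the larger coordinate first.
        if strand == "-":
            start, end = end, start
        by_seq.setdefault(seqid, []).append((start, end, attrs.get("ID", "")))

    out = []
    for seqid, genes in by_seq.items():
        out.append(">Feature " + seqid)
        for start, end, gene_id in genes:
            out.append("%d\t%d\tgene" % (start, end))
            # Qualifier lines start with three tabs.
            out.append("\t\t\tlocus_tag\t" + gene_id)
    return "\n".join(out) + "\n"
```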

--Brian

On 02/07/2014 05:16 AM, maker-devel-request at yandell-lab.org wrote:
> Date: Fri, 7 Feb 2014 08:29:27 -0500
> From: UMD Bioinformatics <bioinformatics.umd at gmail.com>
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] NCBI feature table
> Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8 at gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> Hello Maker Developers,
>
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in creating this table from my annotation files. If anyone has any insight on this issue, it would be greatly appreciated.
>
> Cheers
> Ian
>


From tmsmith23 at wisc.edu  Fri Feb  7 10:48:13 2014
From: tmsmith23 at wisc.edu (Tracy Smith)
Date: Fri, 7 Feb 2014 11:48:13 -0600
Subject: [maker-devel] Maker installation
Message-ID: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>

Hi,

I am trying to install Maker and am running into the same problem noted on
this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the
source -->
http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip
"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location
where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

-- 
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carsonhh at gmail.com  Mon Feb 10 08:34:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 08:34:58 -0700
Subject: [maker-devel] MAKER presentation at PAG
In-Reply-To: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
References: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
Message-ID: <CF1E3E65.9B13%carsonhh@gmail.com>

* maker_map_ids - Builds shorter IDs/Names for MAKER genes and transcripts
following the NCBI suggested naming format.
* map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER
fasta files.
* map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3
files; old IDs/Names are mapped to the Alias attribute.
* maker_functional_fasta - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER-produced transcript and protein fasta
files.
* maker_functional_gff - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER-produced GFF3 files in the Note
attribute.
* ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs
and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file.
This is metadata that shows up when you click on an annotation in
JBrowse/GBrowse.
* iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3
features representing domains, which make an interesting tier for GBrowse.
These are visible feature tracks that can be seen in JBrowse/GBrowse.
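
To illustrate the kind of renaming described for map_gff_ids, here is a small Python sketch. It is not the MAKER script itself: the function remap_attributes and the example IDs are made up, and it only shows the idea of swapping in short IDs from an old-to-new map while keeping the old ID in the Alias attribute.

```python
def remap_attributes(attr_field, id_map):
    """Rename ID/Name/Parent values in a GFF3 attribute column.

    id_map maps old identifiers to new ones. When the ID changes,
    the old ID is appended as an Alias attribute.
    """
    out = []
    alias = None
    for pair in attr_field.split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        if key in ("ID", "Name", "Parent"):
            # Parent may hold a comma-separated list of identifiers.
            new = ",".join(id_map.get(v, v) for v in value.split(","))
            if key == "ID" and new != value:
                alias = value
            value = new
        out.append(key + "=" + value)
    if alias is not None:
        out.append("Alias=" + alias)
    return ";".join(out)
```

Identifiers that are not in the map are left untouched, so the function can be applied to every row of a file.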
Thanks,
Carson

From:  Kevin Dorn <dorn at umn.edu>
Date:  Sunday, February 9, 2014 at 9:23 PM
To:  <carson.holt at utah.edu>
Subject:  MAKER presentation at PAG

Hi Carson, 

I saw your MAKER presentation at PAG this year and have a quick question.
I've used MAKER to annotate the plant genome we're working on, and am mostly
done. I had to step out for a second during your talk, and when I came back,
you were talking about how you can transfer meaningful annotations (getting
rid of the 'ugly MAKER names' for genes). Is there an accessory script to do
this? 

Thanks, 
Kevin Dorn 



From amitha at ccmb.res.in  Mon Feb 10 00:04:37 2014
From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR)
Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST)
Subject: [maker-devel] Falied to create new account
In-Reply-To: <bea52988-c660-488d-aae4-196364348cea@node1>
Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hi,

I am interested in using the online version of MAKER, for which I tried to create a profile using the email ID 'amitha at ccmb.res.in', but unfortunately I could not log in successfully.
I am also pasting the link where the error appears: http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.

The error mentioned is:
Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
 at /var/www/cgi-bin/mwas/maker.cgi line 21.

Kindly help me through the registration asap.

Thanks
Amitha.


From listona at science.oregonstate.edu  Sat Feb  8 19:08:42 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Sat, 08 Feb 2014 18:08:42 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>

I am following the tutorial for training SNAP, and it works fine.  
However, the tutorial instructions have MAKER repeat the repeat  
masking. To avoid this, I concatenated my gff files from the first  
round of annotation and used maker_gff=round1.gff and rm_pass=1  but  
at the end of the process, the repeat annotations were not there. Any  
suggestions?  Thanks, Aaron


From caigh02 at gmail.com  Sun Feb  9 20:26:57 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Sun, 9 Feb 2014 21:26:57 -0600
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
Message-ID: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>

I sent the following message to Carson but forgot to send to the
maker-devel list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or InterProScan. I dug up some old messages in
the maker-devel Google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the UniProt/Swiss-Prot database, or can I use something
else? For example, may I add a few high-quality genome annotations of
related species to the Swiss-Prot database? Or may I use UniRef90 or the nr
database instead of Swiss-Prot?

3. Do you have a script to integrate blast2go results to the maker
gff/fasta?

Thanks.

Guohong

Rutgers University

From carsonhh at gmail.com  Mon Feb 10 10:25:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:25:06 -0700
Subject: [maker-devel] Falied to create new account
In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
Message-ID: <CF1E5936.9B37%carsonhh@gmail.com>

The SMTP server that sends out e-mails is down, so when you said you
forgot your login, it couldn't e-mail you.  I switched to a different
server for the time being.

--Carson


On 2/10/14, 12:04 AM, "AMITHA SAMPATH KUMAR" <amitha at ccmb.res.in> wrote:

>Hi,
>
>I am interested in using the online version of MAKER, for which I tried to
>create a profile using the email ID 'amitha at ccmb.res.in', but
>unfortunately I could not log in successfully.
>I am also pasting the link where the error appears:
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.
>
>The error mentioned is:
>Error executing run mode 'forgot_login': Can't call method "MailMsg"
>without a package or object reference at
>/var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
> at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>Kindly help me through the registration asap.
>
>Thanks
>Amitha.


From carsonhh at gmail.com  Mon Feb 10 10:26:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:26:06 -0700
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
	<CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
Message-ID: <CF1E59B4.9B3B%carsonhh@gmail.com>

1. Yes. It should take NCBI BLAST+ results.
2. It has to be UniProt/Swiss-Prot, or you can modify the comments of another
database to look like UniProt/Swiss-Prot.
3. ipr_update_gff can also take BLAST2GO results as an undocumented feature
(or at least it could the last time I tested it, which was quite a long time
ago).
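
As a rough illustration of point 2, the Python sketch below rewrites plain FASTA header lines into a UniProt-like shape (">db|accession|entry_name description OS=organism"). Several things here are assumptions: that "comments" means the FASTA header lines, that reusing the sequence ID as both accession and entry name is acceptable, and that the OX/GN/PE/SV fields of real UniProt headers can be left out. The thread does not say which fields the MAKER scripts actually read, and the function name to_uniprot_style is hypothetical.

```python
def to_uniprot_style(fasta_lines, organism, db="sp"):
    """Yield FASTA lines with headers rewritten in a UniProt-like format."""
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].strip().split(None, 1)
            if not parts:
                yield line  # empty header: leave untouched
                continue
            acc = parts[0]
            # Placeholder description when the source header has none.
            desc = parts[1] if len(parts) > 1 else "Uncharacterized protein"
            yield ">%s|%s|%s %s OS=%s\n" % (db, acc, acc, desc, organism)
        else:
            yield line
```

Usage would look like `list(to_uniprot_style(open("proteins.fasta"), "Genus species"))`, where the file name and organism are placeholders.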

Thanks,
Carson

From:  Guohong Cai <caigh02 at gmail.com>
Date:  Sunday, February 9, 2014 at 8:26 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Fwd: Functional annotation of MAKER gene models

I sent the following message to Carson but forgot to send to the maker-devel
list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or InterProScan. I dug up some old messages in
the maker-devel Google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the UniProt/Swiss-Prot database, or can I use something
else? For example, may I add a few high-quality genome annotations of related
species to the Swiss-Prot database? Or may I use UniRef90 or the nr database
instead of Swiss-Prot?

3. Do you have a script to integrate blast2go results to the maker
gff/fasta?  

Thanks.

Guohong

Rutgers University 


From barry.utah at gmail.com  Mon Feb 10 12:21:31 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:21:31 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>

Hi Aaron,

If you re-run MAKER and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking; it should reuse the work it has already done.  Is this not what you are seeing?

Barry


On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:

> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the repeat annotations were not there. Any suggestions?  Thanks, Aaron
> 
> 

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From listona at science.oregonstate.edu  Mon Feb 10 12:46:06 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Mon, 10 Feb 2014 11:46:06 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>

Hi Barry:   I changed the name of the genome file, so that I could see the
results at each step. However, it sounds like if I had kept the same name,
MAKER would use the info from the previous run.  Is that correct?  Aaron

 
From: Barry Moore [mailto:barry.utah at gmail.com] 
Sent: Monday, February 10, 2014 11:22 AM
To: Aaron Liston
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-using repeat masking in SNAP training

 
Hi Aaron,

 
If you re-run MAKER and don't change the details about the repeat library
(i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of
the repeat masking; it should reuse the work it has already done.  Is this
not what you are seeing?

 
Barry

 
On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:


I am following the tutorial for training SNAP, and it works fine. However,
the tutorial instructions have MAKER repeat the repeat masking. To avoid
this, I concatenated my gff files from the first round of annotation and
used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the
repeat annotations were not there. Any suggestions?  Thanks, Aaron



 
Barry Moore

Research Scientist

Dept. of Human Genetics

University of Utah

Salt Lake City, UT 84112

--------------------------------------------

(801) 585-3543

 

From barry.utah at gmail.com  Mon Feb 10 12:56:26 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:56:26 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
	<02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com>

Yep.  If you want to keep the results from each step just copy the GFF3 file from your first run to a new name and then redo your run.

B

On Feb 10, 2014, at 12:46 PM, Aaron Liston wrote:

> Hi Barry:   I changed the name of the genome file, so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would use the info from the previous run.  Is that correct?  Aaron
>  
> From: Barry Moore [mailto:barry.utah at gmail.com] 
> Sent: Monday, February 10, 2014 11:22 AM
> To: Aaron Liston
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Re-using repeat masking in SNAP training
>  
> Hi Aaron,
>  
> If you re-run MAKER and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking; it should reuse the work it has already done.  Is this not what you are seeing?
>  
> Barry
>  
>  
> On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:
> 
> 
> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the repeat annotations were not there. Any suggestions?  Thanks, Aaron
> 
> 
>  
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>  
>  
>  
>  

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From dence at genetics.utah.edu  Tue Feb 11 11:37:36 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 18:37:36 +0000
Subject: [maker-devel] Falied to create new account
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
	<CF1E5936.9B37%carsonhh@gmail.com>
	<E8EDFB90D92694478065C37017B3A3A6A8908910@SKREGIXES2.AGR.GC.CA>
	<CF1FA919.9BBB%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D445A3@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D445B3@mxb2.hg.genetics.utah.edu>

Hossein, 

Ok. Since this error came up on a local install, I'm going to need some more information to understand what went wrong. Is it the same contig that always causes this error? If it is, then is this the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, is it always the same error?

If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong.

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 11:20 AM
To: Daniel Ence
Subject: Re: [maker-devel] Falied to create new account

Hi Daniel

I am running it through the local server at my work.


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>Did you encounter this error while you were running MAKER on your local
>machine or through the MAKER web annotation service?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Carson Holt [carsonhh at gmail.com]
>Sent: Tuesday, February 11, 2014 10:18 AM
>To: Daniel Ence
>Cc: Mark Yandell
>Subject: FW: [maker-devel] Falied to create new account
>
>Hey Daniel could you download his dataset, and see if you can replicate
>the error.  Also check if this was an MWAS job or a local maker run (his
>dataset will already be there for MWAS, you just need the job ID).
>
>Thanks,
>Carson
>
>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Hi Carson
>>
>>
>>I encountered this error while running maker
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00006
>>
>>HB


From darasappan at gmail.com  Tue Feb 11 11:48:23 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 11 Feb 2014 12:48:23 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19187C.994D%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>

With your suggested changes (using a protein file not derived from the  
RNA-seq data and fixing the gff file for training SNAP), I was able to  
increase the number of genes from 6000+ to 18116.
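
For anyone repeating these checks, here is a minimal Python sketch that tallies feature types (column 3) in a MAKER GFF3 file and reports whether the ##FASTA section is still present, since the missing sequence turned out to matter for maker2zff earlier in this thread. The function name summarize_gff3 is made up for illustration and is not part of MAKER.

```python
from collections import Counter


def summarize_gff3(lines):
    """Return (feature_counts, has_fasta) for an iterable of GFF3 lines."""
    counts = Counter()
    has_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            has_fasta = True
            break  # everything after this directive is sequence
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts, has_fasta
```

Calling it as `counts, has_fasta = summarize_gff3(open("merged.gff"))` (the file name is a placeholder) gives numbers comparable to the gene/exon/CDS tallies quoted below.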

I'm now trying to evaluate the quality of the annotation.  I have a  
question about the usage for mpi_evaluator.

In the maker tutorial,  the usage is given as:

  mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>
What files are being referred to in the input parameters: eval_opts,  
eval_bopts and eval_exe?

Thanks
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join
> the GFF3s without stripping out the fasta sequence at the end when
> training SNAP.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 10:29 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Sorry I was just trying to make it small enough to be approved by  
> the mailing list.
>
> Here is the whole file:
>
>
>  cat.formatted.gff.tgz
>
>
>
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com>  
> wrote:
>> Could you give me the file without using 'head' to trim it? It's
>> cutting it off before it reaches the part I'm interested in.
>>
>> --Carson
>>
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Thursday, February 6, 2014 at 10:01 AM
>>
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org 
>> " <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Oh yes I did- I took just the non sequence entries in the gff file  
>> and used that as my input.  I will rerun snap with the gff file  
>> containing the sequences as well.
>>
>> I'm attaching a snippet of the gff file that I used as input to  
>> maker2zff.
>>
>> Thanks for your help
>> Dhivya
>>
>>
>>
>>
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>>
>>> Your genome.dna file has no sequence?  Did you by any chance strip  
>>> the fasta sequence from the GFF3 you are using as input to  
>>> maker2zff?  There should be fasta sequence at the end of that  
>>> file.  Also, can I see the GFF3 file you are using as input to
>>> maker2zff?
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: Thursday, February 6, 2014 at 7:47 AM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org 
>>> " <maker-devel at yandell-lab.org>
>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> It does appear that my genome.ann file from the maker2zff script has
>>> data in it. However, the SNAP steps after that have created empty
>>> files.  The following are all empty:
>>>
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>>
>>> When I tried to get gene stats or validate genome.ann, I get  
>>> errors like this for all of them:
>>>
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
>>> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
>>> exon-2:out_of_bounds exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
>>> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
>>> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
>>> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
>>> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
>>> exon-20:out_of_bounds exon-21:out_of_bounds
>>>
>>> I'm not sure why the annotations I'm seeing in genome.ann are all
>>> showing up as errors. I realize this may be an issue with SNAP,
>>> but are you familiar with anything like this? My genome.ann file
>>> is attached for reference.
>>>
>>> Thanks
>>> Dhivya
>>>
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>>
>>>> Do you have any features of type snap in your results from step
>>>> 3?  We've had a couple of recent posts where, after training, SNAP
>>>> was giving no results, and as a result MAKER couldn't produce any
>>>> genes.  One cause of something like that may be your step 2.
>>>> Make sure the ZFF you used to train with wasn't empty.  The
>>>> maker2zff script uses filters to put only the best genes in the
>>>> ZFF file, and if all your genes fail the filtering then you are
>>>> training with an empty ZFF.
>>>>
>>>> Also, you should use proteins from a related species as your
>>>> protein file.  I see that your protein matches are varying wildly
>>>> from run to run, and so is your contig count.  Was the subset of
>>>> contigs you have results for long enough to contain genes?
>>>>
>>>> --Carson
>>>>
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: Monday, February 3, 2014 at 9:31 AM
>>>> To: Daniel Ence <dence at genetics.utah.edu>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>>
>>>> Hi Daniel,
>>>>
>>>> I was able to check on some of those questions.
>>>>
>>>> 1. From trinity assembly: I started with 102000 contigs. I used  
>>>> trinotate to annotate proteins in this.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this (most important parts on top):
>>>>
>>>>    6653 gene
>>>>   46675 exon
>>>>  280534 protein_match
>>>>   59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>>
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>>>> which there are 29,000 transcripts).  I used the protein  
>>>> sequences from trinity assembly.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>>      67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>>
>>>> The number of genes annotated using the Trinity assembly is lower
>>>> than expected, so I went the Cufflinks route. I don't understand
>>>> why, when using the Cufflinks transcripts, even fewer genes are
>>>> being found.
>>>>
>>>> 3. Training SNAP:  I used the results of maker from 1 to train  
>>>> SNAP.  I then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ 
>>>> maker_mpi_withAlltrinity/snap/RHA.hmm
>>>> est2genome=0
>>>>
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>> 957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>>
>>>> As I mentioned in another email, cegma results indicated that the  
>>>> genome was more than 90% complete. Any suggestions would be  
>>>> helpful.
>>>>
>>>> Thank you
>>>> Dhivya
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>>
>>>>> Hi Dhivya,
>>>>>
>>>>> I think there a few numbers that could be helpful to understand  
>>>>> what's happening here.
>>>>>
>>>>> How many transcripts did Trinity assembly the RNA-seq data into?  
>>>>> Also, you had 29,000 transcripts from cufflinks, but fewer from  
>>>>> MAKER when you gave it the cufflinks data. How many transcripts  
>>>>> did MAKER identify with the cufflinks data? Did you still get  
>>>>> more than the 10,000 transcripts that you found with just the  
>>>>> Trinity data?
>>>>>
>>>>> A key part of MAKER's approach to genome annotation that might  
>>>>> be affecting it's performance is that it only annotates a gene  
>>>>> where there is both evidence (like your RNA-seq data) and an ab- 
>>>>> initio prediction. If a prediction is unsupported by the  
>>>>> evidence, then MAKER won't annotate a gene and if evidence  
>>>>> aligns where there's no prediction, MAKER won't annotate a gene  
>>>>> either. What ab-initio predictors are you using and have they  
>>>>> been trained specific genome?
>>>>>
>>>>> You can force MAKER to automatically promote evidence alignments  
>>>>> to a gene model by setting the est2genome option to 1, but that  
>>>>> will usually give you many false positives.
>>>>>
>>>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>>>> data and with est2genome set to 1, and let us know how that  
>>>>> affects the MAKER results.
>>>>>
>>>>> Thanks,
>>>>> Daniel
>>>>>
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on  
>>>>> behalf of dhivya arasappan [darasappan at gmail.com]
>>>>> Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to annotate a 200 mb plant genome for which I have a  
>>>>> very
>>>>> good assembly.
>>>>>
>>>>> I tried to denovo assemble RNA-seq data using trinity and ran  
>>>>> maker
>>>>> using my genome assembly and the trinity results.  I did not get  
>>>>> as
>>>>> many transcripts as expected, around 10,000 transcripts.
>>>>>
>>>>> So, I decided to try a different approach.  I did a genome  
>>>>> assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker  
>>>>> using my
>>>>> genome assembly and the cufflinks result.  I get much less  
>>>>> number of
>>>>> transcripts as a result.
>>>>>
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>>
>>>>> Any suggestions would be appreciated.
>>>>>
>>>>> Thanks
>>>>> Dhivya
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>> _______________________________________________ maker-devel  
>>>> mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>
>


From carsonhh at gmail.com  Tue Feb 11 11:55:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Feb 2014 11:55:38 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
	<0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
Message-ID: <CF1FBEEF.9BF5%carsonhh@gmail.com>

I wouldn't use mpi_evaluator.  It is buggy and has virtually no
documentation.  The AED values are the best way to identify which genes are
higher and lower quality.  You can also run interproscan to identify protein
domain content as an independent evaluation.  See this paper:
http://www.biomedcentral.com/1471-2105/12/491

Figure 4 has a nice example of how AED, domain content, and gene orthology
correlate to show the quality of different subsets of genes in seven ant
genomes.

If you choose to try mpi_evaluator it uses the -CTL option to generate empty
files that you then fill in.
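
A minimal sketch of an AED-based summary, in Python. It assumes the MAKER
GFF3 output stores the score as an _AED= attribute on mRNA features; the
function names and the 0.5 cutoff are illustrative choices, not part of MAKER.

```python
import re

# Matches "_AED=<number>" but not "_eAED=<number>".
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def collect_aed(gff3_lines):
    """Return the _AED value of every mRNA feature in an iterable of GFF3 lines."""
    values = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section starts here; no more features follow
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def fraction_at_or_below(values, cutoff):
    """Fraction of AED values that are <= cutoff (lower AED means better support)."""
    if not values:
        return 0.0
    return sum(v <= cutoff for v in values) / len(values)
```

For example, fraction_at_or_below(collect_aed(open("merged.gff")), 0.5) would
report the share of transcripts with AED of 0.5 or better, where merged.gff
is a placeholder file name.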

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, February 11, 2014 at 11:48 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

With your suggested changes (using a protein file not derived from the
RNA-seq data and fixing the gff file for training SNAP), I was able to
increase the number of genes from 6000+ to 18116.
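
A small Python sketch of one way to confirm that a GFF3 file still carries
its sequence before it is used for SNAP training. It relies only on the
standard ##FASTA directive from the GFF3 format; the function name is
hypothetical and this is not a MAKER or SNAP tool.

```python
def gff3_has_fasta(lines):
    """Return True if a ##FASTA directive is followed by at least one
    sequence record that actually contains residues."""
    in_fasta = False
    saw_header = False
    for line in lines:
        line = line.strip()
        if not in_fasta:
            if line == "##FASTA":
                in_fasta = True
            continue
        if line.startswith(">"):
            saw_header = True
        elif line and saw_header:
            return True  # first non-empty sequence line after a header
    return False
```

If this returns False for the file given to maker2zff, the resulting
genome.dna would be expected to come out empty, which matches the
out_of_bounds errors reported earlier in the thread.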

I'm now trying to evaluate the quality of the annotation.  I have a question
about the usage for mpi_evaluator.

In the maker tutorial,  the usage is given as:

 mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>
What files are being referred to in the input parameters: eval_opts,
eval_bopts and eval_exe?

Thanks 
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join the GFF3s
> without stripping out the fasta sequence at the end when training SNAP.
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 10:29 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Sorry I was just trying to make it small enough to be approved by the mailing
> list.
> 
> Here is the whole file:
> 
> 
>  cat.formatted.gff.tgz
> <https://docs.google.com/file/d/0B3fACsJDXQi6VEE1VG5tWEh5M1U/edit?usp=drive_we
> b> 
> 
> 
> 
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Could you give me the file without using 'head' to trim it? It's cutting it
>> off before it reaches the part I'm interested in.
>> 
>> --Carson
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, February 6, 2014 at 10:01 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>> <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Oh yes I did- I took just the non sequence entries in the gff file and used
>> that as my input.  I will rerun snap with the gff file containing the
>> sequences as well.
>> 
>> I'm attaching a snippet of the gff file that I used as input to maker2zff.
>> 
>> Thanks for your help
>> Dhivya
>> 
>> 
>> 
>> 
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>> 
>>> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
>>> sequence from the GFF3 you are using as input to maker2zff?  There should be
>>> fasta sequence at the end of that file.  Also can I see the GFF3 file you
>>> are using as input to maker2zff.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Thursday, February 6, 2014 at 7:47 AM
>>> To:  Carson Holt <carsonhh at gmail.com>
>>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> It does appear that my genome.ann file from the maker2zff script has data in it.
>>> However, the SNAP steps after that have created empty files.  The following
>>> are all empty:
>>> 
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>> 
>>> When I tried to get gene stats or validate genome.ann, I get errors like
>>> this for all of them:
>>> 
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
>>> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
>>> exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
>>> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
>>> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
>>> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
>>> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
>>> exon-21:out_of_bounds
>>> 
>>> I'm not sure why the annotations I'm seeing in genome.ann are all showing up
>>> as errors. I realize this may be an issue with snap, but are you familiar
>>> with anything like this? My genome.ann file is attached for reference.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>> 
>>>> Do you have any features of type snap in your results from step 3?  We've
>>>> had a couple of recent posts where, after training, snap was giving no
>>>> results, and as a result maker couldn't give any genes.  One cause of
>>>> something like that may be your step 2.  Make sure the ZFF you used to
>>>> train with wasn't empty.  The maker2zff script uses filters to only put the
>>>> best genes in the ZFF file, and if all your genes fail the filtering then
>>>> you are training with an empty ZFF.
>>>> 
>>>> Also, you should use proteins from a related species as your protein file.
>>>> I see that your protein matches are varying wildly from run to run.  So is
>>>> your contig count.  Were the subset of contigs you have results for long
>>>> enough to contain genes?
>>>> 
>>>> --Carson
>>>> 
>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>> Date:  Monday, February 3, 2014 at 9:31 AM
>>>> To:  Daniel Ence <dence at genetics.utah.edu>
>>>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>>> 
>>>> Hi Daniel,
>>>> 
>>>> I was able to check on some of those questions.
>>>> 
>>>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate
>>>> to annotate proteins in this.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this (most important parts on top):
>>>> 
>>>>     6653 gene
>>>>    46675 exon
>>>>  280534 protein_match
>>>> 59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>> 
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of which
>>>> there are 29,000 transcripts).  I used the protein sequences from trinity
>>>> assembly.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>> 67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>> 
>>>> The genes annotated using the trinity assembly is lower than expected, so I
>>>> went the cufflinks route. I dont understand why when using the cufflinks
>>>> transcripts, even less genes are being found.
>>>> 
>>>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I
>>>> then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/s
>>>> nap/RHA.hmm
>>>> est2genome=0
>>>> 
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>> 957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>> 
>>>> As I mentioned in another email, cegma results indicated that the genome
>>>> was more than 90% complete. Any suggestions would be helpful.
>>>> 
>>>> Thank you
>>>> Dhivya
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>> 
>>>>> Hi Dhivya, 
>>>>> 
>>>>> I think there a few numbers that could be helpful to understand what's
>>>>> happening here.
>>>>> 
>>>>> How many transcripts did Trinity assembly the RNA-seq data into? Also, you
>>>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave
>>>>> it the cufflinks data. How many transcripts did MAKER identify with the
>>>>> cufflinks data? Did you still get more than the 10,000 transcripts that
>>>>> you found with just the Trinity data?
>>>>> 
>>>>> A key part of MAKER's approach to genome annotation that might be
>>>>> affecting it's performance is that it only annotates a gene where there is
>>>>> both evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>>>> prediction is unsupported by the evidence, then MAKER won't annotate a
>>>>> gene and if evidence aligns where there's no prediction, MAKER won't
>>>>> annotate a gene either. What ab-initio predictors are you using and have
>>>>> they been trained specific genome?
>>>>> 
>>>>> You can force MAKER to automatically promote evidence alignments to a gene
>>>>> model by setting the est2genome option to 1, but that will usually give
>>>>> you many false positives.
>>>>> 
>>>>> Try rerunning it with either the Trinity data or the Cufflinks data and
>>>>> with est2genome set to 1, and let us know how that affects the MAKER
>>>>> results. 
>>>>> 
>>>>> Thanks,
>>>>> Daniel
>>>>> 
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>>  From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>>>> dhivya arasappan [darasappan at gmail.com]
>>>>>  Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am trying to annotate a 200 mb plant genome for which I have a very
>>>>> good assembly.
>>>>> 
>>>>> I tried to denovo assemble RNA-seq data using trinity and ran maker
>>>>> using my genome assembly and the trinity results.  I did not get as
>>>>>  many transcripts as expected, around 10,000 transcripts.
>>>>> 
>>>>> So, I decided to try a different approach.  I did a genome assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>>>>  genome assembly and the cufflinks result.  I get much less number of
>>>>> transcripts as a result.
>>>>> 
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>> 
>>>>> Any suggestions would be appreciated.
>>>>> 
>>>>> Thanks
>>>>> Dhivya
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>> 
>>>> _______________________________________________ maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>> 
>> 
> 



From carson.holt at genetics.utah.edu  Tue Feb 11 13:52:05 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 20:52:05 +0000
Subject: [maker-devel] New MAKER release
Message-ID: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>

Hello all,

MAKER has been updated to 2.31.

There are no major new features over 2.30.  It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support.  I also was able to remove the seg faults that sometimes happened on exit under OpenMPI.

Thanks,
Carson


From carson.holt at genetics.utah.edu  Tue Feb 11 14:19:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 21:19:17 +0000
Subject: [maker-devel] New MAKER release
In-Reply-To: <CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
References: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>
	<CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
Message-ID: <CF1FDDCC.9C1B%carson.holt@genetics.utah.edu>

URLs can be manually edited in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL.

Thanks,
Carson

From: Joanna Kelley <jokelley at stanford.edu>
Date: Tuesday, February 11, 2014 at 2:00 PM
To: Carson Holt <carson.holt at genetics.utah.edu>
Subject: Re: [maker-devel] New MAKER release

Hi Carson,

The RepBase step is failing; it seems to be looking for the incorrect version. Where do I change the code to solve that?

Thanks,
Joanna

 Downloading RepBase...
--2014-02-11 12:59:38--  http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz
Resolving www.girinst.org... 66.201.49.247
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-02-11 12:59:38 ERROR 404: Not Found.


ERROR: Failed installing RepBase, now cleaning installation path...
You may need to install RepBase manually.


On Tue, Feb 11, 2014 at 12:52 PM, Carson Holt <carson.holt at genetics.utah.edu> wrote:
Hello all,

MAKER has been updated to 2.31.

There are no major new features over 2.30.  It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support.  I also was able to remove the seg faults that sometimes happened on exit under OpenMPI.

Thanks,
Carson


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


--
Please update your address book, my new email address is joanna.l.kelley at wsu.edu


From dence at genetics.utah.edu  Tue Feb 11 15:59:57 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 22:59:57 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>

Hi Hossein,

I think what would help most right now is if you ran MAKER on only one of the contigs that are failing and sent me the entire error output along with the maker control files that you are using. It looks like the error is coming from the gff3 files that you are using as input.

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 3:51 PM
To: Daniel Ence
Subject: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I re-started maker and it is still running. But in the error output file that
has been generated so far, it seems that smaller contigs are affected. There
are contigs of 2-4 kb with this error, but I also noticed a contig of 30kb
length having this error.

I was wondering if I need to change the settings in the maker_opt file:

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)


If I understand correctly, max_dna_len divides contigs of over 100kb into
smaller chunks. However, if the min_contig option only concerns contigs of
10kb or less, it is not clear to me why I get the error
message for 30kb long contigs. Should I change this to 0?
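
A rough Python sketch of how the two options quoted above read, based only on
their comments in the control file. The function plan_chunks is hypothetical
and MAKER's real chunking logic may differ (for instance in how chunk
boundaries are chosen); it only illustrates that min_contig is a skip
threshold and max_dna_len is a chunk size.

```python
def plan_chunks(contig_len, max_dna_len=100000, min_contig=1):
    """Return 1-based (start, end) chunks for a contig, or [] if the contig
    is shorter than min_contig and would be skipped."""
    if contig_len < min_contig:
        return []  # contig is below the skip threshold
    chunks = []
    start = 1
    while start <= contig_len:
        end = min(start + max_dna_len - 1, contig_len)
        chunks.append((start, end))
        start = end + 1
    return chunks
```

Under this reading, min_contig=1 skips nothing, so a 30kb contig is processed
as a single chunk, and lowering the value to 0 would not change which contigs
are attempted.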

Here is an example of the error message for one of the contigs


#--------- command -------------#
Widget::exonerate::est2genome:
/usr/local/exonerate-2.2.0-x86_64/bin/exonerate  -q
/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brass
icae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35
/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
-t
/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brass
icae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genom
e_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.
fasta
-Q dna -T dna --model est2genome
--minintron 20 --showcigar --percent 20 >
/raid01/projects/Plasmodiophora/brassica
e/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassi
cae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3S
c00001.235-1136.comp14545_c0_seq1.est_exonerate
#-------------------------------#
cleaning blastn...
cleaning tblastx...
cleaning blastx...
ERROR: Failed on
PbPT3Sc00001_S_0.8_1-mRNA-1
Check your input GFF3 file for errors!
(from GFFDB)

FATAL ERROR
ERROR: Failed while processing the chunk
divide!!

ERROR: Chunk failed at level 17
!!
FAILED CONTIG:PbPT3Sc00001


--Next Contig--


Regards


HB


On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hossein,
>
>Ok. So since this error came up on a local install, I'm going to need
>some more information to understand what went wrong. Is it the same
>contig that always causes this error? If it is, then is this the only
>error or warning that MAKER encounters while running on this contig? Or,
>if multiple contigs fail, then is it always the same error?
>
>If you can narrow it down to the smallest possible dataset that
>consistently gives the same error, then we can begin to understand what's
>wrong.
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 11:20 AM
>To: Daniel Ence
>Subject: Re: [maker-devel] Falied to create new account
>
>Hi Daniel
>
>I am running it through the local server at my work
>
>
>
>
>
>
>M. Hossein Borhan, Ph.D.
>Research Scientist/ Chercheur Scientifique
>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>107 Science Place, Saskatoon, SK.,S7N 0X2
>Telephone/Téléphone: (306) 385-9441
>Facsimile/Télécopieur: (306) 385-9482
>Hossein.borhan at agr.gc.ca
>
>
>
>
>
>
>
>
>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>Did you encounter this error while you were running MAKER on your local
>>machine or through the MAKER web annotation service?
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Carson Holt [carsonhh at gmail.com]
>>Sent: Tuesday, February 11, 2014 10:18 AM
>>To: Daniel Ence
>>Cc: Mark Yandell
>>Subject: FW: [maker-devel] Falied to create new account
>>
>>Hey Daniel could you download his dataset, and see if you can replicate
>>the error.  Also check if this was an MWAS job or a local maker run (his
>>dataset will already be there for MWAS, you just need the job ID).
>>
>>Thanks,
>>Carson
>>
>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>>
>>>Hi Carson
>>>
>>>
>>>I encountered this error while running maker
>>>
>>>FATAL ERROR
>>>ERROR: Failed while processing the chunk divide!!
>>>
>>>ERROR: Chunk failed at level 17
>>>!!
>>>FAILED CONTIG:PbPT3Sc00006
>>>
>>>
>>>
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>
>>
>>
>


From marc.hoeppner at imbim.uu.se  Wed Feb 12 01:34:12 2014
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Wed, 12 Feb 2014 09:34:12 +0100
Subject: [maker-devel] Annotations from protein alignments
Message-ID: <52FB3204.60606@imbim.uu.se>

Dear list,

I have an annotation project with both protein data (it's a bird, so
I've been using both vertebrates in general and chicken specifically)
and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq
data. The chicken augustus model seems to do a decent job in seeding
gene loci, but it's not quite perfect. I want to use protein alignments
to create a high-confidence set of exons and subsequently a set of gene
loci (to train e.g. snap), but when I test with protein2genome=1 I
never get any annotations. This is also true for the test data set that
is delivered together with Maker (hsap_). Anything I should know about 
the use of proteins to generate annotations? I left all settings in the 
config file at their defaults (except protein2genome=1). I've tried this 
with both Maker 2.30 and 2.31.

All the best,

Marc

-- 
-----------
Marc P. Hoeppner, PhD
Group leader
BILS Genome annotation platform

Department of Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoepner at imbim.uu.se


From carsonhh at gmail.com  Wed Feb 12 08:42:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 12 Feb 2014 08:42:36 -0700
Subject: [maker-devel] Annotations from protein alignments
In-Reply-To: <52FB3204.60606@imbim.uu.se>
References: <52FB3204.60606@imbim.uu.se>
Message-ID: <CF20E42A.9C8C%carsonhh@gmail.com>

I updated the 2.31 tarball.  Go ahead and download it again.
protein2genome had been turned off for eukaryotes and was only working for
prokaryotic genomes.

--Carson


On 2/12/14, 1:34 AM, "Marc P. Hoeppner" <marc.hoeppner at imbim.uu.se> wrote:

>Dear list,
>
>I have an annotation project with both protein data (it's a bird, so
>I've been using both vertebrates in general and chicken in specific),
>and huge amounts of somewhat dodgy (as in lot's of pre-mRNA) RNA-seq
>data. The chicken augustus model seems to do a decent job in seeding
>gene loci, but it's not quite perfect. I want to use protein alignments
>to create a high-confidence set of exons and subsequently a set of gene
>loci to train e.g. snap), but when testing to set protein2genome=1 I
>never get any annotations. This is also true for the test data set that
>is delivered together with Maker (hsap_). Anything I should know about
>the use of proteins to generate annotations? I left all settings in the
>config file at their defaults (except protein2genome=1). I've tried this
>with both Maker 2.30 and 2.31.
>
>All the best,
>
>Marc
>
>-- 
>-----------
>Marc P. Hoeppner, PhD
>Group leader
>BILS Genome annotation platform
>
>Department of Medical Biochemistry and Microbiology
>Uppsala University, Sweden
>marc.hoepner at imbim.uu.se
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Wed Feb 12 11:59:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 18:59:11 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

So, after looking at the gff3 and your control files, I had an idea. There's the part of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines.

Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file into the different feature sources and passing it through the "est_gff" etc options and not with the MAKER passthrough? That will tell us if the problem is with the gff3 file or with how MAKER is processing it. 
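
A minimal Python sketch of dividing a GFF3 file by feature source, assuming
"source" means column 2 of each feature line. The function name is
hypothetical, and deciding which group belongs under which option is left to
the user since it depends on how the GFF3 was produced.

```python
from collections import defaultdict


def split_gff3_by_source(lines):
    """Group GFF3 feature lines by their source column (column 2).

    Comment lines, blank lines and everything from ##FASTA onward are ignored.
    """
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section; no features after this point
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            groups[cols[1]].append(line if line.endswith("\n") else line + "\n")
    return dict(groups)
```

Each group could then be written to its own file and listed under est_gff,
protein_gff, rm_gff, pred_gff or model_gff as appropriate.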

Another thing to check is that the contig names in the gff3 file match the contig names in the fasta file that you're annotating.
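
A short Python sketch of that name check. It assumes the contig name is the
first whitespace-delimited word of each fasta header and column 1 of each
gff3 feature line; the function names are hypothetical.

```python
def fasta_ids(lines):
    """Sequence IDs from fasta headers (first word after '>')."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def gff3_seqids(lines):
    """Sequence IDs used in column 1 of GFF3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; stop reading features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            ids.add(cols[0])
    return ids


def unmatched_seqids(gff_lines, fasta_lines):
    """GFF3 contig names that have no matching fasta record, sorted."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(fasta_lines))
```

An empty list means every contig named in the gff3 file exists in the fasta
file; any names returned are the ones worth inspecting first.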

Thanks,
Daniel


Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 8:49 AM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I have generated the files that you requested. I choose Sc00009 from my
genome which is 30 kb and was one of the scaffolds coming up with error.
In addition to Ctl files and error output file I also attached a part of
the gff file related to SC00009 that is indicated in the error message.


Thanks for helping with this


Regards


HB


On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>I think that what would be the most help right now is if you ran MAKER on
>only one of those contigs that are failing and send me the entire error
>output along with the maker control files that you are using. It looks
>like the error is coming from the gff3 files that you are using as input.
>
>Thanks,
>Daniel
>
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 3:51 PM
>To: Daniel Ence
>Subject: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>I restarted maker and it is still running. But in the error output file that
>has been generated so far, it seems that smaller contigs are affected. There
>are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb
>length with this error.
>
>I was wondering if I need to change the setting in the maker_opt file
>
>#-----MAKER Behavior Options
>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
>
>If I understand correctly, max_dna_len divides contigs of over 100 kb into
>smaller chunks. However, for the min_contig option, it is not clear to me
>why, if the default contig length is 10 kb or less, I get the error
>message for 30 kb contigs. Should I change this to 0?
>
>Here is an example of the error message for one of the contigs
>
>
>#--------- command -------------#
>Widget::exonerate::est2genome:
>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>#-------------------------------#
>cleaning blastn...
>cleaning tblastx...
>cleaning blastx...
>ERROR: Failed on
>PbPT3Sc00001_S_0.8_1-mRNA-1
>Check your input GFF3 file for errors!
>(from GFFDB)
>
>FATAL ERROR
>ERROR: Failed while processing the chunk divide!!
>
>ERROR: Chunk failed at level 17
>!!
>FAILED CONTIG:PbPT3Sc00001
>
>
>
>
>--Next Contig--
>
>
>
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>
>
>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hossein,
>>
>>Ok. So since this error came up on a local install, I'm going to need
>>some more information to understand what went wrong. Is it the same
>>contig that always causes this error? If it is, then is this the only
>>error or warning that MAKER encounters while running on this contig? Or,
>>if multiple contigs fail, then is it always the same error?
>>
>>If you can narrow it down to the smallest possible dataset that
>>consistently gives the same error, then we can begin to understand what's
>>wrong.
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Tuesday, February 11, 2014 11:20 AM
>>To: Daniel Ence
>>Subject: Re: [maker-devel] Falied to create new account
>>
>>Hi Daniel
>>
>>I am running it through the local server at my work.
>>
>>
>>
>>
>>
>>
>>M. Hossein Borhan, Ph.D.
>>Research Scientist/ Chercheur Scientifique
>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>Telephone/Téléphone: (306) 385-9441
>>Facsimile/Télécopieur: (306) 385-9482
>>Hossein.borhan at agr.gc.ca
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hi Hossein,
>>>
>>>Did you encounter this error while you were running MAKER on your local
>>>machine or through the MAKER web annotation service?
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Carson Holt [carsonhh at gmail.com]
>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>To: Daniel Ence
>>>Cc: Mark Yandell
>>>Subject: FW: [maker-devel] Falied to create new account
>>>
>>>Hey Daniel could you download his dataset, and see if you can replicate
>>>the error.  Also check if this was an MWAS job or a local maker run (his
>>>dataset will already be there for MWAS, you just need the job ID).
>>>
>>>Thanks,
>>>Carson
>>>
>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>wrote:
>>>
>>>>Hi Carson
>>>>
>>>>
>>>>I encountered this error while running maker
>>>>
>>>>FATAL ERROR
>>>>ERROR: Failed while processing the chunk divide!!
>>>>
>>>>ERROR: Chunk failed at level 17
>>>>!!
>>>>FAILED CONTIG:PbPT3Sc00006
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>HB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>
>>>
>>>
>>
>


From dence at genetics.utah.edu  Wed Feb 12 12:15:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 19:15:59 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

One more question. How did you make the gff3 that you're passing through here? 

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Wed Feb 12 13:42:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 20:42:03 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908DE5@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4498A@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44A3B@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

So, those problems with passing through MAKER-derived gff3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download now on our website. Try installing that version and rerunning with the same control files you started out using, and let me know if that fixes the problems.

Thanks,
Daniel

 
Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 12:55 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Hi Daniel

I am using maker 2.10.
I also checked the naming of the scaffold in the genome file and the gff
file for the failed example. The naming is the same.

Thanks

Hossein


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-12 1:30 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>And which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 12:25 PM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Hi Daniel
>
>The gff file was generated by the first run of maker.
>
>
>
>HB
>
>
>
>
>
>
>


From masa at bioinfo.hr  Thu Feb 13 03:17:11 2014
From: masa at bioinfo.hr (Masa Roller)
Date: Thu, 13 Feb 2014 11:17:11 +0100
Subject: [maker-devel] SNAP scores and AED scores
Message-ID: <52FC9BA7.6060505@bioinfo.hr>

Dear all,

I ran snap2 based gene prediction through maker.

In the resulting gff file, in the source "snap_masked" I can find the 
score in the score column of every snap prediction that did not get 
promoted to a maker gene. This would be the score of how well the 
prediction matches the HMM?

It seems to me that those snap models that are given gene status no 
longer appear as snap_masked source but only as source "maker". Maker 
then removes the score column, instead giving AED and eAED scores (which 
are more about how the model corresponds to the evidence). When viewing 
the maker transcripts and SNAP predictions in a browser, they do not 
match (mostly, maker predictions are longer).

I am interested in the score of the individual gene predictions that
underlie maker gene models. Where could I find that information?

Many thanks!


From carsonhh at gmail.com  Thu Feb 13 13:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:11:22 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <52FC9BA7.6060505@bioinfo.hr>
References: <52FC9BA7.6060505@bioinfo.hr>
Message-ID: <CF227374.9D6F%carsonhh@gmail.com>

No.  Snap genes do not disappear. All SNAP ab initio calls will always be
kept as reference features marked snap_masked (for the repeat-masked genome)
and snap (for the unmasked genome).  MAKER then runs SNAP another time, where
it feeds hints to SNAP based on EST and protein alignment evidence.  These
hint-based models can then compete against the ab initio SNAP models to be
promoted to genes if their AED scores are better.  Final models can also
get UTR added based on EST evidence.  That is why you can get models from
MAKER that do not match the original SNAP ab initio calls.

So in summary, all SNAP ab initio models will be in snap_masked.  The
MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio
models processed to add UTR.
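
To illustrate the competition described above, here is a minimal sketch of an AED-style comparison. It assumes the usual definition AED = 1 - (SN + SP) / 2 computed at the nucleotide level, with 1-based inclusive exon coordinates. The function names and toy coordinates are hypothetical, and MAKER's actual selection logic considers more than this.

```python
def covered_bases(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Annotation edit distance between a gene model and its evidence."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # no support at all
    shared = len(model & evidence)
    sensitivity = shared / len(evidence)  # fraction of evidence recovered
    specificity = shared / len(model)     # fraction of model supported
    return 1.0 - (sensitivity + specificity) / 2.0


def best_model(candidates, evidence_exons):
    """Pick the candidate name with the lowest AED (best evidence agreement)."""
    return min(candidates, key=lambda name: aed(candidates[name], evidence_exons))
```

With evidence covering two exons, an ab initio call that finds only one of them scores worse (higher AED) than a hint-based call that recovers both, so the hint-based model would be preferred in this toy comparison.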

Thanks,
Carson




From carsonhh at gmail.com  Thu Feb 13 13:23:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:23:07 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr>
 <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <CF227602.9D7E%carsonhh@gmail.com>

On a side note: because the MAKER models involve either modifying the ab
initio SNAP model or manipulating the underlying scoring scheme using
hints, the SNAP score on those is virtually meaningless.  However, Ian Korf
has developed a tool that can take any gene structure and reverse generate
a score (i.e. what would the score of this gene have been if SNAP had
called it that way in the first place).

I believe the tool is called fathom and is part of the SNAP package.  It
is not well documented, so you might have to contact Ian Korf directly for
that.  You can use the maker2zff tool to generate the input to fathom.

Thanks,
Carson




From barry.utah at gmail.com  Thu Feb 13 13:27:17 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Thu, 13 Feb 2014 13:27:17 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr> <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu>

Hi Masa,

Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore.  This is a directory structure created by MAKER to store contig-specific data.  There is a datastore directory (and a corresponding index file) in the MAKER output directory.  The index file provides the path to each contig's directory, and in that contig-specific directory there is a directory called theVoid.  This contains all of the output of each program that MAKER runs.
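
For anyone who wants to script that lookup, here is a minimal Python sketch. It rests on assumptions that are not stated in this thread: that each line of the index file is tab-separated (contig name, contig directory, status), that the status of a completed contig is FINISHED, and that directories are relative to the folder holding the index file. The "theVoid.<contig name>" directory naming follows the paths seen in MAKER error output. The function name void_dirs is made up for illustration.

```python
import csv
import os


def void_dirs(index_log):
    """Map contig name -> expected theVoid directory, read from an index log.

    Assumption: lines are tab-separated as name, directory, status, and the
    directory is relative to the folder that holds the index log.
    """
    base = os.path.dirname(os.path.abspath(index_log))
    found = {}
    with open(index_log, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            name, rel_dir, status = row[0], row[1], row[2]
            if status != "FINISHED":
                continue  # unfinished contigs may have incomplete output
            contig_dir = os.path.join(base, rel_dir)
            found[name] = os.path.join(contig_dir, "theVoid." + name)
    return found
```

Calling void_dirs() on the index file gives one directory per finished contig; the raw SNAP output should be among the files in there, but check the file names on your own run rather than trusting this sketch.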

B

On Feb 13, 2014, at 1:11 PM, Carson Holt wrote:

> No.  Snap genes do not disappear. All SNAP ab initio calls will always be
> kept as reference features marked snap_masked (for repeat masked genome)
> and snap (for unmasked genome).  MAKER then runs SNAP another time where
> it feeds hints to SNAP based on EST and protein alignment evidence.  These
> hint based models can then compete against the ab initio SNAP models to be
> promoted to genes if their AED scores are better.  Final models can also
> get UTR added based on EST evidence.  That is why you can get models from
> MAKER that do not match the original SNAP ab initio calls.
> 
> So in summary, all SNAP ab initio models will be in snap_masked.  The
> MAKER models will consist of hint based SNAP rerun plus SNAP ab initio
> models processed to add UTR.
> 
> Thanks,
> Carson
> 
> 
> 
> On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:
> 
>> Dear all,
>> 
>> I ran snap2 based gene prediction through maker.
>> 
>> In the resulting gff file, in the source "snap_masked" I can find the
>> score in the score column of every snap prediction that did not get
>> promoted to a maker gene. This would be the score of how well the
>> prediction matches the HMM?
>> 
>> It seems to me that those snap models that are given gene status no
>> longer appear as snap_masked source but only as source "maker". Maker
>> then removes the score column, instead giving AED and eAED scores (which
>> are more about how the model corresponds to the evidence). When viewing
>> the maker transcripts and SNAP predictions in a browser, they do not
>> match (mostly, maker predictions are longer).
>> 
>> I am interested in the score of individual gene predictions that
>> underlie maker gene models. Where could I find that information?
>> 
>> Many thanks!
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
> 

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From mptrsen at uni-bonn.de  Thu Feb 13 20:00:24 2014
From: mptrsen at uni-bonn.de (Malte Petersen)
Date: Fri, 14 Feb 2014 04:00:24 +0100
Subject: [maker-devel] BLAST options error / should Maker check for file
	format?
Message-ID: <52FD86C8.6040007@uni-bonn.de>

Dear MAKER devs,

I was running Maker version 2.30p-beta on an insect genome, and it
didn't produce any output. I got these error messages:


Widget::formater:
/path/to/makeblastdb -dbtype nucl -in
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
#-------------------------------#
BLAST options error: File
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
is empty
ERROR: /path/to/makeblastdb failed in Widget::formater
--> rank=NA, hostname=Jeanne-GBR
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:scf7180005143343

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf7180005143343


I figured out that this error is due to a non-Fasta file format being
fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
I got the pipeline running now with the correct file, I think that it
should be complaining (a lot earlier) if any of the input files are of
the wrong format.  More people might run into this problem and have no
idea where to look for a solution.
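
Until then, a cheap pre-flight check along these lines could be run on each evidence file before starting MAKER. This is only a sketch of the idea, not MAKER's own validation; the function name looks_like_fasta and the max_records cutoff are made up for illustration, and it checks structure only (headers followed by sequence), not the alphabet.

```python
def looks_like_fasta(path, max_records=100):
    """Return (ok, reason) after scanning the first few records of a file."""
    records = 0
    seen_sequence = True  # no header is pending before the first record
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if not seen_sequence:
                    return False, "header without sequence"
                if records >= max_records:
                    break  # enough records checked
                records += 1
                seen_sequence = False
            elif records == 0:
                return False, "first non-blank line is not a '>' header"
            else:
                seen_sequence = True
    if records == 0:
        return False, "file is empty"
    if not seen_sequence:
        return False, "last header has no sequence"
    return True, "ok"
```

A meta-info file like the one I passed in by mistake would be rejected at its first line, long before any BLAST database is built.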

What do you think?

Best,
Malte


From carsonhh at gmail.com  Thu Feb 13 20:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 20:11:22 -0700
Subject: [maker-devel] BLAST options error / should Maker check for file
 format?
In-Reply-To: <52FD86C8.6040007@uni-bonn.de>
References: <52FD86C8.6040007@uni-bonn.de>
Message-ID: <CF22D59B.9DEB%carsonhh@gmail.com>

Hi Malte,

Actually, there already is.  I'm very surprised your file made it that far.
Normally it fails right away.

Example -->

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1
appears to be empty.


Another test file -->


STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The nucleotide sequence file
'/Users/cholt/Developer/maker/trunk/data/test2'
appears to contain protein sequence or unrecognized characters. Note
the following nucleotides may be valid but are unsupported [RYKMSWBDHV]
Please check/fix the file before continuing, or set -fix_nucleotides on
the command line to fix this automatically.
Invalid Character: 'M'


You seem to have found just the right formula of improper input to get
past the filters on your run :-)


Thanks,
Carson


On 2/13/14, 8:00 PM, "Malte Petersen" <mptrsen at uni-bonn.de> wrote:

>Dear MAKER devs,
>
>I was running Maker version 2.30p-beta on an insect genome, and it
>didn't produce any output. I got these error messages:
>
>
>Widget::formater:
>/path/to/makeblastdb -dbtype nucl -in
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
>#-------------------------------#
>BLAST options error: File
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
>is empty
>ERROR: /path/to/makeblastdb failed in Widget::formater
>--> rank=NA, hostname=Jeanne-GBR
>ERROR: Failed while doing blastn of ESTs
>ERROR: Chunk failed at level:0, tier_type:3
>FAILED CONTIG:scf7180005143343
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf7180005143343
>
>
>I figured out that this error is due to a non-Fasta file format being
>fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
>I got the pipeline running now with the correct file, I think that it
>should be complaining (a lot earlier) if any of the input files are of
>the wrong format.  More people might run into this problem and have no
>idea where to look for a solution.
>
>What do you think?
>
>Best,
>Malte
>
>


From dence at genetics.utah.edu  Fri Feb 14 12:09:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Feb 2014 19:09:08 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D452AD@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

So, this is what is going on. The problem is with the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the Name of the mRNA became the same as the Name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features.

The solution is to make the exon and CDS Parent attributes point to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK with just adding "-mRNA-1" to the end of all the exon and CDS lines. This will work unless there are some mRNAs with alternative splice forms, because then those mRNA names will end with something like "-mRNA-2". 
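
To make the idea concrete, here is a rough Python sketch of that rewrite. It is not the attached Perl script, just an illustration under stated assumptions: the input is standard 9-column tab-separated GFF3, each exon/CDS has a single Parent value, and every gene has exactly one transcript named with the "-mRNA-1" suffix. The names fix_parent and fix_gff3 are made up for this example.

```python
import re

PARENT = re.compile(r"Parent=([^;\s]+)")


def fix_parent(line, suffix="-mRNA-1"):
    """Append the mRNA suffix to the Parent of one exon/CDS line, once."""
    ending = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9:
        return line  # comments, directives, FASTA section, etc.
    if cols[2] not in ("exon", "CDS"):
        return line  # gene and mRNA lines are left alone

    def repl(match):
        parent = match.group(1)
        if re.search(r"-mRNA-\d+$", parent):
            return match.group(0)  # already points at an mRNA
        return "Parent=" + parent + suffix

    cols[8] = PARENT.sub(repl, cols[8])
    return "\t".join(cols) + ending


def fix_gff3(in_handle, out_handle):
    """Stream a GFF3 file through fix_parent, line by line."""
    for gff_line in in_handle:
        out_handle.write(fix_parent(gff_line))
```

Because lines whose Parent already ends in "-mRNA-<n>" are skipped, running it twice does no harm, but it will still mis-assign exons of genes that have a second splice form.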

I've attached a script that should do this for you. 

Run it with this command

"perl fix_gff3_script.pl <your_gff3> > <fixed_gff3>"

And then run MAKER with the fixed gff3 file in place of the old gff3 file. 

Let me know if that works, 

Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Thursday, February 13, 2014 3:27 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I downloaded maker 2.31 and ran the same scaffold. Again it gave error on
the gff file. I then removed the word mRNA-1 from my gff file and ran it
again. It seems to have worked this time. Attached are std error files for
first try std-err (the one that failed) and 2nd one named std-err-wo-mRNA
(that apparently worked).  Since the gff file is used as evidence only, I
thought it should not matter to remove the mRNA-1 naming from the gff file.


Cheers

HB


On 14-02-12 12:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>So, after looking at the gff3 and your control files, I had an idea.
>There's the part of the control file called "Re-annotation Using MAKER
>Derived GFF3", but you can also passthrough features from a gff3 using
>the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines.
>
>Sometimes we encounter problems with the MAKER passthrough. Could you try
>dividing the gff3 file into the different feature sources and passing it
>through the "est_gff" etc options and not with the MAKER passthrough?
>That will tell us if the problem is with the gff3 file or with how MAKER
>is processing it.
>
>Another thing to check is to make sure that the contig names in the gff3
>file match the contig names in the fasta file that you're annotating.
>
>Thanks,
>Daniel
>
>
>
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 8:49 AM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>
>I have generated the files that you requested. I choose Sc00009 from my
>genome which is 30 kb and was one of the scaffolds coming up with error.
>In addition to Ctl files and error output file I also attached a part of
>the gff file related to SC00009 that is indicated in the error message.
>
>
>Thanks for helping with this
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>
>
>
>
>On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>I think that what would be the most help right now is if you ran MAKER on
>>only one of those contigs that are failing and send me the entire error
>>output along with the maker control files that you are using. It looks
>>like the error is coming from the gff3 files that you are using as input.
>>
>>Thanks,
>>Daniel
>>
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Tuesday, February 11, 2014 3:51 PM
>>To: Daniel Ence
>>Subject: ERROR: Failed while processing the chunk divide!!
>>
>>Dear Daniel
>>
>>I re-started maker and it is still running. But in the error output file that
>>has been generated so far it seems that smaller contigs are affected. There
>>are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb
>>length having this error
>>
>>I was wondering if I need to change the setting in the maker_opt file
>>
>>#-----MAKER Behavior Options
>>max_dna_len=100000 #length for dividing up contigs into chunks
>>(increases/decreases  memory usage)
>>min_contig=1 #skip genome contigs below this length (under 10kb are often
>>useless)
>>
>>
>>If I understand correctly, max_dna_len divides contigs of over 100 kb into
>>smaller chunks. However, it is not clear to me for the min_contig option:
>>if the default contig length is 10 kb or less, then why do I get an error
>>message for 30 kb long contigs? Should I change this to 0?
>>
>>Here is an example of the error message for one of the contigs
>>
>>
>>#--------- command -------------#
>>Widget::exonerate::est2genome:
>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate
>>  -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
>>  -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta
>>  -Q dna -T dna --model est2genome
>>  --minintron 20 --showcigar --percent 20
>>  > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>>#-------------------------------#
>>cleaning blastn...
>>cleaning tblastx...
>>cleaning blastx...
>>ERROR: Failed on
>>PbPT3Sc00001_S_0.8_1-mRNA-1
>>Check your input GFF3 file for errors!
>>(from GFFDB)
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk
>>divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00001
>>
>>
>>
>>
>>--Next Contig--
>>
>>
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hossein,
>>>
>>>Ok. So since this error came up on a local install, I'm going to need
>>>some more information to understand what went wrong. Is it the same
>>>contig that always causes this error? If it is, then is this the only
>>>error or warning that MAKER encounters while running on this contig? Or,
>>>if multiple contigs fail, then is it always the same error?
>>>
>>>If you can narrow it down to the smallest possible dataset that
>>>consistently gives the same error, then we can begin to understand
>>>what's wrong.
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>Sent: Tuesday, February 11, 2014 11:20 AM
>>>To: Daniel Ence
>>>Subject: Re: [maker-devel] Falied to create new account
>>>
>>>Hi Daniel
>>>
>>>I am running it through the local server at my work
>>>
>>>
>>>
>>>
>>>
>>>
>>>M. Hossein Borhan, Ph.D.
>>>Research Scientist/ Chercheur Scientifique
>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>>Telephone/Téléphone: (306) 385-9441
>>>Facsimile/Télécopieur: (306) 385-9482
>>>Hossein.borhan at agr.gc.ca
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hi Hossein,
>>>>
>>>>Did you encounter this error while you were running MAKER on your local
>>>>machine or through the MAKER web annotation service?
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>Daniel Ence
>>>>Graduate Student
>>>>Eccles Institute of Human Genetics
>>>>University of Utah
>>>>15 North 2030 East, Room 2100
>>>>Salt Lake City, UT 84112-5330
>>>>________________________________________
>>>>From: Carson Holt [carsonhh at gmail.com]
>>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>>To: Daniel Ence
>>>>Cc: Mark Yandell
>>>>Subject: FW: [maker-devel] Falied to create new account
>>>>
>>>>Hey Daniel could you download his dataset, and see if you can replicate
>>>>the error.  Also check if this was an MWAS job or a local maker run
>>>>(his
>>>>dataset will already be there for MWAS, you just need the job ID).
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Hi Carson
>>>>>
>>>>>
>>>>>I encountered this error while running maker
>>>>>
>>>>>FATAL ERROR
>>>>>ERROR: Failed while processing the chunk divide!!
>>>>>
>>>>>ERROR: Chunk failed at level 17
>>>>>!!
>>>>>FAILED CONTIG:PbPT3Sc00006
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>HB
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_gff3_script.pl
Type: application/octet-stream
Size: 349 bytes
Desc: fix_gff3_script.pl
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140214/364c961e/attachment-0002.obj>

From claudio.valero at wur.nl  Mon Feb 17 02:23:21 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Mon, 17 Feb 2014 09:23:21 +0000
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that MAKER does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio


-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.log
Type: application/octet-stream
Size: 4776 bytes
Desc: maker_opts.log
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOBA.pdf
Type: application/pdf
Size: 210262 bytes
Desc: SOBA.pdf
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment-0002.pdf>

From carson.holt at genetics.utah.edu  Mon Feb 17 12:22:13 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Feb 2014 19:22:13 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
References: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
Message-ID: <CF27AB29.9F59%carson.holt@genetics.utah.edu>

You also need to look at the contigs in a browser like Apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see whether genes are being dropped because they are supported only by single exon evidence, because they have no evidence support whatsoever, or because they are excluded by UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungal genes are so close together that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts.  The result is really long UTRs on some of your gene models, which force other models to be excluded.  If this is the case, rerun something like Trinity with the jaccard clip option set to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option.  At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species).  Comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data.  Also, are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the comprehensiveness of EST/mRNA-seq databases vary so widely, and they may capture as little as 30% of the genes).

Another thing that comes into play is single exon evidence.  In anything but fungi, single exon evidence is mostly caused by spurious alignments.  But fungi have so many single exon genes that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single exon evidence to keep to something like 250 bp.
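
If you want to check a control file against these points before a long run, a small Python sketch like the following may help. It only looks at the three options named above (single_exon, correct_est_fusion, protein) and assumes the usual "key=value #comment" layout of the control files and that multiple protein files are given as a comma-separated list; the helper names read_ctl and fungal_warnings are invented for this example, and the checks are a rule of thumb for fungal genomes, not anything MAKER itself enforces.

```python
def read_ctl(path):
    """Parse 'key=value #comment' lines of a control file into a dict."""
    opts = {}
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if "=" in line:
                key, value = line.split("=", 1)
                opts[key.strip()] = value.strip()
    return opts


def fungal_warnings(opts):
    """Return a list of warnings for the settings discussed in this thread."""
    warnings = []
    if opts.get("single_exon") != "1":
        warnings.append("single_exon is not 1: single exon evidence is dropped")
    if opts.get("correct_est_fusion") != "1":
        warnings.append("correct_est_fusion is not 1: fused UTRs are not clipped")
    # Assumption: several protein files are listed comma-separated.
    proteins = [p for p in opts.get("protein", "").split(",") if p.strip()]
    if len(proteins) < 2:
        warnings.append("fewer than 2 protein files given to protein=")
    return warnings
```

For example, fungal_warnings(read_ctl("maker_opts.ctl")) returns an empty list when all three points are covered. It cannot tell whether a protein file is a complete proteome or just UniProt, so that part still needs a human look.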

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that MAKER does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio



From carsonhh at gmail.com  Mon Feb 17 12:26:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Feb 2014 12:26:05 -0700
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <CF27AFF8.9F83%carsonhh@gmail.com>

From your control file, it looks like not setting single_exon=1 and only
using UniProt, rather than supplying complete proteomes of related species,
are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

--Carson


From:  Carson Holt <carson.holt at genetics.utah.edu>
Date:  Monday, February 17, 2014 at 12:22 PM
To:  "Valero Jimenez, Claudio" <claudio.valero at wur.nl>,
"'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Maker not predicting many genes

You also need to look at the contigs in a browser like Apollo.  That will
allow you to see both the predictions and the evidence in context.  You can
then see if genes are being dropped because they are only being supported by
single exon evidence, they have no evidence support whatsoever, or if they
are being excluded because of UTR overlap.  That last one is a common
problem for fungi when using assembled mRNA-seq reads.  Fungal genes are so
close that they often overlap in the UTR.  As a result, mRNA-seq assemblers
falsely assemble neighboring genes into single transcripts.  The result is
really long UTRs on some of your gene models that force other models to be
excluded.  If this is the case, rerun something like Trinity with the
jaccard clip option set to avoid transcript fusion.  Then set
correct_est_fusion=1 in the MAKER control files to get those long false
UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided minimum 1
proteome from a related species to the protein= option.  At least 2
proteomes are recommended though (these are not proteins from the same
species but rather complete proteomes from related species).  Also
comprehensive databases like UniProt/Swiss-prot are not sufficient on their
own, but can supplement the other proteome data.  Also are you providing EST
data?  Note that EST/mRNA-seq data without a proteome from a related species
is also not sufficient (because both the quality and the comprehensiveness
of EST/mRNA-seq databases vary so widely, and they may capture as
little as 30% of the genes).

Another thing that comes into play is single exon evidence.  In anything
but fungi, single exon evidence is mostly caused by spurious alignments.
But fungi have so many single exon genes, that this is not the case for
them.  Make sure single_exon=1 is set to allow that evidence to be kept, and
set the length of single exon evidence to keep to something like 250 bp.

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

Dear list,
 
I'm trying to annotate a fungal genome, and I'm surprised that MAKER does
not predict many genes (3697). I have trained SNAP and followed all the
tutorials available. Ab initio predictors are able to predict between
8000 and 10000 genes. Is there something wrong in my configuration file?
I attach the opts file and the SOBA summary of the annotation.
 
Regards,
 
Claudio
 
 


From claudio.valero at wur.nl  Wed Feb 19 01:20:04 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 08:20:04 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <CF27AFF8.9F83%carsonhh@gmail.com>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>

Hi Carson,

Thank you for your suggestions. I ran MAKER again and it was able to predict many more genes. However, I have a different problem now: when I try to run gff3_merge I get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

Similar thing happens when I try fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem before with these commands.


Regards,

Claudio

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: maandag 17 februari 2014 20:26
To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes

From your control file, it looks like not setting single_exon=1 and only using UniProt, rather than supplying complete proteomes of related species, are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

--Carson


From: Carson Holt <carson.holt at genetics.utah.edu>
Date: Monday, February 17, 2014 at 12:22 PM
To: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>, "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Maker not predicting many genes

You also need to look at the contigs in a browser like Apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see whether genes are being dropped because they are supported only by single exon evidence, because they have no evidence support whatsoever, or because they are excluded by UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungal genes are so close together that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts.  The result is really long UTRs on some of your gene models, which force other models to be excluded.  If this is the case, rerun something like Trinity with the jaccard clip option set to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option.  At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species).  Comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data.  Also, are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the comprehensiveness of EST/mRNA-seq databases vary so widely, and they may capture as little as 30% of the genes).

Another thing that comes into play is single exon evidence.  In anything but fungi, single exon evidence is mostly caused by spurious alignments.  But fungi have so many single exon genes that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single exon evidence to keep to something like 250 bp.

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that MAKER does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio



From carsonhh at gmail.com  Wed Feb 19 08:34:33 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Feb 2014 08:34:33 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <CF2A1C44.A02B%carsonhh@gmail.com>

You provided a directory rather than a file to the -d option ('d' stands for
datastore log).
You must provide the location of the datastore index log file, not the
datastore directory.

Example --> ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log
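
If you call these tools from a wrapper script, a small helper can guard against passing the directory by mistake. The following is only a sketch in Python; the function name datastore_index is hypothetical, and it assumes the index log sits directly inside the MAKER output directory and follows the "*_master_datastore_index.log" naming seen in the example above.

```python
import glob
import os


def datastore_index(path):
    """Return the index log for `path`, which may be the log file itself
    or a MAKER output directory that contains exactly one such log."""
    if os.path.isfile(path):
        return path
    pattern = os.path.join(path, "*_master_datastore_index.log")
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            "expected one index log in %s, found %d" % (path, len(matches))
        )
    return matches[0]
```

The returned path is what goes after -d for both gff3_merge and fasta_merge.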

Thanks,
Carson


From:  "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date:  Wednesday, February 19, 2014 at 1:20 AM
To:  Carson Holt <carsonhh at gmail.com>, Carson Holt
<carson.holt at genetics.utah.edu>, "'maker-devel at yandell-lab.org'"
<maker-devel at yandell-lab.org>
Subject:  RE: [maker-devel] Maker not predicting many genes

Hi Carson,
 
Thank you for your suggestions. I ran Maker again and it was able to predict
many more genes, although I have a different problem now. When I try to run
gff3_merge I get the following error:
 
Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge
line 67.
 
A similar thing happens when I try fasta_merge:
 
Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge
line 52.
 
I never had this problem before with these commands.
 
 
Regards,
 
Claudio
 

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: maandag 17 februari 2014 20:26
To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes
 

From your control file, it looks like your primary shortcomings are not
setting single_exon=1 and using only UniProt rather than supplying complete
proteomes from related species.  I'd set correct_est_fusion=1 as well.

 
-Carson

 
From: Carson Holt <carson.holt at genetics.utah.edu>
Date: Monday, February 17, 2014 at 12:22 PM
To: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>,
"'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Maker not predicting many genes

 
You also need to look at the contigs in a browser like Apollo.  That will
allow you to see both the predictions and the evidence in context.  You can
then see whether genes are being dropped because they are supported only by
single exon evidence, because they have no evidence support whatsoever, or
because they are being excluded due to UTR overlap.  That last one is a
common problem for fungi when using assembled mRNA-seq reads.  Fungal genes
are so close together that they often overlap in the UTR.  As a result,
mRNA-seq assemblers falsely assemble neighboring genes into single
transcripts.  The result is a really long UTR on some of your gene models,
which forces other models to be excluded.  If this is the case, rerun
something like Trinity with the jaccard clip option set to avoid transcript
fusion.  Then set correct_est_fusion=1 in the MAKER control files to get
those long false UTRs clipped off.

 
If it is a lack of evidence overlap, make sure you provided at least 1
proteome from a related species to the protein= option.  At least 2
proteomes are recommended though (these are not proteins from the same
species but rather complete proteomes from related species).  Also,
comprehensive databases like UniProt/Swiss-Prot are not sufficient on their
own, but can supplement the other proteome data.  Also, are you providing EST
data?  Note that EST/mRNA-seq data without a proteome from a related species
is also not sufficient (because both the quality and the comprehensiveness
of EST/mRNA-seq databases vary so widely, and they may capture as few as
30% of the genes).

 
Another thing that comes into play is single exon evidence.  In anything
but fungi, single exon evidence is mostly caused by spurious alignments.
But fungi have so many single exon genes that this is not the case for
them.  Make sure single_exon=1 is set to allow that evidence to be kept, and
set the minimum length of single exon evidence to keep to something like
250 bp.
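
The settings discussed above could be applied to a control file roughly as
follows.  This is only a sketch: set_opt is a hypothetical helper, the
proteome file names are placeholders, and it assumes a maker_opts.ctl in the
current directory with the usual "key=value #comment" lines.  single_length
is the control-file option for the minimum single exon evidence length.

```shell
# Sketch: set key=value in a MAKER control file, keeping the trailing #comment.
set_opt() {
    key=$1
    value=$2
    file=$3
    # Replace everything between "key=" and the comment (or end of line).
    sed "s|^${key}=[^#]*|${key}=${value} |" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

ctl=maker_opts.ctl
if [ -f "$ctl" ]; then
    set_opt single_exon 1 "$ctl"          # keep single exon evidence
    set_opt single_length 250 "$ctl"      # minimum length (bp) for that evidence
    set_opt correct_est_fusion 1 "$ctl"   # clip long false UTRs from fused transcripts
    # Placeholder file names: two complete proteomes from related species.
    set_opt protein "related_species_1.fasta,related_species_2.fasta" "$ctl"
fi
```

Editing the file by hand works just as well; the helper only makes the
changes repeatable across runs.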

 
Thanks,

Carson

 
From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

 
Dear list,
 
I'm trying to annotate a fungal genome, and I'm surprised that Maker does
not predict many genes (3697). I have trained SNAP and followed all the
tutorials available. Ab initio predictors are able to predict between
8000-10000 genes. Is there something wrong in my configuration file?
I attach the ops file and the SOBA summary of the annotation.
 
Regards,
 
Claudio
 
 

From dence at genetics.utah.edu  Wed Feb 19 09:04:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 19 Feb 2014 16:04:08 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>

Hi Claudio,

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From claudio.valero at wur.nl  Wed Feb 19 09:33:36 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 16:33:36 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D695A@SCOMP0936.wurnet.nl>

Hi,

Thanks, I had a mistake in the command line!!!

Regards,

Claudio


From barry.utah at gmail.com  Wed Feb 19 11:03:47 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Wed, 19 Feb 2014 11:03:47 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and suggests a solution to the user?

Thanks,

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From carson.holt at genetics.utah.edu  Wed Feb 19 11:06:52 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 19 Feb 2014 18:06:52 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
	<0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
Message-ID: <CF2A4058.A064%carson.holt@genetics.utah.edu>

You only need to swap a single character in the script.  Just change the  -e (exists) test to a -f (is file) test.
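
To illustrate the difference between the two tests, here is a small shell
sketch (the scripts themselves are Perl, where -e and -f carry the same
meaning; check_log_arg is a hypothetical function name and the error wording
is only a suggestion, not what the scripts print).

```shell
# Sketch: a directory passes the -e (exists) test but fails -f (is a regular file).
check_log_arg() {
    if [ -f "$1" ]; then
        echo "ok"
    elif [ -e "$1" ]; then
        echo "exists but is not a file (did you pass the datastore directory?)"
    else
        echo "missing"
    fi
}

tmp=$(mktemp -d)
touch "$tmp/index.log"
check_log_arg "$tmp"             # a directory
check_log_arg "$tmp/index.log"   # a regular file
check_log_arg "$tmp/nope"        # nothing there
rm -rf "$tmp"
```

With only an -e test, the directory case slips through, which is how a
directory given to -d can end up as an uninitialized value later on.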

Thanks,
Carson


From gtaylor at bcgsc.ca  Fri Feb 21 11:48:42 2014
From: gtaylor at bcgsc.ca (Greg Taylor)
Date: Fri, 21 Feb 2014 10:48:42 -0800
Subject: [maker-devel] Maker jobs hanging
Message-ID: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>

Hello,
 I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled RNA-seq data, on 480 processors on our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file. The standard error log for the whole job looks normal, as do the run.log and run.log.child.0 for the daughter processes. The problem seems to be sequence dependent: re-running contigs that hang doesn't help, and the same contigs always hang. I'm still looking into this myself, but it seems most if not all of the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated.

sincerely,
Greg Taylor


From dence at genetics.utah.edu  Fri Feb 21 11:54:17 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 18:54:17 +0000
Subject: [maker-devel] Maker jobs hanging
In-Reply-To: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
References: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66CD0@mxb2.hg.genetics.utah.edu>

Hi Greg, 

Since this is probably going to be a more complicated situation, would you upload your data and control file at this URL so that we can try to replicate the error on our machines?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166

Also, which version of MPI are you using? And you might want to try updating MAKER. I think version 2.31 was just updated a few weeks ago. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Fri Feb 21 11:56:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Feb 2014 11:56:50 -0700
Subject: [maker-devel] Maker jobs hanging
Message-ID: <CF2CEDC6.A15D%carsonhh@gmail.com>

Use 2.31.  It has been tested to work without issue on several thousand
cpus.  Also use OpenMPI for any jobs greater than 100 cpus. In addition,
OpenMPI can freeze on some systems without the following flag when using
perl based MPI programs --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with perl, and freezes
whenever perl based MPI jobs extend across nodes (they run fine within a
single node though).
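
For anyone launching MAKER from a wrapper script, the example command above can be assembled as an argument list, sketched here in Python. The helper name and its parameters are hypothetical; only the mpiexec arguments come from the example. Passing a list rather than a shell string means the ^ in ^openib is never interpreted by a shell.

```python
import shlex


def maker_mpi_command(n_procs, disable_openib=True, launcher="mpiexec"):
    """Build the argument list for an MPI MAKER run (hypothetical helper)."""
    cmd = [launcher]
    if disable_openib:
        # Same flag as in the example: exclude the openib transport.
        cmd += ["-mca", "btl", "^openib"]
    cmd += ["-n", str(n_procs), "maker"]
    return cmd


cmd = maker_mpi_command(200)
# Print a shell-safe rendering of the command for logging.
print(shlex.join(cmd))
```

The list can then be handed to subprocess.run(cmd, check=True) from within the job script.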

--Carson




From dence at genetics.utah.edu  Fri Feb 21 15:04:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 22:04:34 +0000
Subject: [maker-devel] FW:  Maker jobs hanging
In-Reply-To: <CF2CEDC6.A15D%carsonhh@gmail.com>
References: <CF2CEDC6.A15D%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66D7E@mxb2.hg.genetics.utah.edu>

Hi Greg, 

You should be able to have the new MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Friday, February 21, 2014 11:56 AM
To: Greg Taylor; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Maker jobs hanging

Use 2.31.  It has been tested to work without issue on several thousand
cpus.  Also use OpenMPI for any jobs greater than 100 cpus. In addition,
OpenMPI can freeze on some systems without the following flag when using
perl based MPI programs --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with perl, and freezes
whenever perl based MPI jobs extend across nodes (they run fine within a
single node though).

--Carson




From dence at genetics.utah.edu  Fri Feb 21 19:38:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 02:38:59 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>

Hi Joe, 

Will you upload your control files and data at this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169

Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene?

I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Mark Yandell
Sent: Friday, February 21, 2014 7:32 PM
To: Daniel Ence
Subject: FW: I am a PhD candidate at NMSU and have a question about maker2

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 5:18 PM
To: Mark Yandell
Subject: I am a PhD candidate at NMSU and have a question about maker2

Dear Dr. Yandell,

I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me.

Thank you,
Joe

Sent from my iPad


From dence at genetics.utah.edu  Fri Feb 21 21:27:10 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 04:27:10 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>,
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>

Hi Joe, 

MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. 
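
A quick sanity check of those settings can be sketched in Python as follows. It assumes the control file uses key=value lines with trailing # comments; the function names and the file names in the sample are hypothetical.

```python
def read_ctl(text):
    """Parse key=value lines, dropping '#' comments (assumed control-file layout)."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def missing_inputs(opts, keys=("genome", "est", "protein")):
    """Return the settings from `keys` that are absent or empty."""
    return [k for k in keys if not opts.get(k)]


# Hypothetical excerpt of a maker_opts.ctl file, for illustration only.
sample = """
genome=cotton_genome.fasta #genome sequence (hypothetical file name)
est= #EST/cDNA evidence left empty
protein=gun1_protein.fasta #protein evidence (hypothetical file name)
"""
print(missing_inputs(read_ctl(sample)))
```

An empty "est" is not necessarily a mistake when only protein evidence is supplied; the point is simply to confirm that the evidence you expect MAKER to blast is actually listed.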

Will you attach those files to your reply, so we can make sure that the settings are set up correctly?

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 7:44 PM
To: Daniel Ence
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2

Hi Daniel,

Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. When I search the cotton genome, it returns zero hits. I then used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which then reports 0 hits. I ran BLAST standalone and got hits for both my gene of interest and the control test gene.

Thanks,
Joe


From dence at genetics.utah.edu  Sat Feb 22 15:51:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 22:51:48 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>,
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>

Hi,

Will you send me the long file that you were trying to blast against?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 10:46 AM
To: Daniel Ence
Cc: Joe Song; Joseph Said
Subject: Re: I am a PhD candidate at NMSU and have a question about maker2

hi all,
Attached are the three configuration files and the two input files, which we use to run a prediction between the genome and the protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but nothing was returned. We also tested a long genome sequence as the genome input and still got nothing. I am not sure what is causing this result.
Thanks a lot for your help.

Hua


On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu<mailto:joesaid at nmsu.edu>> wrote:
Hi Daniel,

I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night.

Thanks,
Joe

Sent from my iPad


From dence at genetics.utah.edu  Sat Feb 22 16:21:51 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 23:21:51 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>,
	<CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66FAB@mxb2.hg.genetics.utah.edu>

Hi Hua, will you upload the genome file to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170
I am more concerned that MAKER didn't find the gene in the whole genome than in the 60bp substring. I think that MAKER needs more sequence than that to annotate a gene model.

Will you also upload the MAKER output and datastore from the MAKER run?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 4:00 PM
To: Daniel Ence
Cc: maker-devel at yandell-lab.org; Joseph Said; Joe Song
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2


The long file we used is a whole genome, which is a very large file, so I am not able to send it. Sorry. In the simple test I described, the nucleotide sequence I sent you serves as the genome file, and the protein sequence is the other input. These two are what we want to blast against each other to see whether Maker2 works well.
Thanks.


From mikael.durling at slu.se  Sun Feb 23 09:57:09 2014
From: mikael.durling at slu.se (=?iso-8859-1?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Sun, 23 Feb 2014 16:57:09 +0000
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se>

Dear list and maker developers,

I was browsing the results of a recent maker run, focusing on differences between this run, made with a recent maker (svn r1067), and a previous run with svn revision 1022 (as I recall). One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300).  As you can see, there is no support for the join in the evidence. Do you have any clue as to what might cause this?

Best regards,
Mikael Durling


From carsonhh at gmail.com  Sun Feb 23 13:00:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 23 Feb 2014 13:00:50 -0700
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <CF2FA087.A21D%carsonhh@gmail.com>

The image doesn't show all evidence sources, but the short answer is that
one of your evidence sources (est2genome, protein2genome, or blastx)
bridges the two regions, and when provided the bridged hint one of the
gene predictors thinks it makes sense to create a single model instead.
My guess is that it's blastx evidence.
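
To check for such a bridge directly in the GFF3, something along these lines could be used. This is only a sketch in Python: the function name, the set of feature types treated as evidence, and the sample coordinates are assumptions made for illustration, not MAKER internals.

```python
# Feature types treated as evidence alignments (an assumption for this sketch).
EVIDENCE_TYPES = {"match", "protein_match", "expressed_sequence_match"}


def bridging_features(gff3_text, seqid, region_a, region_b):
    """Return (source, type, start, end) for evidence overlapping both regions."""
    def overlaps(start, end, region):
        return start <= region[1] and end >= region[0]

    hits = []
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[2] not in EVIDENCE_TYPES:
            continue
        start, end = int(cols[3]), int(cols[4])
        if overlaps(start, end, region_a) and overlaps(start, end, region_b):
            hits.append((cols[1], cols[2], start, end))
    return hits


# Hypothetical two-feature GFF3, for illustration only.
sample = "\n".join([
    "##gff-version 3",
    "contig_1\tblastx\tprotein_match\t100\t2900\t.\t+\t.\tID=hit1;Name=protA",
    "contig_1\test2genome\texpressed_sequence_match\t100\t1400\t.\t+\t.\tID=hit2;Name=est1",
])
print(bridging_features(sample, "contig_1", (100, 1500), (2000, 3000)))
```

Here region_a and region_b would be the coordinates of the two genes from the earlier run; any feature returned spans both and is a candidate for the bridging hint.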

--Carson




From mikael.durling at slu.se  Sun Feb 23 14:14:00 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Sun, 23 Feb 2014 21:14:00 +0000
Subject: [maker-devel] Maker predicting fusion genes?
In-Reply-To: <CF2FA087.A21D%carsonhh@gmail.com>
References: <CF2FA087.A21D%carsonhh@gmail.com>
Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se>

Ok, are you implying that the predictions that end up in the gff3 output from the ab initio predictors (snap_masked, augustus_masked, and genemark) are not the final hinted predictions? Otherwise, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence there to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301, where all evidence is plotted).

Mikael


-------------- next part --------------
A non-text attachment was scrubbed...
Name: region.gff3
Type: application/octet-stream
Size: 19612 bytes
Desc: region.gff3
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140223/240ecba4/attachment-0002.obj>

From hedgyx at yahoo.com  Mon Feb 24 00:02:41 2014
From: hedgyx at yahoo.com (Megan)
Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

Maker folks,
I am re-annotating a single contig and I am having a few problems.

First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added).  The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first.  So it appears to be a systematic error across the entire file.  The Maker error is "Check your input GFF3 file for errors! (from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff.  Attached is a gff with the first 2 genes.  

Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV].  It suggests "set -fix_nucleotides on the command line to fix this automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does it do?  Does it remove the entire sequence or replace ambiguous bases with a randomly selected one?  Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.  

Also, just curious, has Maker never supported these characters but just never complained?  I used this EST data set with Maker 2.09.  I did note poor EST coverage, but thought it was an issue with the EST data itself.
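
For reference, this is roughly how the affected ESTs can be counted, sketched in Python. The function name and sample records are hypothetical, and the sketch only counts the characters named in the error message; it does not reproduce whatever -fix_nucleotides does.

```python
# IUPAC ambiguity codes listed in the MAKER error message.
AMBIGUITY_CODES = set("RYKMSWBDHV")


def ambiguous_records(fasta_text):
    """Map each FASTA record ID to its count of IUPAC ambiguity codes (if any)."""
    counts = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
        elif current is not None and line:
            n = sum(1 for base in line.upper() if base in AMBIGUITY_CODES)
            if n:
                counts[current] = counts.get(current, 0) + n
    return counts


# Hypothetical EST records, for illustration only.
sample = ">est1\nACGTRYACGT\nACGTN\n>est2\nACGTACGT\n>est3\nacgtwacgt\n"
print(ambiguous_records(sample))
```

Records with only A, C, G, T, and N are left out of the result, so the size of the returned dictionary gives the number of ESTs that would trigger the complaint.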

I appreciate any suggestions.
Thanks,
Megan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part_passthru.gff
Type: application/octet-stream
Size: 4363 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140223/3950a0b4/attachment-0002.obj>

From zh9118 at gmail.com  Sat Feb 22 16:00:28 2014
From: zh9118 at gmail.com (Hua Zhong)
Date: Sat, 22 Feb 2014 16:00:28 -0700
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
Message-ID: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>

The long file we used is a whole genome, which is too large for me to
send. Sorry. In the simple test I told you about, the nucleotide sequence
I sent you serves as the genome file, and the protein sequence is the other
input. These two are what we want to BLAST against each other to see
whether Maker2 works.
Thanks.
On Feb 22, 2014 3:51 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>  Hi,
>
>  Will you send me the long file that you were trying to blast against?
>
>  Thanks,
> Daniel
>
>  Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>   ------------------------------
> *From:* Hua Zhong [zh9118 at gmail.com]
> *Sent:* Saturday, February 22, 2014 10:46 AM
> *To:* Daniel Ence
> *Cc:* Joe Song; Joseph Said
> *Subject:* Re: I am a PhD candidate at NMSU and have a question about
> maker2
>
>   Hi all,
> Attached are the three configuration files and the two input files, which
> we used to predict matches between the genome and the protein. For a simple
> test, we used one short sequence of about 60 bp and its translated protein
> sequence as inputs, but got nothing returned. We also tested a long genome
> sequence as the genome input, and still got nothing. I am not sure what
> causes this result.
> Thanks a lot for your help.
>
>  Hua
>
>
>
>
> On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu> wrote:
>
>> Hi Daniel,
>>
>> I do not have the exact files with me right now, but my coauthors on the
>> paper I am working on have been copied on this email. Hua can send you
>> those files. Thank you for being very helpful especially on a Friday night.
>>
>> Thanks,
>> Joe
>>
>> Sent from my iPad
>>
>> > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" <dence at genetics.utah.edu>
>> wrote:
>> >
>> > Hi Joe,
>> >
>> > MAKER runs blast from your local system (or your server where MAKER is
>> installed), and it blasts evidence that the user supplies in the "est" and
>> "protein" settings. The est and protein settings are set in the
>> maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file
>> and the specific blast settings are in the "maker_bopts.ctl" file.
>> >
>> > Will you attach those files to your reply, so we can make sure that the
>> > settings are set up correctly?
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 7:44 PM
>> > To: Daniel Ence
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Hi Daniel,
>> >
>> > Thank you for getting back to me so quickly. I am using the cotton
>> > Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the
>> > GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I
>> > believe maker2 just calls BLAST from NCBI's page. When I search the
>> > cotton genome it returns zero hits. I then used a known cotton gene as
>> > a test, and that search also returned zero hits. I am not sure what the
>> > problem is, but it seems like the protocol that should be returning the
>> > results of NCBI's BLAST is returning 0 to Maker2, which is reporting 0
>> > hits. I ran BLAST standalone and got hits for both my gene of interest
>> > and the control test gene.
>> >
>> > Thanks,
>> > Joe
>> > ________________________________________
>> > From: Daniel Ence <dence at genetics.utah.edu>
>> > Sent: Friday, February 21, 2014 7:38 PM
>> > To: Joseph Said
>> > Cc: maker-devel at yandell-lab.org
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Hi Joe,
>> >
>> > Will you upload your control files and data at this URL?
>> > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169
>> >
>> > Also, what version of MAKER and blast are you using? And which file are
>> you using for the known arabidopsis gene?
>> >
>> > I've copied this email to the maker-development list, which is a really
>> good resource for trouble-shooting MAKER issues.
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Mark Yandell
>> > Sent: Friday, February 21, 2014 7:32 PM
>> > To: Daniel Ence
>> > Subject: FW: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Mark Yandell
>> > Professor of Human Genetics
>> > H.A. & Edna Benning Presidential Endowed Chair
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ph:801-587-7707
>> >
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 5:18 PM
>> > To: Mark Yandell
>> > Subject: I am a PhD candidate at NMSU and have a question about maker2
>> >
>> > Dear Dr. Yandell,
>> >
>> > I am a molecular biologist at NMSU. I am trying to use maker2 with the
>> cotton genome, and search an Arabidopsis gene against it. I think there is
>> a problem with the blast component because zero results are returned. I
>> tried troubleshooting by searching a known gene and still returned zero
>> results. Is this a common problem maybe with the pipeline? I would
>> appreciate any ideas you might have to help me.
>> >
>> > Thank you,
>> > Joe
>> >
>> > Sent from my iPad
>>
>
>
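
One way to confirm what MAKER will read from the control files discussed in this thread is to parse the key=value lines directly. The sketch below is only an illustration in Python: it assumes the usual control-file layout of `key=value #comment` lines, the file names in the example are hypothetical, and the helpers `parse_ctl` and `missing_inputs` are invented for this example and are not part of MAKER.

```python
def parse_ctl(text):
    """Parse MAKER-style control text into a dict of option -> value."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def missing_inputs(options, required=("genome", "protein")):
    """Return the required options that are absent or left empty."""
    return [name for name in required if not options.get(name)]


if __name__ == "__main__":
    # Hypothetical excerpt in the style of maker_opts.ctl.
    example = """\
#-----Genome
genome=test_genome.fasta #genome sequence
est= #left empty in this example
protein=test_protein.fasta #protein evidence
"""
    opts = parse_ctl(example)
    print(opts)
    print("missing:", missing_inputs(opts))
```

An empty `protein=` or `est=` line would show up in the `missing` list here, which is a cheap first check before looking at the BLAST settings themselves.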

From carsonhh at gmail.com  Mon Feb 24 11:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:18:18 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID: <CF30D6EC.A2CC%carsonhh@gmail.com>

The -fix_nucleotides flag goes on the command line (i.e. maker
-fix_nucleotides).  The warning is there so you are aware that there is an
issue with your fasta file that will cause things downstream to fail.
MAKER can fix the errors for you, but first it gives a warning designed to
make you look at the file and validate it.  Why would you want to do this?
For example, if you accidentally provided protein sequence to the EST
option, you wouldn't want MAKER to just proceed.  You want a warning
so you can check first.  If your file is in fact EST data, then set the
flag and those characters will be changed to N's in the fixed fasta
sequence.  Otherwise those characters will cause errors in downstream tools
like exonerate, and even some downstream GMOD tools, so they can't be
allowed to remain as is.
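
To make the effect of the flag concrete, here is a minimal Python sketch of the substitution described above: the ambiguity codes named in the warning ([RYKMSWBDHV]) become N, and header lines are left alone. This is not MAKER's own code. The function name `mask_ambiguous` is hypothetical, and preserving lowercase (writing n for a lowercase code) is an assumption of the sketch.

```python
import re

# Ambiguity codes listed in the MAKER warning; matched case-insensitively.
AMBIGUOUS = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def _to_n(match):
    """Return n for a lowercase code and N for an uppercase one."""
    return "n" if match.group().islower() else "N"


def mask_ambiguous(fasta_text):
    """Replace IUPAC ambiguity codes in sequence lines with N (or n)."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines are left untouched
        else:
            out.append(AMBIGUOUS.sub(_to_n, line))
    return "\n".join(out)


if __name__ == "__main__":
    example = ">est1 RYK in the header stays\nACGTRYACGT\nacgtkmacgt"
    print(mask_ambiguous(example))
```

Because each ambiguous base is masked in place, no EST is dropped and sequence lengths do not change.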

For the GFF3 file, there is almost definitely a logic issue in the file
(the modENCODE validator won't check for those).  This can come from prior
manipulation of the GFF3 file.  For example, IDs for a gene that are the
same across two contigs (technically valid but a logic error).  The GFF3
error message will normally give the ID of the feature causing the issue.
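
The example of a logic error given above (one ID reused on two contigs) can be checked with a short script. The following Python sketch assumes a standard nine-column, tab-separated GFF3 file; the function name `ids_on_multiple_seqids` is hypothetical, and the check covers only this one kind of logic error, not everything MAKER's GFF3 loader may reject.

```python
from collections import defaultdict


def ids_on_multiple_seqids(gff3_text):
    """Map each ID that appears on more than one seqid to the seqids it uses."""
    seen = defaultdict(set)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        if line.startswith(">"):
            break  # embedded FASTA section begins; no more features
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        for attr in cols[8].strip().split(";"):
            if attr.startswith("ID="):
                seen[attr[3:]].add(cols[0])
    return {fid: sorted(seqids) for fid, seqids in seen.items() if len(seqids) > 1}


if __name__ == "__main__":
    example = (
        "contig1\t.\tgene\t1\t90\t.\t+\t.\tID=gene1;Name=a\n"
        "contig2\t.\tgene\t1\t90\t.\t+\t.\tID=gene1\n"
        "contig2\t.\tgene\t200\t300\t.\t+\t.\tID=gene2\n"
    )
    print(ids_on_multiple_seqids(example))
```

An empty result means no ID is shared between contigs, so the problem lies elsewhere in the file.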

I could also take a look for you.  You can upload the GFF3 file here ->
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID so I know
which files to review.

Thanks,
Carson


On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:

>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from
>Maker 2.09, with some modifications to gene names and functional
>information added).  The gff file passes the modencode validator but
>Maker always fails on the first gene in the file, regardless of which
>gene comes first.  So it appears to be a systematic error across the
>entire file.  The Maker error is "Check your input GFF3 file for errors!
>(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
>with model_pass=1 and pred_gff.  Attached is a gff with the first 2
>genes.  
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST
>fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
>suggests "set -fix_nucleotides on the command line to fix this
>automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
>it do?  Does it remove the entire sequence or replace ambiguous bases
>with a randomly selected one?  Half of my 20k ESTs contain these
>characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just
>never complained?  I used this EST data set with Maker 2.09.  I did note
>poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Mon Feb 24 11:31:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 24 Feb 2014 18:31:47 +0000
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30D6EC.A2CC%carsonhh@gmail.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>,
	<CF30D6EC.A2CC%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>

Hi Megan, 

One problem with the GFF3 that you attached is that the IDs for the CDS features are built incorrectly. All of the CDS features for a given mRNA or transcript should have the same ID, but the CDS features in your GFF3 have IDs that use the exon name.

You can fix it with this command-line perl:
perl -ne 'if (/\tCDS\t/ && /Parent=([^;\s]+)/) { my $parent = $1; s/ID=[^;\n]+/ID=$parent-cds/ } print' part_passthru.gff > fixed.gff3

It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating. 

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Mon Feb 24 11:34:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:34:28 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
	<CF30D6EC.A2CC%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
Message-ID: <CF30DE6B.A2F6%carsonhh@gmail.com>

Actually that is not true.  CDS IDs can be the same or different.  MAKER
doesn't care either way.  Both are valid in GFF3.  Having the same ID just
allows them to be put together by some GMOD viewers without having to go
through a container feature.

--Carson



From carsonhh at gmail.com  Mon Feb 24 13:59:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 13:59:12 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID: <CF30FEE0.A32D%carsonhh@gmail.com>

I found the issue.  You have non-printing control characters at the end of
almost every line.  Because they fall within the Parent= tag, they
become part of the Parent ID when the file is read.

So instead of "HERA000031-RA" you get -> "HERA000031-RA\cM" as the Parent
ID.

"\cM" is Control-M, i.e. a carriage return.

I ran the attached script to remove these characters (perl purify
<gff3_file>), and then it works.  When you rerun MAKER after fixing the
file, make sure to first remove the
.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
to force the GFF3 database to be rebuilt.
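
The attached purify script is not reproduced in the archive, so the following is not that script. It is a minimal Python sketch of the same kind of cleanup, assuming the stray characters are carriage returns or other control characters; tabs are kept because GFF3 columns depend on them. The function name `strip_control_chars` is hypothetical.

```python
import re

# Control characters other than tab (\x09) and newline (\x0a).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def strip_control_chars(gff3_text):
    """Remove carriage returns and other control characters, keeping tabs."""
    # Split on "\n" only, so any "\r" stays on its line and is removed below.
    cleaned = [CONTROL_CHARS.sub("", line) for line in gff3_text.split("\n")]
    return "\n".join(cleaned)


if __name__ == "__main__":
    example = "contig1\t.\tCDS\t1\t90\t.\t+\t0\tID=cds1;Parent=HERA000031-RA\r\n"
    print(repr(strip_control_chars(example)))
```

When reading a real file in Python for this purpose, open it with `newline=""` so the carriage returns are not silently translated before the check.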

Thanks,
Carson


On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions.  I have looked at the gff file, but I do not
>see any obvious errors.  I have uploaded the files to your website.  The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error.  If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff.  The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information.  I checked with cat -te, but don't see any obvious
>formatting errors.
>
>Thanks!
>Megan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1966 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140224/a1582e7d/attachment-0002.obj>

From carsonhh at gmail.com  Mon Feb 24 14:03:00 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 14:03:00 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
	<CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <CF310121.A33F%carsonhh@gmail.com>

One more thing.  You must give the file to pred_gff or model_gff.  It is
no longer strictly a MAKER file, as many of the source columns read ".",
meaning it has been edited by Apollo or another editor.  So it is not
guaranteed to be recognized by genome_gff, because many of the source tags
have changed.

Thanks,
Carson




From rbharris at uw.edu  Tue Feb 25 14:49:57 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Tue, 25 Feb 2014 13:49:57 -0800
Subject: [maker-devel] error in snap training
Message-ID: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>

Hey -

I'm trying to train SNAP and am running into errors. I don't have any EST
evidence, just protein. My .gff file reports 10865 genes, but when I run
maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n,
a ton of overlap_prev_exon errors get written to the screen, and then when I
get to the forge step I get an "impossible error5". Any help would be
greatly appreciated.

Thanks!
Rebecca

From carsonhh at gmail.com  Tue Feb 25 15:12:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 15:12:14 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>

Make sure you are using 2.31,  and then try the maker2zff filters individually.  If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP.  Use cegma2zff for the conversion.

--Carson

Sent from my iPhone



From sjackman at gmail.com  Tue Feb 25 17:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names
Message-ID: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>

Hi,

I'm annotating a genome using a closely related genome from Genbank, using
the .frn (RNA) and .faa (protein) files from Genbank as evidence to
annotate my genome. I've run Maker, and the annotation seems to have worked
well. Is it possible to map the names of the genes from the related species
to my annotation? I see the *map_forward* option, which applies to the
*model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From hedgyx at yahoo.com  Tue Feb 25 17:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST
	nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms.  Thanks for the help.

Megan
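
For readers hitting the same problem: the ^M characters are carriage returns
("\r"), typically left behind when a file has been saved with Windows-style
line endings. The purify script attached to the original reply is not
reproduced in this archive, so the following is only a minimal Python sketch
of the same idea, not that script. The function names and file paths are
hypothetical.

```python
def strip_carriage_returns(text: str) -> str:
    """Remove every carriage return (^M, "\r") from the given text."""
    return text.replace("\r", "")


def clean_gff3(in_path: str, out_path: str) -> None:
    """Write a copy of a GFF3 file with all carriage returns removed."""
    # newline="" stops Python from translating line endings on read or
    # write, so the only change made to the file is the removal of "\r".
    with open(in_path, "r", newline="") as src:
        cleaned = strip_carriage_returns(src.read())
    with open(out_path, "w", newline="") as dst:
        dst.write(cleaned)
```

Usage would look like clean_gff3("annotations.gff", "annotations.clean.gff").
As the quoted reply below points out, the GFF3 database file in the
maker.output directory also has to be deleted so that MAKER rebuilds it from
the cleaned file.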
--------------------------------------------
On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:

 Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
 To: "Megan" <hedgyx at yahoo.com>, "Daniel Ence" <dence at genetics.utah.edu>
 Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
 Date: Monday, February 24, 2014, 12:59 PM
 
 I found the issue.  You have non-ascii characters at the end of almost
 every line.  Because they are happening within the Parent= tag, they then
 become part of the Parent ID when the file is read.
 
 So instead of "HERA000031-RA" you get -> "HERA000031-RA\cM" as the Parent
 ID.
 
 "\cM" is a meta-return.
 
 I ran the attached script to remove these characters (perl purify
 <gff3_file>), and then it works.  Make sure to remove the
 .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
 to force the GFF3 database to be rebuilt after fixing the file when you
 rerun MAKER.
 
 Thanks,
 Carson
 
 
 On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:
 
 >Hi Carson and Daniel,
 >
 >Thanks for your suggestions.  I have looked at the gff file, but I do not
 >see any obvious errors.  I have uploaded the files to your website.  The
 >reference fasta is there, the full gff, and a single gene gff that also
 >causes an error.  If I remove that gene from the full gff, then the error
 >is on the next gene in the file, so it appears to be a systematic problem
 >throughout the gff.  The gff was generated by Maker, but I may have
 >messed it up when I modified it to rename genes and add functional
 >information.  I checked with cat -te, but don't see any obvious
 >formatting errors.
 >
 >Thanks!
 >Megan
 >
 >
 >--------------------------------------------
 >On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:
 >
 > Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
 > To: "Megan" <hedgyx at yahoo.com>, maker-devel at yandell-lab.org
 > Date: Monday, February 24, 2014, 10:18 AM
 > 
 > The -fix_nucleotides flag is added to the command line (i.e. maker
 > -fix_nucleotides).  It is there so you are aware that there is an
 > issue with your fasta file that will cause things downstream to fail.
 > MAKER can fix the errors for you, but first it gives a warning designed to
 > make you look at the file and validate it.  Why would you want to do this?
 > For example, what if you provided protein sequence to the EST option
 > accidentally?  You wouldn't want MAKER to just proceed.  You want a warning
 > so you can check first.  If your file is in fact EST data, then set the
 > flag and those characters will be changed to N's in the fixed fasta
 > sequence; otherwise those characters will cause errors in downstream tools
 > like exonerate, and even some downstream GMOD tools, so they can't be
 > allowed to remain as is.
 > 
 > For the GFF3 file, there is almost definitely a logic issue in the file
 > (the modENCODE validator won't check for those).  This can be from prior
 > manipulation of the GFF3 file.  For example, IDs for a gene that are the
 > same across two contigs (technically valid but a logic error).  The GFF3
 > error message will normally give the ID of the feature causing the issue.
 > 
 > I could also take a look for you.  You can upload the GFF3 file here ->
 > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
 > Click on 'new guest account' then e-mail me back your guest ID, so I know
 > which files to review.
 > 
 > Thanks,
 > Carson
 > 
 > 
 > 
 > On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
 > 
 > >Maker folks,
 > >I am re-annotating a single contig and I am having a few problems.
 > >
 > >First, I am having trouble passing through a Maker derived gff (from
 > >Maker 2.09, with some modifications to gene names and functional
 > >information added).  The gff file passes the modencode validator but
 > >Maker always fails on the first gene in the file, regardless of which
 > >gene comes first.  So it appears to be a systematic error across the
 > >entire file.  The Maker error is "Check your input GFF3 file for errors!
 > >(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
 > >with model_pass=1 and pred_gff.  Attached is a gff with the first 2
 > >genes.
 > >
 > >Second, when I updated to Maker 2.31, Maker now complains that my EST
 > >fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
 > >suggests "set -fix_nucleotides on the command line to fix this
 > >automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
 > >it do?  Does it remove the entire sequence or replace ambiguous bases
 > >with a randomly selected one?  Half of my 20k ESTs contain these
 > >characters, so I don't want to throw them out entirely.
 > >
 > >Also, just curious, has Maker never supported these characters but just
 > >never complained?  I used this EST data set with Maker 2.09.  I did note
 > >poor EST coverage, but thought it was an issue with the EST data itself.
 > >
 > >I appreciate any suggestions.
 > >Thanks,
 > >Megan
 > >_______________________________________________
 > >maker-devel mailing list
 > >maker-devel at box290.bluehost.com
 > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
 > 
 > 
 >
 
 
From carsonhh at gmail.com  Tue Feb 25 17:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
Message-ID: <CF32868D.A42A%carsonhh@gmail.com>

There is a way.  It's not a standard option and it's undocumented, but if
you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags
to your fasta headers, those can be used to guide the mapping and naming.
For example, gene_id=<some_gene>  will ensure different isoforms that share
a common gene_id get clustered into the same gene, and
maker_coor=chr1:1-10000 in the fasta header will force a particular sequence
to only be mapped against chr1 within the range of 1-10000 bp  and just
using maker_coor=chr1 will force it to only be mapped against chr1.
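
As an illustration of the header tags described above, here is a minimal
Python sketch that appends gene_id= and maker_coor= to a FASTA header line.
The exact placement MAKER expects is not spelled out in this message; the
sketch assumes space-separated tags after the sequence ID, and the function,
sequence, and gene names are hypothetical.

```python
from typing import Optional


def tag_header(header: str,
               gene_id: Optional[str] = None,
               maker_coor: Optional[str] = None) -> str:
    """Append gene_id= and/or maker_coor= tags to a FASTA header line.

    Assumption: tags are separated from the sequence ID by a single space.
    """
    parts = [header.rstrip()]
    if gene_id is not None:
        parts.append(f"gene_id={gene_id}")
    if maker_coor is not None:
        parts.append(f"maker_coor={maker_coor}")
    return " ".join(parts)


# Two isoforms sharing one gene_id, the first restricted to chr1:1-10000.
print(tag_header(">isoform_1", gene_id="geneA", maker_coor="chr1:1-10000"))
print(tag_header(">isoform_2", gene_id="geneA", maker_coor="chr1"))
```

The tagged FASTA file would then be supplied through est= (or protein=) with
est_forward=1 typed into maker_opts.ctl by hand, as described above.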

This is an undocumented way to remap genes onto new assemblies using blast
alignments of earlier transcript or protein annotations as a guide.

--Carson



From carsonhh at gmail.com  Tue Feb 25 18:04:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 18:04:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <CF328AAA.A44D%carsonhh@gmail.com>

One more note.  When using this option, the score column of mRNA features
will represent how completely this gene matches the source EST/protein
(fraction coverage multiplied by % identity).  So a value of 100 means there
is a perfect match.  This way if the same transcript maps to multiple
locations, then you can identify which location is the closest match (also
works for identifying likely orthologs vs. paralogs).
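
The score arithmetic described above can be sketched in a few lines of
Python. This only illustrates the formula as stated in this message (fraction
coverage multiplied by % identity); the function names and the example
numbers are made up, and MAKER computes the real value itself.

```python
from typing import Dict


def mapping_score(fraction_coverage: float, percent_identity: float) -> float:
    """Score = fraction of the source sequence covered (0-1) x % identity (0-100)."""
    return fraction_coverage * percent_identity


def best_location(scores: Dict[str, float]) -> str:
    """Return the location whose mRNA score is highest, i.e. the closest match."""
    return max(scores, key=scores.get)


# Hypothetical example: one transcript mapped to two places.
scores = {
    "chr1:1000-5000": mapping_score(1.0, 100.0),  # full coverage, identical
    "chr7:200-3100": mapping_score(0.5, 90.0),    # half coverage, 90% identity
}
print(best_location(scores))
```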

--Carson



From weckalba at asu.edu  Tue Feb 25 18:36:21 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 17:36:21 -0800
Subject: [maker-devel] invalid gff3 format issues
Message-ID: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>

Hi all,

I am trying to update maker annotations with PASA and encountered errors
stemming from file format issues in the gff3 file.

I put a few lines from the gff3 to highlight the issue below.  Basically,
the problem is that there are non-unique IDs for a number of the
annotations.

Is there anything that can be done to right this problem?

Thanks,

Walter

Lines from GFF3 file, repeated IDs are highlighted:


chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
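
One way to list IDs that are repeated like this across a whole file is a
short Python pass over column 9. This is only a sketch for locating the
problem, not a repair tool and not part of MAKER or PASA. It assumes a
tab-separated GFF3 file (the excerpt above shows spaces because of e-mail
formatting) and, by default, only checks gene and mRNA rows, since GFF3
allows some multi-line features such as CDS to share one ID legitimately.
The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicate_ids(gff3_lines: Iterable[str],
                  feature_types: Tuple[str, ...] = ("gene", "mRNA")) -> Dict[str, int]:
    """Return {ID: count} for IDs used on more than one row of the given types."""
    counts: Counter = Counter()
    for line in gff3_lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\r\n").split("\t")
        # Only look at well-formed 9-column rows of the requested types.
        if len(columns) != 9 or columns[2] not in feature_types:
            continue
        for attribute in columns[8].split(";"):
            key, _, value = attribute.strip().partition("=")
            if key == "ID" and value:
                counts[value] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}
```

Called as duplicate_ids(open("annotations.gff")), it returns a dictionary
such as {"maker-chr1-snap-gene-4.53-mRNA-1": 2} for the two mRNA rows shown
above.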

From dence at genetics.utah.edu  Tue Feb 25 19:02:04 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 26 Feb 2014 02:02:04 +0000
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
Message-ID: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel



From weckalba at asu.edu  Tue Feb 25 19:11:12 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 18:11:12 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
Message-ID: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>

Hi Daniel, those have been uploaded and I'm using version 2.28.

Walter



From carsonhh at gmail.com  Tue Feb 25 21:10:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 21:10:27 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
Message-ID: <CF32B115.A46C%carsonhh@gmail.com>

Could you try version 2.31 (the current version)?  I believe this is
happening because you are passing in MAKER genes as pred_gff; the transcripts
thus ended up with the same Names and IDs as the genes being generated by
the MAKER run via SNAP etc.  This shouldn't happen with model_gff, and
shouldn't happen in 2.31 (IDs and names are generated slightly differently
in 2.30+).

Thanks,
Carson


From marc.hoeppner at imbim.uu.se  Wed Feb 26 01:26:35 2014
From: marc.hoeppner at imbim.uu.se (=?Windows-1252?Q?Marc_H=F6ppner?=)
Date: Wed, 26 Feb 2014 08:26:35 +0000
Subject: [maker-devel] Functional annotation options
Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to go over to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems maker includes an MPI wrapper for InterProScan, but requests "iprscan" to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the ncbi blast tabular output. Since Wublast is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue. 

3) maker_functional
This just throws an error about a missing Job ID, so no clue what this would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation. 

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From mikael.durling at slu.se  Wed Feb 26 02:43:43 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Wed, 26 Feb 2014 09:43:43 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>


On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:

> Dear List,
> 
> I have finished a gene build now, and I would like to go over to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:
> 
> 1) iprscan
> It seems maker includes an MPI wrapper for InterProScan, but requests "iprscan" to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with interproscan5. What I usually do is to split the maker protein file into chunks, and then run these chunks as separate jobs on our cluster, then finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried the iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.
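
The chunking step described above could be sketched in Python roughly as
follows. This is only an illustration of splitting a protein FASTA file into
fixed-size groups of records; it does not run InterProScan or merge results,
and the function names and chunk size are hypothetical.

```python
from typing import Iterable, Iterator, List


def fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA record (header plus sequence lines) as one string."""
    record: List[str] = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def chunk_records(records: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Group records into lists of at most chunk_size records each."""
    chunk: List[str] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def split_fasta(in_path: str, out_prefix: str, chunk_size: int = 500) -> int:
    """Write chunk files named <out_prefix>.<n>.fasta; return how many were written."""
    written = 0
    with open(in_path) as handle:
        for written, chunk in enumerate(chunk_records(fasta_records(handle), chunk_size), 1):
            with open(f"{out_prefix}.{written}.fasta", "w") as out:
                out.writelines(chunk)
    return written
```

Each chunk file can then be submitted as its own cluster job, and the
per-chunk TSV outputs concatenated before being passed to ipr_update_gff.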


> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the ncbi blast tabular output. Since Wublast is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue. 

It works fine with ncbiblast+ and the blastp command with -outfmt 6. 
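
For reference, -outfmt 6 is a tab-separated table with twelve default columns. The sketch below only illustrates that layout by keeping the best-scoring hit per query; it is not what maker_functional_gff does internally, and the function name is made up for the example.

```python
import csv

# Default column order of `-outfmt 6` in NCBI BLAST+.
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines):
    """Return {query_id: row}, keeping the highest-bitscore hit per query."""
    best = {}
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(OUTFMT6_FIELDS, fields))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best
```

Calling best_hits(open("output.blastp")) on the blastp result (the file name is just a placeholder) gives one row per annotated protein.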

cheers,
Mikael

PS. You're welcome to visit me at SLU if you would like to discuss experiences of genome annotation.


> 
> 3) maker_functional
> This just throws an error about a missing Job ID, so no clue what this would be used for.
> 
> I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation. 
> 
> With kind regards,
> 
> Marc Hoeppner
> 
> Marc P. Hoeppner, PhD
> Team Leader
> BILS Genome Annotation Platform
> Department for Medical Biochemistry and Microbiology
> Uppsala University, Sweden
> marc.hoeppner at imbim.uu.se
> 
> 
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From mikael.durling at slu.se  Wed Feb 26 02:55:56 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:55:56 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>


On 26 Feb 2014, at 10:43, Mikael Brandström Durling <mikael.durling at slu.se> wrote:


On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:

Dear List,

I have finished a gene build now, and I would like to move on to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems maker includes an MPI wrapper for InterProScan, but expects "iprscan" to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with InterProScan 5. What I usually do is split the maker protein file into chunks, run these chunks as separate jobs on our cluster, and finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

I should clarify this and say that mpi_iprscan doesn't seem to work with iprscan5. ipr_update_gff does, however.


Mikael


From mikael.durling at slu.se  Wed Feb 26 05:30:44 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 12:30:44 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>

Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure that different isoforms that share a common gene_id get clustered into the same gene; maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to be mapped only against chr1 within the range 1-10000 bp; and maker_coor=chr1 alone will force it to be mapped only against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
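
As a rough illustration of the header tags described above, the following sketch appends gene_id= and maker_coor= to FASTA header lines. The tag names come from the message; everything else (the function names, the `placements` lookup table, and appending the tags at the end of the header separated by spaces) is an assumption made for the example, since the option is undocumented.

```python
def tag_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and maker_coor= tags to a FASTA header line."""
    tags = []
    if gene_id:
        tags.append(f"gene_id={gene_id}")
    if maker_coor:
        tags.append(f"maker_coor={maker_coor}")
    return " ".join([header.rstrip()] + tags)


def tag_fasta(lines, placements):
    """Yield FASTA lines, adding tags to headers listed in `placements`.

    `placements` is a hypothetical lookup table mapping a sequence ID
    (the first word of the header) to a (gene_id, maker_coor) tuple,
    built from whatever prior knowledge of placement is available.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            gene_id, coor = placements.get(seq_id, (None, None))
            yield tag_header(line, gene_id, coor) + "\n"
        else:
            yield line
```

Headers that are not in the table pass through unchanged, so untagged sequences would still be aligned in the normal way.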

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun


From carsonhh at gmail.com  Wed Feb 26 06:22:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 06:22:34 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes.  That should work as well, as an accidental feature.

--Carson 

Sent from my iPhone

> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:
> 
> Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
> 
> Thanks,
> Mikael
> 

From mikael.durling at slu.se  Wed Feb 26 06:37:29 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 13:37:29 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor (but not gene_id) as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

Yes.  That should work as well, as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael


From nextgen.usfs at gmail.com  Wed Feb 26 09:21:33 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:21:33 -0600
Subject: [maker-devel] change program locations in maker_exe
Message-ID: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>

Hello,
I was wondering if there is a way to make permanent changes to the maker_exe.ctl file. It seems that on install maker didn't find the GeneMark or probuild locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information.  Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe.ctl file?  Thank you.

- Jon


From carsonhh at gmail.com  Wed Feb 26 08:38:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 08:38:47 -0700
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <CF33558F.A4C3%carsonhh@gmail.com>

maker_functional is a script that gets called by another script; it is not
meant to be called directly by the user.  So ignore that.

Just run iprscan directly; it already works pretty well.  The mpi_iprscan
and iprscan_wrap scripts just add some logging functionality by wrapping
the iprscan call.  In most cases there is no advantage over just running
iprscan directly.

--Carson




From carsonhh at gmail.com  Wed Feb 26 09:09:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 09:09:14 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: <CF335A95.A4DE%carsonhh@gmail.com>

It will still work without est_forward.  It just works a little differently.
Keep in mind this was a hidden feature I used to find stubborn or hard to
find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the
maker_coor tags early in the pipeline.  Then it will create a list of
locations to search, and it will search them even if there are no BLAST
results to seed the search (normally MAKER gets a BLAST result first and
then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
look for a match using all of chr1 as the input to exonerate even when BLAST
finds nothing (this is a very very slow search, but can help pick up one or
two stubborn genes that don't remap well).  To allow this, MAKER gives
exonerate looser matching parameters (i.e. allows for single base pair
introns perhaps caused by assembly errors).  The logic here is that given
the fact that I already told MAKER that with some degree of confidence I
expect sequence A to map to location X, it will try its hardest to make
it match. 

Without est_forward set, the maker_coor= flag still gets read in GI.pm at
line 1563, but only after a BLAST alignment has already seeded it to the
region (that BLAST result has the information in its description parameter).
MAKER will then ignore seeds completely outside of maker_coor. In addition
any BLAST seeds that overlap maker_coor will get the search space for
alignment polishing adjusted to match maker_coor exactly.  Also match
parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an
accidental feature).
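
To make the behavior without est_forward a bit more concrete, here is a small sketch of the seed handling as described above. It is not the code from GI.pm: the function names and the tuple representation are made up for illustration, and what happens to a seed when maker_coor names only a contig is an assumption.

```python
import re


def parse_maker_coor(tag):
    """Parse 'chr1:1-10000' or 'chr1' into (seqid, start, end); start/end may be None."""
    match = re.fullmatch(r"([^:]+)(?::(\d+)-(\d+))?", tag)
    if match is None:
        raise ValueError(f"unrecognised maker_coor value: {tag!r}")
    seqid, start, end = match.groups()
    if start is None:
        return seqid, None, None
    return seqid, int(start), int(end)


def filter_seed(seed, coor):
    """Apply the no-est_forward rules to one BLAST seed.

    `seed` is a (seqid, start, end) tuple. Returns None when the seed lies
    completely outside maker_coor; otherwise returns the region that would
    be handed on for alignment polishing.
    """
    seqid, start, end = coor
    s_seqid, s_start, s_end = seed
    if s_seqid != seqid:
        return None                 # seed is on a different contig
    if start is None:
        return seed                 # assumption: contig-only tag leaves the seed as-is
    if s_end < start or s_start > end:
        return None                 # seed completely outside maker_coor
    return (seqid, start, end)      # overlapping seed: search space set to maker_coor
```

With est_forward=1 the difference is that the region from maker_coor would be searched even when there is no seed at all, and with looser exonerate parameters.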

Thanks,
Carson


From:  Mikael Brandström Durling <mikael.durling at slu.se>
Date:  Wednesday, February 26, 2014 at 6:37 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the
code, it seems that I need to supply maker_coor (but not gene_id) as well as
the configuration option est_forward for this to work. Any occurrences of
maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

> Yes.  That should work as well, as an accidental feature.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se>
> wrote:
> 
>> Can this use of maker_coor be used only to hint at the placement of the
>> ESTs, without affecting the naming of the final genes? I.e., if I have a
>> database of ESTs where I have a priori knowledge of their rough placement, can
>> this placement be given to maker without providing est_forward=1?
>> 
>> Thanks,
>> Mikael
>> 

From carson.holt at genetics.utah.edu  Wed Feb 26 09:38:37 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 26 Feb 2014 16:38:37 +0000
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
Message-ID: <CF33655B.A514%carson.holt@genetics.utah.edu>

MAKER first looks inside of .../maker/exe/ for any executables.  Then it
uses the system's "which" command to identify executables in your PATH
environment variable.  If MAKER is not finding the one you want, then
you can either put the program in the .../maker/exe/ folder (i.e. create
.../maker/exe/bin/ and then put soft links to the executables you want to
be used first), or you can rearrange the order of entries in your PATH
environment variable so that "which <program_name>" returns the location
you want.  If MAKER is always leaving the locations of those programs
empty, it is because you need to add them to your PATH environment
variable.
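
The lookup order described above can be imitated with a few lines of Python, which is handy for checking what MAKER is likely to pick up before running maker -CTL. This is only a sketch of the idea, not MAKER's own code; the function name and the recursive search below the exe directory are assumptions.

```python
import os
import shutil


def find_executable(name, maker_exe_dir):
    """Return the first match for `name` under maker_exe_dir, else on PATH, else ''."""
    # 1) Anything below .../maker/exe/ wins (e.g. soft links placed in exe/bin/).
    for root, _dirs, files in os.walk(maker_exe_dir, followlinks=True):
        if name in files:
            candidate = os.path.join(root, name)
            if os.access(candidate, os.X_OK):
                return candidate
    # 2) Otherwise fall back to PATH order, as `which` would.
    return shutil.which(name) or ""
```

An empty string returned for, say, probuild would correspond to the empty entry in maker_exe.ctl, meaning the program is neither under .../maker/exe/ nor on PATH.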

Thanks,
Carson

On 2/26/14, 9:21 AM, "USFS Ion PGM" <nextgen.usfs at gmail.com> wrote:

>Hello,
>I was wondering if there is a way to make permanent changes to the
>maker_exe.ctl file, as it seems on the install that maker didn't find the
>GeneMark or probuild locations correctly, which means that I have to
>manually edit the maker_exe.ctl file every time and add that information.
> Where can I modify this permanently so that the maker -CTL command
>creates the appropriate maker_exe file?  Thank you.
>
>- Jon
>
>


From nextgen.usfs at gmail.com  Wed Feb 26 09:58:11 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:58:11 -0600
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CF33655B.A514%carson.holt@genetics.utah.edu>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
	<CF33655B.A514%carson.holt@genetics.utah.edu>
Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com>

Hi Carson,

Thank you - that did it, I didn't have them in the PATH.  All working now.

Cheers,
Jon



From weckalba at asu.edu  Wed Feb 26 13:05:05 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Wed, 26 Feb 2014 12:05:05 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CF32B115.A46C%carsonhh@gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
Message-ID: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:

> Could you try version 2.31 (the current version)?  I believe this is
> happening because you are passing in MAKER genes as pred_gff; the
> transcripts thus ended up with the same Names and IDs as the genes being
> generated by the MAKER run via SNAP etc.  This shouldn't happen with
> model_gff, and shouldn't happen in 2.31 (IDs and names are generated
> slightly differently in 2.30+).
>
> Thanks,
> Carson
>
> From: Walter Eckalbar <weckalba at asu.edu>
> Date: Tuesday, February 25, 2014 at 7:11 PM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] invalid gff3 format issues
>
> Hi Daniel, those have been uploaded and I'm using version 2.28.
>
> Walter
>
>
> On 25 February 2014 18:02, Daniel Ence <dence at genetics.utah.edu> wrote:
>
>> Hi Walter,
>>
>> Will you upload the full GFF3 and the control files that you used to this
>> URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>>
>> Thanks,
>> Daniel
>>
>>
>>
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar <weckalba at asu.edu>
>>  wrote:
>>
>> Hi all,
>>
>> I am trying to update maker annotations with PASA and encountered errors
>> stemming from file format issues in the gff3 file.
>>
>> I put a few lines from the gff3 to highlight the issue below.  Basically,
>> the problem is that there are non-unique IDs for a number of the
>> annotations.
>>
>> Is there anything that can be done to right this problem?
>>
>> Thanks,
>>
>> Walter
>>
>> Lines from GFF3 file, repeated IDs are highlighted:
>>
>>
>> chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>> chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>> chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>> chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>
>> chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>> chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>> chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11

From carsonhh at gmail.com  Wed Feb 26 14:12:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 14:12:23 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
	<CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
Message-ID: <CF33A669.A53C%carsonhh@gmail.com>

Could you put the file in this GFF3 validator to see if anything comes up?
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Maybe it's just PASA.  But I'd like to know there's no issue being caused by
something else.
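
For a quick local check alongside the online validator, a minimal Python sketch along these lines can list IDs that occur on more than one feature line. It is only an illustration, not part of MAKER or PASA: the function name and the `multiline_ok` parameter are hypothetical, and it assumes tab-separated GFF3 with nine columns. CDS is exempted by default because the GFF3 format lets one discontinuous feature span several lines with a shared ID.

```python
from collections import Counter


def duplicate_gff3_ids(lines, multiline_ok=("CDS",)):
    """Return {ID: count} for IDs that appear on more than one feature line.

    Feature types in `multiline_ok` are skipped, since GFF3 allows a
    discontinuous feature (typically CDS) to reuse one ID across lines.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] in multiline_ok:
            continue
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID" and value:
                counts[value] += 1
    return {gff_id: n for gff_id, n in counts.items() if n > 1}
```

Called as `duplicate_gff3_ids(open("merged.gff"))` (the file name is a placeholder), an empty result would suggest the non-unique IDs are gone; any entries returned are the IDs worth inspecting.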

Thanks,
Carson


From:  Walter Eckalbar <weckalba at asu.edu>
Date:  Wednesday, February 26, 2014 at 1:05 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "<maker-devel at yandell-lab.org>"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] invalid gff3 format issues

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:
> Could you try version 2.31 (the current version)?  I believe this is happening
> because you are passing in MAKER genes as pred_gff, so the transcripts ended
> up with the same Names and IDs as the genes being generated by the MAKER run
> via SNAP etc.  This shouldn't happen with model_gff, and shouldn't happen in
> 2.31 (IDs and names are generated slightly differently in 2.30+).
> 
> Thanks,
> Carson
> 
> From:  Walter Eckalbar <weckalba at asu.edu>
> Date:  Tuesday, February 25, 2014 at 7:11 PM
> To:  Daniel Ence <dence at genetics.utah.edu>
> Cc:  "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] invalid gff3 format issues
> 
> Hi Daniel, those have been uploaded and I'm using version 2.28.
> 
> Walter
> 
> 
> On 25 February 2014 18:02, Daniel Ence <dence at genetics.utah.edu> wrote:
>> Hi Walter, 
>> 
>> Will you upload the full GFF3 and the control files that you used to this
>> URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>> 
>> Thanks,
>> Daniel
>> 
>> 
>> 
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar <weckalba at asu.edu>
>>  wrote:
>> 
>>> Hi all,
>>> 
>>> I am trying to update maker annotations with PASA and encountered errors
>>> stemming from file format issues in the gff3 file.
>>> 
>>> I put a few lines from the gff3 to highlight the issue below.  Basically,
>>> the problem is that there are non-unique IDs for a number of the
>>> annotations.
>>> 
>>> Is there anything that can be done to right this problem?
>>> 
>>> Thanks,
>>> 
>>> Walter
>>> 
>>> Lines from GFF3 file, repeated IDs are highlighted:
>>> 
>>> 
>>> chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>>> chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>>> chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>> chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>> 
>>> chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>>> chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>>> chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>>> chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11

From mikael.durling at slu.se  Wed Feb 26 15:04:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 22:04:37 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF335A95.A4DE%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
Message-ID: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for it to work? I'm after the behavior you describe below, where exonerate is made to try really hard to align an EST within a limited region, but I would not like maker to produce est2genome predictions.

In general, I think maker_coor and est_forward form a feature set that deserves to be promoted to a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt <carsonhh at gmail.com> wrote:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that, given that I have already told MAKER with some degree of confidence that I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also match parameters for exonerate will not be relaxed as they were with est_forward.
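
The seed handling described above (without est_forward) can be sketched as a small interval check. This is only an illustration in Python of the logic as described, not the code in GI.pm: the function names and the `(seqid, start, end)` tuple layout are assumptions, and a tag without a range is represented as `(seqid, None, None)` to mean the whole sequence.

```python
def parse_maker_coor(tag):
    """Parse 'chr1:1-10000' or 'chr1' into (seqid, start, end).

    start and end are None when the tag names a whole sequence.
    """
    seqid, sep, span = tag.partition(":")
    if not sep:
        return seqid, None, None
    start, _, end = span.partition("-")
    return seqid, int(start), int(end)


def restrict_seed(seed, coor):
    """Apply a maker_coor restriction to one BLAST seed.

    seed and coor are (seqid, start, end) tuples. Returns None when the
    seed lies completely outside maker_coor (the seed is ignored);
    otherwise returns the maker_coor region itself, because overlapping
    seeds get their polishing search space set to maker_coor exactly.
    """
    seqid, start, end = seed
    c_seqid, c_start, c_end = coor
    if seqid != c_seqid:
        return None
    if c_start is None:
        return (c_seqid, None, None)  # whole sequence is the search space
    if end < c_start or start > c_end:
        return None
    return (c_seqid, c_start, c_end)
```

For example, a seed at chr1:9500-12000 overlaps maker_coor=chr1:1-10000, so its search space becomes chr1:1-10000, while a seed at chr1:20000-21000 is discarded.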

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

Yes.  That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene.  Likewise, maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
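
Adding the tags by hand gets tedious for a large evidence file, so a small script can do it. The following Python sketch is an illustration only: the function name and the `placements` mapping are hypothetical, and it assumes the tags are appended to the description part of the header (after the sequence ID), separated by spaces. The thread does not specify the exact placement, so check this against your MAKER version.

```python
def tag_fasta_headers(lines, placements):
    """Yield FASTA lines with gene_id=/maker_coor= tags appended to headers.

    placements maps a sequence ID (first word of the header) to a dict of
    tags, e.g. {"tx1": {"gene_id": "geneA", "maker_coor": "chr1:1-10000"}}.
    Headers without an entry, and all sequence lines, pass through unchanged.
    """
    for line in lines:
        if not line.startswith(">"):
            yield line
            continue
        header = line.rstrip("\n")
        parts = header[1:].split(None, 1)
        seq_id = parts[0] if parts else ""
        tags = placements.get(seq_id, {})
        extra = " ".join(f"{key}={value}" for key, value in tags.items())
        yield f"{header} {extra}\n" if extra else header + "\n"
```

With `placements = {"tx1": {"gene_id": "geneA", "maker_coor": "chr1:1-10000"}}`, the header `>tx1 some description` becomes `>tx1 some description gene_id=geneA maker_coor=chr1:1-10000`.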

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun


From carsonhh at gmail.com  Wed Feb 26 15:50:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 15:50:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
Message-ID: <CF33B334.A551%carsonhh@gmail.com>

What you can do is run it once with just est_forward=1 and
est2genome/protein2genome set to 1.  Then take those results, pass them in
as model_gff and use the map_forward option to then filter the results based
on mRNA score, and that would copy names onto the new genes under the standard
MAKER pipeline.  Eventually it's really supposed to go into a separate tool
that will map genes onto new assemblies (but under the hood the tool will
just be calling MAKER with certain parameters restricted).  I'm handling it
this way because if people commonly use it mixed with things like SNAP I can
start to get some very weird behaviors.

Thanks,
Carson


From carsonhh at gmail.com  Wed Feb 26 16:45:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 16:45:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF33B334.A551%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
Message-ID: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
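
The prefiltering step could look roughly like the following Python sketch. It is an illustration under stated assumptions, not a MAKER tool: it assumes the score of interest is column 6 of the mRNA lines, that the GFF3 is tab-separated, and that `min_score` is a threshold you choose yourself (no particular value is suggested in this thread). The function names are hypothetical. mRNAs scoring below the threshold are dropped together with child features that belong only to them, and genes left with no mRNA are dropped too; mRNAs with no numeric score (".") are kept.

```python
def _attributes(col9):
    """Split a GFF3 attribute column into a dict."""
    out = {}
    for field in col9.split(";"):
        key, sep, value = field.partition("=")
        if sep:
            out[key.strip()] = value
    return out


def prefilter_by_mrna_score(lines, min_score):
    """Return GFF3 lines with low-scoring mRNAs (and their orphans) removed."""
    rows = [line.rstrip("\n") for line in lines]
    dropped = set()          # IDs of mRNAs scoring below min_score
    genes_with_mrna = set()  # genes that had at least one mRNA
    genes_kept = set()       # genes that still have a surviving mRNA

    # First pass: decide which mRNAs survive.
    for row in rows:
        cols = row.split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = _attributes(cols[8])
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        genes_with_mrna.update(parents)
        try:
            low = float(cols[5]) < min_score
        except ValueError:
            low = False  # "." means no score was recorded: keep the mRNA
        if low:
            dropped.add(attrs.get("ID"))
        else:
            genes_kept.update(parents)

    # Second pass: emit everything that does not depend on a dropped mRNA.
    kept = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) != 9:
            kept.append(row)  # comments, pragmas, blank lines
            continue
        attrs = _attributes(cols[8])
        if cols[2] == "gene":
            gene_id = attrs.get("ID")
            if gene_id in genes_with_mrna and gene_id not in genes_kept:
                continue
        elif cols[2] == "mRNA":
            if attrs.get("ID") in dropped:
                continue
        else:
            parents = [p for p in attrs.get("Parent", "").split(",") if p]
            live = [p for p in parents if p not in dropped]
            if parents and not live:
                continue
            if len(live) != len(parents):
                # Shared exon/CDS: keep it, but point only at surviving mRNAs.
                fields = ["Parent=" + ",".join(live) if f.startswith("Parent=") else f
                          for f in cols[8].split(";")]
                cols[8] = ";".join(fields)
                row = "\t".join(cols)
        kept.append(row)
    return kept
```

The returned lines can be written to a new file and that file given to model_gff for the second run.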

--Carson 

Sent from my iPhone


From bioinformatics.umd at gmail.com  Thu Feb 27 09:46:44 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 11:46:44 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

Hello,

I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT when running maker on InfiniBand nodes, but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly
warns of an issue with doing 'fork's within openmpi/openfabrics framework.

In particular, the use of the fork system call is only partially supported
in the OpenFabrics software (this is the drivers, etc for the infiniband
connections). See e.g. 
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
for more information, in particular the paragraphs starting with the
red-highlighted sentence "it does not mean that your fork()-calling
application is safe". (The kernel, openMPI version, and OFED version are
sufficiently recent to mean that there is _some_ fork support).

The fact that the job runs over gigE but not IB, in conjunction with the
warning from openmpi, strongly suggests that this is the issue that you are 
encountering. I suspect that maker touches registered memory before the fork,
which would result in a segfault (matching what was observed).

You can try adding the arguments
--mca mpi_warn_on_fork 0 
to the mpirun command, just in case the crash was somehow caused by openmpi's
warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.


Basically, it looks like maker uses some system calls like fork in a manner
which is incompatible with the current OpenFabrics software, and thus will
not work with infiniband. This situation is likely to remain until either
maker changes to be compatible with OFED, or OFED's support for the fork
system call is broadened.


-------------- next part --------------
STATUS: Parsing control files...
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.  

The process that invoked fork was:

  Local host:          compute-g20-7.deepthought.umd.edu (PID 28015)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[compute-g20-8:09542] *** Process received signal ***
[compute-g20-8:09542] Signal: Segmentation fault (11)
[compute-g20-8:09542] Signal code: Address not mapped (1)
[compute-g20-8:09542] Failing at address: 0xee00350
[compute-g20-8:09543] *** Process received signal ***
[compute-g20-8:09543] Signal: Segmentation fault (11)
[compute-g20-8:09543] Signal code: Address not mapped (1)
[compute-g20-8:09543] Failing at address: 0xf020c90
[compute-g20-8:09544] *** Process received signal ***
[compute-g20-8:09544] Signal: Segmentation fault (11)
[compute-g20-8:09544] Signal code: Address not mapped (1)
[compute-g20-8:09544] Failing at address: 0x1ad68f10
[compute-g20-8:09545] *** Process received signal ***
[compute-g20-8:09545] Signal: Segmentation fault (11)
[compute-g20-8:09545] Signal code: Address not mapped (1)
[compute-g20-8:09545] Failing at address: 0x84a3188
[compute-g20-8:09545] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0]
[compute-g20-8:09545] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0x530) [0x2b98f9ea4ec0]
[compute-g20-8:09545] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b98f9ea60ca]
[compute-g20-8:09545] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602]
[compute-g20-8:09545] [ 4] perl(Perl_savepvn+0x26) [0x4816b6]
[compute-g20-8:09545] [ 5] perl(Perl_do_exec3+0x31e) [0x4f715e]
[compute-g20-8:09545] [ 6] perl(Perl_my_popen+0x403) [0x484d63]
[compute-g20-8:09545] [ 7] perl(Perl_do_openn+0x1696) [0x4f9536]
[compute-g20-8:09545] [ 8] perl(Perl_pp_open+0x184) [0x4efc44]
[compute-g20-8:09545] [ 9] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09545] [10] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09545] [11] perl(main+0x135) [0x41b485]
[compute-g20-8:09545] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4]
[compute-g20-8:09545] [13] perl [0x41b299]
[compute-g20-8:09545] *** End of error message ***
[compute-g20-8:09546] *** Process received signal ***
[compute-g20-8:09546] Signal: Segmentation fault (11)
[compute-g20-8:09546] Signal code: Address not mapped (1)
[compute-g20-8:09546] Failing at address: 0x8240850
[compute-g20-8:09547] *** Process received signal ***
[compute-g20-8:09547] Signal: Segmentation fault (11)
[compute-g20-8:09547] Signal code: Address not mapped (1)
[compute-g20-8:09547] Failing at address: 0xd5c8850
[compute-g20-8:09548] *** Process received signal ***
[compute-g20-8:09548] Signal: Segmentation fault (11)
[compute-g20-8:09548] Signal code: Address not mapped (1)
[compute-g20-8:09548] Failing at address: 0x8c80850
[compute-g20-8:09549] *** Process received signal ***
[compute-g20-8:09549] Signal: Segmentation fault (11)
[compute-g20-8:09549] Signal code: Address not mapped (1)
[compute-g20-8:09549] Failing at address: 0x18d72850
[compute-g20-10:07087] *** Process received signal ***
[compute-g20-10:07087] Signal: Segmentation fault (11)
[compute-g20-10:07087] Signal code: Address not mapped (1)
[compute-g20-10:07087] Failing at address: 0x6659f10
[compute-g20-10:07088] *** Process received signal ***
[compute-g20-10:07088] Signal: Segmentation fault (11)
[compute-g20-10:07088] Signal code: Address not mapped (1)
[compute-g20-10:07088] Failing at address: 0x1fe3b5d0
[compute-g20-10:07089] *** Process received signal ***
[compute-g20-10:07089] Signal: Segmentation fault (11)
[compute-g20-10:07089] Signal code: Address not mapped (1)
[compute-g20-10:07089] Failing at address: 0x9870350
[compute-g20-10:07090] *** Process received signal ***
[compute-g20-10:07090] Signal: Segmentation fault (11)
[compute-g20-10:07090] Signal code: Address not mapped (1)
[compute-g20-10:07090] Failing at address: 0x17bad350
STATUS: Processing and indexing input FASTA files...
[compute-g20-8:09567] *** Process received signal ***
[compute-g20-8:09567] Signal: Segmentation fault (11)
[compute-g20-8:09567] Signal code: Address not mapped (1)
[compute-g20-8:09567] Failing at address: 0x1ad5aa10
[compute-g20-8:09567] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0]
[compute-g20-8:09567] [ 1] /lib64/libc.so.6(strlen+0x30) [0x2b6de3f67f40]
[compute-g20-8:09567] [ 2] perl(Perl_do_exec3+0x3a) [0x4f6e7a]
[compute-g20-8:09567] [ 3] perl(Perl_my_popen+0x403) [0x484d63]
[compute-g20-8:09567] [ 4] perl(Perl_do_openn+0x1696) [0x4f9536]
[compute-g20-8:09567] [ 5] perl(Perl_pp_open+0x184) [0x4efc44]
[compute-g20-8:09567] [ 6] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09567] [ 7] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09567] [ 8] perl(main+0x135) [0x41b485]
[compute-g20-8:09567] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4]
[compute-g20-8:09567] [10] perl [0x41b299]
[compute-g20-8:09567] *** End of error message ***
[compute-g20-7:28123] *** Process received signal ***
[compute-g20-7:28123] Signal: Segmentation fault (11)
[compute-g20-7:28123] Signal code: Address not mapped (1)
[compute-g20-7:28123] Failing at address: 0x19ad9f10
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore

To access files for individual sequences use the datastore index:
/export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log
[compute-g20-10:07107] *** Process received signal ***
[compute-g20-10:07107] Signal: Segmentation fault (11)
[compute-g20-10:07107] Signal code: Address not mapped (1)
[compute-g20-10:07107] Failing at address: 0x9870362
[compute-g20-10:07107] [ 0] /lib64/libpthread.so.0 [0x2b50c5c8cca0]
[compute-g20-10:07107] [ 1] perl [0x487218]
[compute-g20-10:07107] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-10:07107] [ 3] perl [0x49d9dc]
[compute-g20-10:07107] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-10:07107] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07107] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-10:07107] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-10:07107] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b50c5eb79c4]
[compute-g20-10:07107] [ 9] perl [0x41b299]
[compute-g20-10:07107] *** End of error message ***
examining contents of the fasta file and run log
examining contents of the fasta file and run log
[compute-g20-10:07108] *** Process received signal ***
[compute-g20-10:07108] Signal: Segmentation fault (11)
[compute-g20-10:07108] Signal code: Address not mapped (1)
[compute-g20-10:07108] Failing at address: 0x1fe3b5c8
examining contents of the fasta file and run log
[compute-g20-10:07108] [ 0] /lib64/libpthread.so.0 [0x2b88f6f8dca0]
[compute-g20-10:07108] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b88f61d55b2]
[compute-g20-10:07108] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b88f7210ad1]
[compute-g20-10:07108] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919]
[compute-g20-10:07108] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19]
[compute-g20-10:07108] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07108] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-10:07108] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-10:07108] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b88f71b89c4]
[compute-g20-10:07108] [ 9] perl [0x41b299]
[compute-g20-10:07108] *** End of error message ***
examining contents of the fasta file and run log
[compute-g20-10:07109] *** Process received signal ***
[compute-g20-10:07109] Signal: Segmentation fault (11)
[compute-g20-10:07109] Signal code: Address not mapped (1)
[compute-g20-10:07109] Failing at address: 0x6664ad0
[compute-g20-10:07109] [ 0] /lib64/libpthread.so.0 [0x2b0809664ca0]
[compute-g20-10:07109] [ 1] /lib64/libc.so.6 [0x2b08098edada]
[compute-g20-10:07109] [ 2] /lib64/libc.so.6(memmove+0x75) [0x2b08098ec095]
[compute-g20-10:07109] [ 3] perl(Perl_sv_setpvn+0x7a) [0x4b775a]
[compute-g20-10:07109] [ 4] perl(Perl_pp_concat+0xc9) [0x4a5739]
[compute-g20-10:07109] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07109] [ 6] perl(Perl_call_sv+0x160) [0x4333a0]
[compute-g20-10:07109] [ 7] perl(Perl_magic_methcall+0x182) [0x488c22]
[compute-g20-10:07109] [ 8] perl(Perl_magic_setpack+0x52) [0x489292]
[compute-g20-10:07109] [ 9] perl(Perl_mg_set+0x66) [0x48aca6]
[compute-g20-10:07109] [10] perl(Perl_pp_sassign+0x19c) [0x4a5c8c]
[compute-g20-10:07109] [11] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07109] [12] perl(perl_run+0x243) [0x4340f3]
[compute-g20-10:07109] [13] perl(main+0x135) [0x41b485]
[compute-g20-10:07109] [14] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b080988f9c4]
[compute-g20-10:07109] [15] perl [0x41b299]
[compute-g20-10:07109] *** End of error message ***
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log


--Next Contig--


--Next Contig--


--Next Contig--

examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...
Processing run.log file...
examining contents of the fasta file and run log
Processing run.log file...
Processing run.log file...


--Next Contig--


--Next Contig--


--Next Contig--


--Next Contig--


--Next Contig--

Processing run.log file...
Processing run.log file...


--Next Contig--


--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_2
Length: 2857
#---------------------------------------------------------------------


Processing run.log file...
MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb.out
did not finish on the last run and must be erased
Processing run.log file...
setting up GFF3 output and fasta chunks
Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_7
Length: 972
#---------------------------------------------------------------------


[compute-g20-8:09576] *** Process received signal ***
[compute-g20-8:09576] Signal: Segmentation fault (11)
[compute-g20-8:09576] Signal code: Address not mapped (1)
[compute-g20-8:09576] Failing at address: 0x1ad68f08
examining contents of the fasta file and run log
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_3
Length: 2316
#---------------------------------------------------------------------


[compute-g20-8:09576] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0]
[compute-g20-8:09576] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b6de2f295b2]
[compute-g20-8:09576] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b6de3f64ad1]
[compute-g20-8:09576] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919]
[compute-g20-8:09576] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19]
[compute-g20-8:09576] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09576] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09576] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09576] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4]
[compute-g20-8:09576] [ 9] perl [0x41b299]
[compute-g20-8:09576] *** End of error message ***
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_4
Length: 1230
#---------------------------------------------------------------------


examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
[compute-g20-8:09578] *** Process received signal ***
[compute-g20-8:09578] Signal: Segmentation fault (11)
[compute-g20-8:09578] Signal code: Address not mapped (1)
[compute-g20-8:09578] Failing at address: 0xee0af18
[compute-g20-8:09578] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09578] [ 1] perl(Perl_av_fetch+0x5b) [0x49cf8b]
[compute-g20-8:09578] [ 2] perl(Perl_pp_aelem+0x26e) [0x49e48e]
[compute-g20-8:09578] [ 3] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09578] [ 4] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09578] [ 5] perl(main+0x135) [0x41b485]
[compute-g20-8:09578] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b03d08629c4]
[compute-g20-8:09578] [ 7] perl [0x41b299]
[compute-g20-8:09578] *** End of error message ***
setting up GFF3 output and fasta chunks
Processing run.log file...
[compute-g20-8:09583] *** Process received signal ***
[compute-g20-8:09583] Signal: Segmentation fault (11)
[compute-g20-8:09583] Signal code: Address not mapped (1)
[compute-g20-8:09583] Failing at address: 0x822b0e2
[compute-g20-8:09582] *** Process received signal ***
[compute-g20-8:09582] Signal: Segmentation fault (11)
[compute-g20-8:09582] Signal code: Address not mapped (1)
[compute-g20-8:09582] Failing at address: 0x8c6b0e2
[compute-g20-8:09583] [ 0] /lib64/libpthread.so.0 [0x2ab7f114dca0]
[compute-g20-8:09583] [ 1] perl [0x487218]
[compute-g20-8:09583] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-8:09583] [ 3] perl [0x49d9dc]
[compute-g20-8:09583] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-8:09583] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09583] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09583] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09583] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ab7f13789c4]
[compute-g20-8:09583] [ 9] perl [0x41b299]
[compute-g20-8:09583] *** End of error message ***
[compute-g20-8:09582] [ 0] /lib64/libpthread.so.0 [0x2b4eace23ca0]
[compute-g20-8:09582] [ 1] perl [0x487218]
[compute-g20-8:09582] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-8:09582] [ 3] perl [0x49d9dc]
[compute-g20-8:09582] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-8:09582] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09582] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09582] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09582] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b4ead04e9c4]
[compute-g20-8:09582] [ 9] perl [0x41b299]
[compute-g20-8:09582] *** End of error message ***
examining contents of the fasta file and run log
[compute-g20-8:09581] *** Process received signal ***
[compute-g20-8:09581] Signal: Segmentation fault (11)
[compute-g20-8:09581] Signal code: Address not mapped (1)
[compute-g20-8:09581] Failing at address: 0x848da08
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_17
Length: 1413
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_13
Length: 2019
#---------------------------------------------------------------------


[compute-g20-8:09581] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0]
[compute-g20-8:09581] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b98f9ea65b2]
[compute-g20-8:09581] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b98faee1ad1]
[compute-g20-8:09581] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919]
[compute-g20-8:09581] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19]
[compute-g20-8:09581] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09581] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09581] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09581] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4]
[compute-g20-8:09581] [ 9] perl [0x41b299]
[compute-g20-8:09577] *** Process received signal ***
[compute-g20-8:09581] *** End of error message ***
[compute-g20-8:09577] Signal: Segmentation fault (11)
[compute-g20-8:09577] Signal code: Address not mapped (1)
[compute-g20-8:09577] Failing at address: 0xd5b30e2
[compute-g20-8:09577] [ 0] /lib64/libpthread.so.0 [0x2b79d382aca0]
[compute-g20-8:09577] [ 1] perl [0x487218]
[compute-g20-8:09577] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-8:09577] [ 3] perl [0x49d9dc]
[compute-g20-8:09577] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-8:09577] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09577] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09577] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09577] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b79d3a559c4]
[compute-g20-8:09577] [ 9] perl [0x41b299]
[compute-g20-8:09577] *** End of error message ***
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_1
Length: 1446
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
[compute-g20-8:09579] *** Process received signal ***
[compute-g20-8:09579] Signal: Segmentation fault (11)
[compute-g20-8:09579] Signal code: Address not mapped (1)
[compute-g20-8:09579] Failing at address: 0x18d64350
examining contents of the fasta file and run log
[compute-g20-8:09579] [ 0] /lib64/libpthread.so.0 [0x2b31b670fca0]
[compute-g20-8:09579] [ 1] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__ham_get_meta+0x4c) [0x2b31bbd1bccc]
[compute-g20-8:09579] [ 2] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbd103fb]
[compute-g20-8:09579] [ 3] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get+0x1fa) [0x2b31bbd81f3a]
[compute-g20-8:09579] [ 4] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get_pp+0xb4) [0x2b31bbd8db04]
[compute-g20-8:09579] [ 5] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbce4b85]
[compute-g20-8:09579] [ 6] /usr/local/perl/5.16.3-threaded/lib/site_perl/5.16.3/x86_64-linux-thread-multi/auto/DB_File/DB_File.so [0x2b31bbabafc9]
[compute-g20-8:09579] [ 7] perl(Perl_pp_entersub+0x58f) [0x49ee4f]
[compute-g20-8:09579] [ 8] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09579] [ 9] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09579] [10] perl(main+0x135) [0x41b485]
[compute-g20-8:09579] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b31b693a9c4]
[compute-g20-8:09579] [12] perl [0x41b299]
[compute-g20-8:09579] *** End of error message ***


--Next Contig--

setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks


--Next Contig--

setting up GFF3 output and fasta chunks


--Next Contig--


--Next Contig--

Processing run.log file...
MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/3B/F3/Gc_UCSC1_contig_26//theVoid.Gc_UCSC1_contig_26/0/Gc_UCSC1_contig_26.0.all.rb.out
did not finish on the last run and must be erased


--Next Contig--


--Next Contig--


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_18
Length: 937
#---------------------------------------------------------------------


Processing run.log file...
Processing run.log file...
Processing run.log file...


--Next Contig--

FATAL: Thread terminated, causing all processes to fail
--> rank=17, hostname=compute-g20-10.deepthought.umd.edu
setting up GFF3 output and fasta chunks
Processing run.log file...
Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_14
Length: 6745
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_9
Length: 554
#---------------------------------------------------------------------


MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/FB/E4/Gc_UCSC1_contig_22//theVoid.Gc_UCSC1_contig_22/0/Gc_UCSC1_contig_22.0.all.rb.out
did not finish on the last run and must be erased
setting up GFF3 output and fasta chunks
Processing run.log file...
setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_16
Length: 995
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_26
Length: 1895
#---------------------------------------------------------------------


FATAL: Thread terminated, causing all processes to fail
--> rank=16, hostname=compute-g20-10.deepthought.umd.edu
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_23
Length: 618
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_31
Length: 506
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_28
Length: 5246
#---------------------------------------------------------------------


MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/E5/53/Gc_UCSC1_contig_29//theVoid.Gc_UCSC1_contig_29/0/Gc_UCSC1_contig_29.0.all.rb.out
did not finish on the last run and must be erased
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_19
Length: 880
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_22
Length: 831
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_21
Length: 12421
#---------------------------------------------------------------------


doing repeat masking
FATAL: Thread terminated, causing all processes to fail
--> rank=18, hostname=compute-g20-10.deepthought.umd.edu
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_29
Length: 1161
#---------------------------------------------------------------------


doing repeat masking
DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 105.
DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 106.
DBD::SQLite::db selectcol_arrayref failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 108.
DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 110.
[compute-g20-7.deepthought.umd.edu:28014] 19 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-g20-7.deepthought.umd.edu:28014] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0/Gc_UCSC1_contig_9.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0 -pa 1
#-------------------------------#
SIGTERM received
SIGTERM received
[compute-g20-7:28161] *** Process received signal ***
[compute-g20-7:28161] Signal: Segmentation fault (11)
[compute-g20-7:28161] Signal code: Address not mapped (1)
[compute-g20-7:28161] Failing at address: 0x19a33ad0
[compute-g20-7:28161] [ 0] /lib64/libpthread.so.0 [0x2b9e1cd6bca0]
[compute-g20-7:28161] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0xb0) [0x2b9e1bfb1a40]
[compute-g20-7:28161] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b9e1bfb30ca]
[compute-g20-7:28161] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602]
[compute-g20-7:28161] [ 4] perl(Perl_do_exec3+0x46) [0x4f6e86]
[compute-g20-7:28161] [ 5] perl(Perl_my_popen+0x403) [0x484d63]
[compute-g20-7:28161] [ 6] perl(Perl_pp_backtick+0xc2) [0x4f0752]
[compute-g20-7:28161] [ 7] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-7:28161] [ 8] perl(Perl_call_sv+0x4d1) [0x433711]
[compute-g20-7:28161] [ 9] perl(Perl_sighandler+0x208) [0x4876c8]
[compute-g20-7:28161] [10] /lib64/libpthread.so.0 [0x2b9e1cd6bca0]
[compute-g20-7:28161] [11] /usr/local/ofed/1.5.4/lib64/libmthca-rdmav2.so [0x2b9e29187bbc]
[compute-g20-7:28161] [12] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b9e2686a8dd]
[compute-g20-7:28161] [13] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_progress+0x5b) [0x2b9e1bfc93cb]
[compute-g20-7:28161] [14] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x205) [0x2b9e25e22005]
[compute-g20-7:28161] [15] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(PMPI_Recv+0x14f) [0x2b9e1bf2927f]
[compute-g20-7:28161] [16] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so(_MPI_Recv+0x59) [0x2b9e23ba8d69]
[compute-g20-7:28161] [17] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so [0x2b9e23ba8f58]
[compute-g20-7:28161] [18] perl(Perl_pp_entersub+0x58f) [0x49ee4f]
[compute-g20-7:28161] [19] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-7:28161] [20] perl(perl_run+0x243) [0x4340f3]
[compute-g20-7:28161] [21] perl(main+0x135) [0x41b485]
[compute-g20-7:28161] [22] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b9e1cf969c4]
[compute-g20-7:28161] [23] perl [0x41b299]
[compute-g20-7:28161] *** End of error message ***
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0/Gc_UCSC1_contig_18.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0/Gc_UCSC1_contig_16.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0/Gc_UCSC1_contig_14.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0/Gc_UCSC1_contig_13.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0 -pa 1
#-------------------------------#
SIGTERM received
Perl exited with active threads:
	1 running and unjoined
	0 finished and unjoined
	0 running and detached
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0/Gc_UCSC1_contig_1.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0 -pa 1
#-------------------------------#
SIGTERM received
--------------------------------------------------------------------------
mpirun has exited due to process rank 17 with PID 7052 on
node compute-g20-10.deepthought.umd.edu exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
SIGTERM received
SIGTERM received
Perl exited with active threads:
	0 running and unjoined
	1 finished and unjoined
	0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=14, hostname=compute-g20-8.deepthought.umd.edu
Perl exited with active threads:
	0 running and unjoined
	1 finished and unjoined
	0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=12, hostname=compute-g20-8.deepthought.umd.edu
[compute-g20-8:09470] *** Process received signal ***
[compute-g20-8:09470] Signal: Segmentation fault (11)
[compute-g20-8:09470] Signal code: Address not mapped (1)
[compute-g20-8:09470] Failing at address: 0x4b0
[compute-g20-8:09470] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09470] [ 1] perl(Perl_csighandler+0x23) [0x488103]
[compute-g20-8:09470] [ 2] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09470] [ 3] /lib64/libc.so.6(__select+0x62) [0x2b03d0913402]
[compute-g20-8:09470] [ 4] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b03da142ff3]
[compute-g20-8:09470] [ 5] /lib64/libpthread.so.0 [0x2b03d062f83d]
[compute-g20-8:09470] [ 6] /lib64/libc.so.6(clone+0x6d) [0x2b03d091a26d]
[compute-g20-8:09470] *** End of error message ***
Perl exited with active threads:
	0 running and unjoined
	1 finished and unjoined
	0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=11, hostname=compute-g20-8.deepthought.umd.edu
setting up GFF3 output and fasta chunks
FATAL: Thread terminated, causing all processes to fail
--> rank=10, hostname=compute-g20-8.deepthought.umd.edu
setting up GFF3 output and fasta chunks
FATAL: Thread terminated, causing all processes to fail
--> rank=13, hostname=compute-g20-8.deepthought.umd.edu
FATAL: Thread terminated, causing all processes to fail
--> rank=15, hostname=compute-g20-8.deepthought.umd.edu

From carson.holt at genetics.utah.edu  Thu Feb 27 11:09:21 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 27 Feb 2014 18:09:21 +0000
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: <CF34C944.A5B0%carson.holt@genetics.utah.edu>

It's a little more complicated than that.  MAKER is written in Perl, and Perl doesn't give me the low level access that a language like C would for controlling memory access (I don't control that).  All I get is Perl's standard implementation of forks.  So it's not really a matter of MAKER changing, it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon).

For now you just have to add this flag to OpenMPI when running MAKER with mpiexec -->  -mca btl ^openib

Example :
mpiexec -mca btl ^openib -n 20 maker
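
If MAKER is launched from a wrapper script rather than typed at a prompt, passing the arguments as a list keeps any shell from interpreting the ^ character. The following is only a minimal sketch, assuming Python is available on the cluster; the function name maker_mpi_command is hypothetical, and it uses only the flags shown in the example above.

```python
import subprocess


def maker_mpi_command(n_procs, excluded_btl="openib"):
    """Build the mpiexec argument list that excludes one OpenMPI BTL component."""
    # "^name" asks OpenMPI to use every BTL component except the named one.
    return ["mpiexec", "-mca", "btl", "^" + excluded_btl,
            "-n", str(n_procs), "maker"]


if __name__ == "__main__":
    cmd = maker_mpi_command(20)
    print(" ".join(cmd))
    # Uncomment to actually launch MAKER (needs mpiexec and maker on PATH):
    # subprocess.run(cmd, check=True)
```

Because the list is handed to the process directly, no quoting of ^openib is needed.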


Thanks,
Carson


From: UMD Bioinformatics <bioinformatics.umd at gmail.com<mailto:bioinformatics.umd at gmail.com>>
Date: Thursday, February 27, 2014 at 9:46 AM
To: <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Problem with OpenFabrics and infiniband

Hello,

I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT error when running maker on InfiniBand nodes but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly
warns of an issue with doing 'fork's within openmpi/openfabrics framework.

In particular, the use of the fork system call is only partially supported
in the OpenFabrics software (this is the drivers, etc for the infiniband
connections). See e.g.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
for more information. In particular the paragraphs starting with the
sentence with the red highlighted "it does not mean that your fork()-calling
application is safe". (The kernel, openMPI version, and OFED version are
sufficiently recent to mean that there is _some_ fork support).

The fact that the job runs over gigE but not IB, in conjunction with the
warning from openmpi, strongly suggests that this is the issue that you are
encountering. I suspect that maker touches registered memory before the fork,
which would result in a segfault (matching what was observed).

You can try adding the arguments
--mca mpi_warn_on_fork 0
to the mpirun command, just in case the crash was somehow caused by openmpi's
warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.


Basically, it looks like maker uses some system calls like fork in a manner
which is incompatible with the current OpenFabrics software, and thus will
not work with infiniband. This situation is likely to remain until either
maker changes to be compatible with OFED, or OFED's support for the fork
system call is broadened.



From bioinformatics.umd at gmail.com  Thu Feb 27 11:55:34 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 13:55:34 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <CF34C944.A5B0%carson.holt@genetics.utah.edu>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
	<CF34C944.A5B0%carson.holt@genetics.utah.edu>
Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com>

Hi Carson,

Thanks, that fixed the issue.

Cheers
Ian

On Feb 27, 2014, at 1:09 PM, Carson Holt <carson.holt at genetics.utah.edu> wrote:

> It's a little more complicated than that.  MAKER is written in Perl, and Perl doesn't give me the low level access that a language like C would for controlling memory access (I don't control that).  All I get is Perl's standard implementation of forks.  So it's not really a matter of MAKER changing, it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon).
> 
> For now you just have to add this flag to OpenMPI when running MAKER with mpiexec -->  -mca btl ^openib
> 
> Example :
>> mpiexec -mca btl ^openib -n 20 maker
> 
> 
> Thanks,
> Carson
> 
> 
> From: UMD Bioinformatics <bioinformatics.umd at gmail.com>
> Date: Thursday, February 27, 2014 at 9:46 AM
> To: <maker-devel at yandell-lab.org>
> Subject: Problem with OpenFabrics and infiniband
> 
> Hello,
> 
> I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT error when running maker on InfiniBand nodes but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.
> 
> IT suggestions
> 
> If you look at the top of the error log for the problematic job, it clearly
> warns of an issue with doing 'fork's within openmpi/openfabrics framework.
> 
> In particular, the use of the fork system call is only partially supported
> in the OpenFabrics software (this is the drivers, etc for the infiniband
> connections). See e.g. 
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
> for more information. In particular the paragraphs starting with the
> sentence with the red highlighted "it does not mean that your fork()-calling 
> application is safe". (The kernel, openMPI version, and OFED version are 
> sufficiently recent to mean that there is _some_ fork support).
> 
> The fact that the job runs over gigE but not IB, in conjunction with the
> warning from openmpi, strongly suggests that this is the issue that you are 
> encountering. I suspect that maker touches registered memory before the fork,
> which would result in a segfault (matching what was observed).
> 
> You can try adding the arguments
> --mca mpi_warn_on_fork 0 
> to the mpirun command, just in case the crash was somehow caused by openmpi's
> warning, but I would not hold out much hope for that.
> 
> ###UPDATE### This does not fix the problem.
> 
> 
> Basically, it looks like maker uses some system calls like fork in a manner
> which is incompatible with the current OpenFabrics software, and thus will
> not work with infiniband. This situation is likely to remain until either
> maker changes to be compatible with OFED, or OFED's support for the fork
> system call is broadened.
> 
> 


From sjackman at gmail.com  Thu Feb 27 16:17:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 15:17:22 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
Message-ID: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new genes under the standard MAKER pipeline.  Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
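
The prefiltering step mentioned in the correction above (filter on the mRNA score before handing the GFF3 to model_gff) could be sketched roughly as follows. This is an illustration only, not part of MAKER: it assumes the score is the sixth column of mRNA rows, that child features appear after their parent mRNA in the file, and that an unscored mRNA (".") should be dropped. The function name and the cutoff value are hypothetical.

```python
def filter_gff3_by_mrna_score(lines, min_score):
    """Yield GFF3 lines, dropping mRNAs that score below min_score and their children."""
    dropped = set()  # IDs of mRNA features that failed the cutoff
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if not line or line.startswith("#") or len(cols) != 9:
            yield line  # comments, blank lines, FASTA section: pass through
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if cols[2] == "mRNA":
            score = cols[5]
            # Assumption: an unscored mRNA (".") is treated as failing the cutoff.
            if score == "." or float(score) < min_score:
                dropped.add(attrs.get("ID"))
                continue
        elif any(p in dropped for p in attrs.get("Parent", "").split(",")):
            continue  # exon/CDS/UTR whose parent mRNA was dropped
        yield line
```

A gene whose mRNAs are all dropped is left in place by this sketch, so a second pass to remove childless genes may be needed before using the result as model_gff.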

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.

In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.

Thanks,
Mikael

26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).
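
The seed handling described for the case without est_forward can be pictured with a small sketch. It is not MAKER's code (MAKER is Perl, and the real logic lives in GI.pm); the function name and the tuple layout are hypothetical, and it only covers a maker_coor that gives an explicit range.

```python
def polish_region(seed, coor):
    """Return the region to polish for a BLAST seed, or None if the seed is ignored.

    seed and coor are (contig, start, end) tuples with inclusive coordinates.
    """
    s_contig, s_start, s_end = seed
    c_contig, c_start, c_end = coor
    # A seed on another contig, or with no overlap at all, is ignored.
    if s_contig != c_contig or s_end < c_start or s_start > c_end:
        return None
    # Any overlapping seed has its search space reset to maker_coor exactly.
    return (c_contig, c_start, c_end)


if __name__ == "__main__":
    coor = ("chr1", 1, 10000)
    print(polish_region(("chr1", 9500, 12000), coor))   # overlaps maker_coor
    print(polish_region(("chr1", 20000, 21000), coor))  # entirely outside
```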

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

Yes.  That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
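
Adding the gene_id= and maker_coor= tags by hand gets tedious for a large evidence file, so a small helper along these lines may be useful. It is only a sketch: the function name is hypothetical, and it assumes the tags can simply be appended, space-separated, to the description part of the header after the sequence ID, which the thread does not spell out.

```python
def tag_fasta_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and/or maker_coor= tags to a FASTA header line."""
    header = header.rstrip("\n")
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    tags = []
    if gene_id is not None:
        tags.append("gene_id=" + gene_id)        # groups isoforms into one gene
    if maker_coor is not None:
        tags.append("maker_coor=" + maker_coor)  # e.g. "chr1" or "chr1:1-10000"
    return " ".join([header] + tags)


if __name__ == "__main__":
    print(tag_fasta_header(">rrn16_iso1 16S ribosomal RNA",
                           gene_id="rrn16", maker_coor="chr1:1-10000"))
```

The sequence name and coordinates in the usage line are made up for illustration.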

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

_______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From sjackman at gmail.com  Thu Feb 27 17:27:30 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 16:27:30 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>

Sorry, ignore my previous question. est_forward also carries forward the
names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value
(2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:

> Is there a corresponding protein_forward=1 option to map forward protein
> names from protein2genome?
>
> Cheers,
> Shaun
>
> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
> wrote:
>
> Sorry I meant to say prefilter on the score in the mRNA column before
> passing the gff3 to model_gff.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>  What you can do is run it once with just est_forward=1 and
> est2genome/protein2genome set to 1.  Then take those results, pass them in
> as model_gff and use the map_forward option to then filter the results
> based on mRNA score and that would copy names onto new gene under the
> standard MAKER pipeline.  Eventually it's really supposed to go into a
> separate tool that will map genes onto new assemblies (but under the hood
> the tool will just be calling MAKER with certain parameters restricted).  I
> do this because if people commonly use it mixed with things like SNAP I can
> start to get some very weird behaviors.
>
> Thanks,
> Carson
>
>  From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 3:04 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
>
>  It seems that this could be a very useful option in those cases where
> you have firm a priori knowledge of the placement of ESTs. However, while
> trying it I note that est_forward implies that the est2genome predictor is
> turned on, implicitly. Is this necessary for this to work? I'm after the
> behavior you describe below where exonerate is made to try really hard
> within a limited region to align an est, but I would not like maker to
> produce est2genome predictions.
>
> In general, I think this maker_coor and est_forward is a feature set that
> is worthy to be promoted into a documented feature.
>
> Thanks,
> Mikael
>
>  26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>
>  It will still work without est_forward.  It just works a little
> differently.  Keep in mind this was a hidden feature I used to find
> stubborn or hard to find missing genes after reassembly of a genome.
>
> If est_forward is provided, MAKER will parse the database to look for the
> maker_coor tags early in the pipeline.  Then it will create a list of
> locations to search, and it will search them even if there are no BLAST
> results to seed the search (normally MAKER gets a BLAST result first and
> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
> look for a match using all of chr1 as the input to exonerate even when
> BLAST finds nothing (this is a very very slow search, but can help pick up
> one or two stubborn genes that don't remap well).  To allow this, MAKER
> gives exonerate looser matching parameters (i.e. allows for single base
> pair introns perhaps caused by assembly errors).  The logic here is that
> given the fact that I already told MAKER that with some degree of
> confidence I expect sequence A to map to location X, it will try its
> hardest to make it match.
>
> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
> line 1563, but only after a BLAST alignment has already seeded it to the
> region (that BLAST result has the information in its description
> parameter).  MAKER will then ignore seeds completely outside of maker_coor.
> In addition any BLAST seeds that overlap maker_coor will get the search
> space for alignment polishing adjusted to match maker_coor exactly.  Also
> match parameters for exonerate will not be relaxed as they were with
> est_forward.
>
> As you can see, the behavior is slightly different (because it's an
> accidental feature).
>
> Thanks,
> Carson
>
>
>
>  From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 6:37 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
>
>  That might be a useful and time saving accidental feature. But, reading
> the code, it seems that I need to supply maker_coor but not gene_id, as
> well as the configuration option est_forward for this to work. Any
> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
> right?
>
> Mikael
>
>  26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>
>  Yes.  That should work as well as an accidental feature.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
> mikael.durling at slu.se> wrote:
>
> Can this use of maker_coor be used only to hint about the placement of the
> ESTs, without affecting the naming of the final genes? I.e., if I have a
> database of ESTs where I have a priori knowledge of their rough placement,
> can this placement be given to maker without providing est_forward=1?
>
> Thanks,
> Mikael
>
>  26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>
>  There is a way.  It's not a standard option and it's undocumented, but
> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
> that.  The option won't already be there so you'll have to type it in.
>
> There is also a feature designed to work with this option.  If you add
> tags to your fasta headers, those can be used to guide the mapping and
> naming.  For example, gene_id=<some_gene>  will ensure different isoforms
> that share a common gene_id get clustered into the same gene,
> and maker_coor=chr1:1-10000 in the fasta header will force a particular
> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
> just using maker_coor=chr1 will force it to only be mapped against chr1.
>
> This is an undocumented way to remap genes onto new assemblies using blast
> alignments of earlier transcript or protein annotations as a guide.
>
> --Carson
>
>
>
>
>  From: Shaun Jackman <sjackman at gmail.com>
> Reply-To: Shaun Jackman <sjackman at gmail.com>
> Date: Tuesday, February 25, 2014 at 5:06 PM
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Mapping gene names
>
>  Hi,
>
> I'm annotating a genome using a closely related genome from Genbank, using
> the .frn (RNA) and .faa (protein) files from Genbank as evidence to
> annotate my genome. I've run Maker, and the annotation seems to have worked
> well. Is it possible to map the names of the genes from the related species
> to my annotation? I see the *map_forward* option, which applies to the
> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>
> *maker_opts.ctl*
>
> est=NC_123456.frn
> protein=NC_123456.faa
> est2genome=1
> protein2genome=1
>
> Thanks,
> Shaun
>  _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
>  http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
>
>   _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

From carsonhh at gmail.com  Thu Feb 27 18:13:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Feb 2014 18:13:06 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
Message-ID: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>

Set single_exon=1, and set the minimum size to a smaller value.  I think it's set to 250 right now.  Also, est2genome looks for an ORF, so sequences without one (such as tRNAs) probably won't get picked up.
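
To see why the length cutoff alone would explain which genes disappeared, a rough check can be sketched as below. Everything here is illustrative: the gene lengths are approximate textbook sizes rather than values from this thread, the cutoff of 250 is the figure recalled above, and whether MAKER compares with >= or > is not stated. In the control files I have seen, the minimum size appears as single_length, but treat that name as an assumption and check your own maker_opts.ctl.

```python
def passes_length_cutoff(hit_length, min_length=250):
    """Return True if a single-exon hit is at least min_length bases long."""
    return hit_length >= min_length


# Approximate, assumed lengths in bases; not taken from the thread.
APPROX_LENGTHS = {"rrn16": 1500, "rrn23": 2900, "rrn5": 120, "rrn4.5": 100, "tRNA": 75}

if __name__ == "__main__":
    for name, length in APPROX_LENGTHS.items():
        status = "kept" if passes_length_cutoff(length) else "filtered"
        print("%-7s %5d bp  %s" % (name, length, status))
```

Under these assumptions only rrn16 and rrn23 clear a 250-base minimum, which matches the pattern reported in the question below.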

--Carson 

Sent from my iPhone

> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
> 
> Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!
> 
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?
> 
> organism_type=prokaryotic
> est2genome=1
> protein2genome=1
> est_forward=1
> Cheers,
> Shaun
> 
> 
> 
>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>> Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?
>> 
>> Cheers,
>> Shaun
>> 
>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:
>>> 
>>> Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
>>> 
>>> --Carson 
>>> 
>>> Sent from my iPhone
>>> 
>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>> 
>>>> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new genes under the standard MAKER pipeline.  Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
>>>> 
>>>> Thanks,
>>>> Carson
>>>> 
>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>> To: Carson Holt <carsonhh at gmail.com>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] Mapping gene names
>>>> 
>>>> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.
>>>> 
>>>> In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.
>>>> 
>>>> Thanks,
>>>> Mikael
>>>> 
>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>> It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.
>>>>> 
>>>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.
>>>>> 
>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also match parameters for exonerate will not be relaxed as they were with est_forward.
>>>>> 
>>>>> As you can see, the behavior is slightly different (because it's an accidental feature).
>>>>> 
>>>>> Thanks,
>>>>> Carson
>>>>> 
>>>>> 
>>>>> 
>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>> 
>>>>> That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?
>>>>> 
>>>>> Mikael
>>>>> 
>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>> 
>>>>>> Yes.  That should work as well as an accidental feature.
>>>>>> 
>>>>>> --Carson 
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:
>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>>> 
>>>>>>>> There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.
>>>>>>>> 
>>>>>>>> There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure that different isoforms sharing a common gene_id get clustered into the same gene.  Adding maker_coor=chr1:1-10000 to the fasta header will force a particular sequence to be mapped only against chr1 within the range of 1-10000 bp, and using just maker_coor=chr1 will force it to be mapped only against chr1.
>>>>>>>> 
>>>>>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
>>>>>>>> 
>>>>>>>> --Carson
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?
>>>>>>>> 
>>>>>>>> maker_opts.ctl
>>>>>>>> 
>>>>>>>> est=NC_123456.frn
>>>>>>>> protein=NC_123456.faa
>>>>>>>> est2genome=1
>>>>>>>> protein2genome=1
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Shaun
>>>>>>>> 
>>>>>>>> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
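Editor's sketch: the thread above describes adding maker_coor= tags to FASTA headers so that a sequence is only mapped against a given location. A minimal Python sketch of that tagging step follows. The function name, the `placements` dictionary, and the choice to append the tag at the end of the header line are assumptions made for illustration; only the tag syntax (maker_coor=chr1 or maker_coor=chr1:1-10000) comes from the thread.

```python
def tag_fasta_headers(lines, placements):
    """Append maker_coor=<location> to headers whose ID has a known placement.

    `placements` (hypothetical helper input) maps a sequence ID, taken as the
    first word of the header, to a location string such as "chr1" or
    "chr1:1-10000".
    """
    tagged = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            location = placements.get(seq_id)
            # Leave headers alone if they are already tagged or unplaced.
            if location and "maker_coor=" not in line:
                line = f"{line} maker_coor={location}"
        tagged.append(line)
    return tagged


if __name__ == "__main__":
    fasta = [">est1 some description", "ACGTACGT", ">est2", "GGCCTTAA"]
    for out in tag_fasta_headers(fasta, {"est1": "chr1:1-10000"}):
        print(out)
```

Sequences without an entry in `placements` are passed through unchanged, so they would still be placed by the normal BLAST-seeded search.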

From mikael.durling at slu.se  Fri Feb 28 03:40:30 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Fri, 28 Feb 2014 10:40:30 +0000
Subject: [maker-devel] maker_coor behaviour
Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se>

Hi,

In a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out, without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good use of this a priori knowledge. The main problem I am trying to solve is that some ESTs, for which I know where they should align, are not recruited to that position by maker's blastn->exonerate method (I find them on other scaffolds). So I thought maker_coor with the est_forward behavior (as described) would be a good option to force my evidence onto the correct position, instead of it ending up supporting or breaking other models.

However, as soon as I run with maker_coor-tagged EST sequences, no est2genome evidence appears in the final gff3 file. The blastn evidence is there when est_forward is disabled but, as expected, there is no blastn evidence when est_forward is turned on. It seems that the evidence is used, though, as the QI lines indicate EST support for both splice sites and exon alignments, but I have no way to visualize and/or evaluate the congruence of evidence and models. Would it be possible to tweak Maker into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out where in the code this is handled.

I could of course do my own exonerate alignments of these ESTs and feed them into maker as est_gff, but if maker already has the machinery to do this, I thought it would be a good idea to use it.

Thanks,
Mikael


From carsonhh at gmail.com  Fri Feb 28 07:09:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 07:09:09 -0700
Subject: [maker-devel] maker_coor behaviour
Message-ID: <CF35E345.A60A%carsonhh@gmail.com>

I wouldn't use those options for standard de novo annotation.  There are
other, more appropriate options that should be used instead.  Both
maker_coor and est_forward are destined to be part of a separate tool that
will secretly just be calling MAKER, but will allow me to control what
other parameters MAKER sees to avoid certain logic incompatibilities that
make sense when mapping entire genes onto a new assembly, but not really
for de novo annotation using ESTs.

You should instead try modifying these options in the maker_bopts.ctl file:

pcov_blastn= #Blastn Percent Coverage Threshold EST-Genome Alignments
pid_blastn= #Blastn Percent Identity Threshold EST-Genome Alignments
eval_blastn= #Blastn eval cutoff
bit_blastn= #Blastn bit cutoff
depth_blastn= #Blastn depth cutoff (0 to disable cutoff). For trimming high evidence overlap regions

en_score_limit= #Exonerate nucleotide percent of maximal score threshold


If either blastn or est2genome results disappear, it is because they don't
meet one of these thresholds (blastn results that don't meet the
thresholds but are borderline are kept if exonerate does meet the
thresholds, but if exonerate misses a threshold they will be thrown out).
That is why the EST in question gets thrown out, and it's why the blastn
result disappears when you try to anchor it with maker_coor.
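Editor's sketch: before loosening the thresholds listed above, it can help to see what they are currently set to. The following Python sketch reads "key=value #comment" lines, the layout shown in the option list above, and reports the EST-related cutoffs. The function names are hypothetical, and the values in the demo are illustrative placeholders, not recommendations.

```python
# Option names copied from the list quoted in this message.
EST_KEYS = ("pcov_blastn", "pid_blastn", "eval_blastn", "bit_blastn",
            "depth_blastn", "en_score_limit")


def read_ctl(lines):
    """Parse 'key=value #comment' lines into a dict of strings."""
    options = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue  # blank line or comment-only line
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def est_thresholds(options):
    """Return only the EST alignment cutoffs; unset options map to ''."""
    return {key: options.get(key, "") for key in EST_KEYS}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    demo = [
        "pcov_blastn=0.8 #Blastn Percent Coverage Threshold",
        "pid_blastn=0.85 #Blastn Percent Identity Threshold",
        "en_score_limit=20 #Exonerate nucleotide percent of maximal score",
    ]
    for key, value in est_thresholds(read_ctl(demo)).items():
        print(f"{key} = {value or '(not set)'}")
```

In practice one would pass `open("maker_bopts.ctl")` as `lines` and then lower one threshold at a time to see which one is discarding the EST in question.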

You can visualize everything with a browser when you're done.  I still
recommend the old version of Apollo for this (it's just easier).  You can
try to install it using the './Build apollo' option from the
.../maker/src/ directory, and it will be installed in
.../maker/exe/apollo.  It requires that you have Apache Ant installed to
do this.  Otherwise just download it from the GMOD SourceForge page and
install it manually.

Thanks,
Carson


On 2/28/14, 3:40 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Hi,
>
>In a previous thread, the maker_coor feature for ESTs was mentioned. I
>have been trying it out, without using it for mapping gene names. I have
>placed these ESTs by other means, and thought the maker_coor feature would
>be a good use of this a priori knowledge. The main problem I am trying to
>solve is that some ESTs, for which I know where they should align, are
>not recruited to that position by maker's blastn->exonerate method (I
>find them on other scaffolds). So I thought maker_coor with the
>est_forward behavior (as described) would be a good option to force my
>evidence onto the correct position, instead of it ending up supporting or
>breaking other models. However, as soon as I run with maker_coor-tagged
>EST sequences, no est2genome evidence appears in the final gff3 file. The
>blastn evidence is there when est_forward is disabled but, as expected,
>there is no blastn evidence when est_forward is turned on. It seems that
>the evidence is used, though, as the QI lines indicate EST support for
>both splice sites and exon alignments, but I have no way to
>visualize and/or evaluate the congruence of evidence and models. Would it
>be possible to tweak Maker into outputting the est2genome alignments when
>est_forward/maker_coor is used? I couldn't figure out where in the
>code this is handled.
>
>I could of course do my own exonerate alignments of these ESTs and feed
>them into maker as est_gff, but if maker already has the machinery to do
>this, I thought it would be a good idea to use it.
>
>Thanks,
>Mikael
>
>


From rbharris at uw.edu  Fri Feb 28 13:14:55 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Fri, 28 Feb 2014 12:14:55 -0800
Subject: [maker-devel] error in snap training
In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
Message-ID: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>

Hi -

I tried this and ran cegma --genome on my original fasta file. I then tried
to use cegma2zff to convert, then fathom and forge. However, when I try to
generate new parameters with forge, I get the same error that I got when
trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible
error5 KOG1342.20". Any suggestions would be great,
thanks!

Cheers,
Rebecca


On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Make sure you are using 2.31,  and then try the maker2zff filters
> individually.  If the protein models are not working well, use CEGMA to
> generate models. It's from the same group as SNAP.  Use cegma2zff for the
> conversion.
>
> --Carson
>
> Sent from my iPhone
>
> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
> >
> > Hey -
> >
> > I'm trying to train SNAP and am running into errors. I don't have any
> EST evidence, just protein. My .gff file reports 10865 genes but when I run
> maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n,
> a ton of overlap_prev_exon errors get written to the screen and then with I
> get to the forge step I get an "impossible error5". Any help would be
> greatly appreciated.
> >
> > Thanks!
> > Rebecca

From carsonhh at gmail.com  Fri Feb 28 13:22:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 13:22:12 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
	<CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
Message-ID: <CF363CE6.A6B6%carsonhh@gmail.com>

If it's failing both ways, I'm thinking this may be SNAP itself. Try these
two different versions of SNAP.

--> http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz
and
--> http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz

If they both fail then contact the SNAP development group --> korflab AT
ucdavis DOT edu

Thanks,
Carson


From:  Rebecca Harris <rbharris at uw.edu>
Date:  Friday, February 28, 2014 at 1:14 PM
To:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] error in snap training

Hi -

I tried this and ran cegma --genome on my original fasta file. I then tried
to use cegma2zff to convert, then fathom and forge. However, when I try to
generate new parameters with forge, I get the same error that I got when
trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible
error5 KOG1342.20". Any suggestions would be great,
thanks!

Cheers,
Rebecca


On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Make sure you are using 2.31,  and then try the maker2zff filters
> individually.  If the protein models are not working well, use CEGMA to
> generate models. It's from the same group as SNAP.  Use cegma2zff for the
> conversion.
> 
> --Carson
> 
> Sent from my iPhone
> 
>> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
>> >
>> > Hey -
>> >
>> > I'm trying to train SNAP and am running into errors. I don't have any EST
>> evidence, just protein. My .gff file reports 10865 genes but when I run
>> maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n, a
>> ton of overlap_prev_exon errors get written to the screen and then with I get
>> to the forge step I get an "impossible error5". Any help would be greatly
>> appreciated.
>> >
>> > Thanks!
>> > Rebecca


From darasappan at gmail.com  Mon Feb  3 09:31:16 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 3 Feb 2014 10:31:16 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
Message-ID: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used  
trinotate to annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks  
like this (most important parts on top):

   6653 gene
  46675 exon
 280534 protein_match
  59934 CDS
    969 contig
 105388 expressed_sequence_match
  12584 five_prime_UTR
  78565 match
1401369 match_part
  10180 mRNA
  11545 three_prime_UTR

2. From cufflinks assembly: I started with 133380 entries (out of  
which there are 29,000 transcripts).  I used the protein sequences  
from trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks  
like this:
     29 gene
     75 exon
 573659 protein_match
     67 CDS
   1099 contig
 269298 expressed_sequence_match
     23 five_prime_UTR
 173844 match
2221846 match_part
     29 mRNA
     23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than
expected, so I went the cufflinks route. I don't understand why, when using
the cufflinks transcripts, even fewer genes are being found.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.   
I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
    957 contig
  46555 expressed_sequence_match
  43651 match
 553633 match_part
 113738 protein_match
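Editor's sketch: the per-type tallies in items 1 to 3 above look like counts of the third column (feature type) of the merged GFF3 output. A minimal Python sketch of such a tally follows, assuming standard tab-separated GFF3 with an optional trailing ##FASTA section. The function name is hypothetical and this is not necessarily how the counts above were produced.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally GFF3 column 3 (feature type), stopping at the ##FASTA section."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # only sequence data follows
        if not line.strip() or line.startswith("#"):
            continue  # blank line, directive, or comment
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__":
    demo = [
        "##gff-version 3",
        "c1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
        "c1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1",
        "c1\tmaker\texon\t1\t50\t.\t+\t.\tID=e1;Parent=m1",
        "c1\tmaker\texon\t60\t100\t.\t+\t.\tID=e2;Parent=m1",
    ]
    for feature, n in sorted(count_feature_types(demo).items()):
        print(f"{n:>8} {feature}")
```

Comparing the gene, mRNA, and match counts between runs in this way makes it easier to spot a run where evidence aligned but no models were produced.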

As I mentioned in another email, cegma results indicated that the  
genome was more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

> Hi Dhivya,
>
> I think there are a few numbers that could be helpful to understand
> what's happening here.
>
> How many transcripts did Trinity assemble the RNA-seq data into?
> Also, you had 29,000 transcripts from cufflinks, but fewer from  
> MAKER when you gave it the cufflinks data. How many transcripts did  
> MAKER identify with the cufflinks data? Did you still get more than  
> the 10,000 transcripts that you found with just the Trinity data?
>
> A key part of MAKER's approach to genome annotation that might be
> affecting its performance is that it only annotates a gene where
> there is both evidence (like your RNA-seq data) and an ab-initio
> prediction. If a prediction is unsupported by the evidence, then
> MAKER won't annotate a gene and if evidence aligns where there's no
> prediction, MAKER won't annotate a gene either. What ab-initio
> predictors are you using and have they been trained for your specific genome?
>
> You can force MAKER to automatically promote evidence alignments to  
> a gene model by setting the est2genome option to 1, but that will  
> usually give you many false positives.
>
> Try rerunning it with either the Trinity data or the Cufflinks data  
> and with est2genome set to 1, and let us know how that affects the  
> MAKER results.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of  
> dhivya arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> I am trying to annotate a 200 mb plant genome for which I have a very
> good assembly.
>
> I tried to denovo assemble RNA-seq data using trinity and ran maker
> using my genome assembly and the trinity results.  I did not get as
> many transcripts as expected, around 10,000 transcripts.
>
> So, I decided to try a different approach.  I did a genome assisted
> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
> genome assembly and the cufflinks result.  I get much less number of
> transcripts as a result.
>
> If cufflinks found 29000 transcripts by mapping to the genome, I'm
> confused as to why maker is not finding the same.
>
> Any suggestions would be appreciated.
>
> Thanks
> Dhivya
>
>


From rebzi87 at gmail.com  Tue Feb  4 15:29:41 2014
From: rebzi87 at gmail.com (Rebecca Harris)
Date: Tue, 4 Feb 2014 14:29:41 -0800
Subject: [maker-devel] maker output
Message-ID: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>

Hi,

I'm running maker on a cluster and am having some problems with the run
ending prematurely. I would like to know if there is a straightforward way
to figure out whether maker has completed. I've tried: 1) counting the
number of run.log files in the datastore directly, and 2) counting the
instances of "FINISHED" in the master_datastore_index.log. These numbers
are inconsistent. I have 200,000 contigs in my fasta file - do I expect
200,000 run.log files? I've had to restart maker a few times - it appears
that maker is appending to the master_datastore_index.log, as I find
multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From darasappan at gmail.com  Tue Feb  4 15:43:19 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 4 Feb 2014 16:43:19 -0600
Subject: [maker-devel] Fwd:  maker annotation with cufflinks output
References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <EAFE0808-FDA7-49E5-8FD6-9AFD570DF20C@gmail.com>

Resending this since it didn't make it to the mailing list before.

>
> I was able to check on some of those questions.
>
> 1. From trinity assembly: I started with 102000 contigs. I used  
> trinotate to annotate proteins in this.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this (most important parts on top):
>
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
>
> 2. From cufflinks assembly: I started with 133380 entries (out of  
> which there are 29,000 transcripts).  I used the protein sequences  
> from trinity assembly.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
>
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why, when
> using the cufflinks transcripts, even fewer genes are being found.
>
> 3. Training SNAP:  I used the results of maker from 1 to train  
> SNAP.  I then used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
>
> As I mentioned in another email, cegma results indicated that the  
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>
>
>
>
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> I think there are a few numbers that could be helpful to understand
>> what's happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into?
>> Also, you had 29,000 transcripts from cufflinks, but fewer from  
>> MAKER when you gave it the cufflinks data. How many transcripts did  
>> MAKER identify with the cufflinks data? Did you still get more than  
>> the 10,000 transcripts that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be
>> affecting its performance is that it only annotates a gene where
>> there is both evidence (like your RNA-seq data) and an ab-initio
>> prediction. If a prediction is unsupported by the evidence, then
>> MAKER won't annotate a gene and if evidence aligns where there's no
>> prediction, MAKER won't annotate a gene either. What ab-initio
>> predictors are you using and have they been trained for your specific genome?
>>
>> You can force MAKER to automatically promote evidence alignments to  
>> a gene model by setting the est2genome option to 1, but that will  
>> usually give you many false positives.
>>
>> Try rerunning it with either the Trinity data or the Cufflinks data  
>> and with est2genome set to 1, and let us know how that affects the  
>> MAKER results.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>> of dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to denovo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using  
>> my
>> genome assembly and the cufflinks result.  I get much less number of
>> transcripts as a result.
>>
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>
>


From dence at genetics.utah.edu  Tue Feb  4 15:42:52 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 4 Feb 2014 22:42:52 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43E51@mxb2.hg.genetics.utah.edu>

Hi Rebecca, If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status. If you do a count on those (with "grep -c" for example), that will tell you how many contigs have finished.

If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets a minimum length for a contig before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run.
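Editor's sketch: when min_contig is set, the number of contigs MAKER is expected to process is smaller than the number of entries in the FASTA file. The following Python sketch estimates that number. The function names are hypothetical, and treating a contig whose length equals the cutoff as kept is an assumption about how the cutoff is applied.

```python
def contig_lengths(lines):
    """Return {contig_id: sequence length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def count_long_enough(lengths, min_contig):
    """Count contigs at least min_contig bp long (assumed inclusive)."""
    return sum(1 for n in lengths.values() if n >= min_contig)


if __name__ == "__main__":
    demo = [">contig1 demo", "ACGTACGT", "ACGT", ">contig2", "ACG"]
    lengths = contig_lengths(demo)
    print(count_long_enough(lengths, 5), "of", len(lengths), "contigs pass")
```

The resulting count is the figure to compare against the number of distinct FINISHED contigs in master_datastore_index.log.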

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca Harris [rebzi87 at gmail.com]
Sent: Tuesday, February 04, 2014 3:29 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] maker output

Hi,

I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From mikael.durling at slu.se  Tue Feb  4 15:49:46 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Tue, 4 Feb 2014 22:49:46 +0000
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <D36EEC49-FC5A-4DB8-BF08-795103F1B485@slu.se>

> 4 feb 2014 kl. 23:32 skrev "Rebecca Harris" <rebzi87 at gmail.com>:
> 
> Hi,
> 
> I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log.

This is usually what I do to check if maker has finished all scaffolds. There should be one FINISHED statement for each entry in the fasta file. (It might be one for every scaffold longer than the given minimum length.)

> These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. 

Run maker -dsindex to rebuild the file if you like. The number of FINISHED entries should not change, though.

Mikael

> 
> Thanks!
> 
> Cheers,
> Rebecca


From carsonhh at gmail.com  Tue Feb  4 15:50:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Feb 2014 15:50:10 -0700
Subject: [maker-devel] maker output
In-Reply-To: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
References: <CAESS275oypPL7CUMF2QaV3MKxNtNtXppYdF3exxFQMKSA7Vqdw@mail.gmail.com>
Message-ID: <CF16BBC3.9807%carsonhh@gmail.com>

Clusters are notoriously flaky, so maker is restartable (hence the need for
the log file).  Also, since multiple nodes may write simultaneously to the
log, they can munge its contents.   You can rerun maker with the -dsindex
flag to regenerate the master_datastore_index.log without processing
anything else. You can even delete it before rebuilding it if you want to
ensure all entries are unique (run on a single CPU when you do this).

Then count the number of FINISHED entries in the log.
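Editor's sketch: because restarts append to the log, a plain count of FINISHED lines can count the same contig more than once. The Python sketch below counts distinct contigs instead. It assumes each log line is tab-separated with the contig name in the first column and the status in the third; that column layout, the function name, and the demo paths are assumptions, so check them against your own log.

```python
def finished_contigs(lines):
    """Return the set of contig names with at least one FINISHED entry."""
    done = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Skip blank or munged lines that lack the three expected columns.
        if len(cols) >= 3 and cols[2].strip() == "FINISHED":
            done.add(cols[0])
    return done


if __name__ == "__main__":
    # Hypothetical log content, including a repeat caused by a restart.
    demo = [
        "contig1\tdatastore/contig1/\tSTARTED",
        "contig1\tdatastore/contig1/\tFINISHED",
        "contig2\tdatastore/contig2/\tSTARTED",
        "contig1\tdatastore/contig1/\tFINISHED",
    ]
    print(len(finished_contigs(demo)), "distinct contigs finished")
```

Contigs that have a STARTED line but never appear in the returned set are the ones to look at after a run that ended prematurely.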

Thanks,
Carson


From:  Rebecca Harris <rebzi87 at gmail.com>
Date:  Tuesday, February 4, 2014 at 3:29 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] maker output

Hi,

I'm running maker on a cluster and am having some problems with the run
ending prematurely. I would like to know if there is a straightforward way
to figure out whether maker has completed. I've tried: 1) counting the
number of run.log files in the datastore directly, and 2) counting the
instances of "FINISHED" in the master_datastore_index.log. These numbers are
inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000
run.log files? I've had to restart maker a few times - it appears that maker
is appending to the master_datastore_index.log, as I find multiple instances
of the same contig being finished.

Thanks!

Cheers,
Rebecca

From carsonhh at gmail.com  Wed Feb  5 11:38:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 11:38:50 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <CF17D1FC.987A%carsonhh@gmail.com>

Do you have any features of type snap in your results from step 3?  We've
had a couple of recent posts where SNAP gave no results after training,
and as a result MAKER couldn't produce any genes.  One cause of something
like that may be your step 2.  Make sure the ZFF file you used to train with
wasn't empty.  The maker2zff script uses filters to put only the best genes
in the ZFF file, and if all your genes fail the filtering then you are
training with an empty ZFF.
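
One rough way to check for an empty training set is sketched below in Python.
It assumes the short four-column ZFF layout (label, begin, end, group) with
">" header lines for each sequence; the function name is hypothetical, and
the file to pass in is whatever annotation file maker2zff wrote for you.

```python
def summarize_zff(zff_path):
    """Count sequences and distinct gene models in a ZFF annotation file."""
    sequences = 0
    models = set()
    current = None
    with open(zff_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Header line: start of the features for one sequence.
                sequences += 1
                current = line[1:].split()[0] if len(line) > 1 else ""
                continue
            fields = line.split()
            # Assumed short ZFF feature line: label, begin, end, group.
            if len(fields) >= 4:
                models.add((current, fields[3]))
    return sequences, len(models)
```

If this returns (0, 0), or only a handful of models, the filters removed
nearly everything and the resulting HMM was trained on little or no data.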

Also you should use proteins from a related species as your protein file.  I
see that your protein matches are varying wildly from run to run, and so is
your contig count.  Were the contigs you have results for long enough to
contain genes?

--Carson

From:  dhivya arasappan <darasappan at gmail.com>
Date:  Monday, February 3, 2014 at 9:31 AM
To:  Daniel Ence <dence at genetics.utah.edu>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used trinotate to
annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks like
this (most important parts on top):

    6653 gene
   46675 exon
 280534 protein_match
59934 CDS
    969 contig
 105388 expressed_sequence_match
  12584 five_prime_UTR
  78565 match
1401369 match_part
  10180 mRNA
  11545 three_prime_UTR

2. From cufflinks assembly: I started with 133380 entries (out of which
there are 29,000 transcripts).  I used the protein sequences from trinity
assembly.

I ran maker on this data with est2genome set to 1. The output looks like
this:
     29 gene
     75 exon
 573659 protein_match
67 CDS
   1099 contig
 269298 expressed_sequence_match
     23 five_prime_UTR
 173844 match
2221846 match_part
     29 mRNA
     23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than
expected, so I went the cufflinks route. I don't understand why even fewer
genes are being found when using the cufflinks transcripts.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
957 contig
  46555 expressed_sequence_match
  43651 match
 553633 match_part
 113738 protein_match

As I mentioned in another email, cegma results indicated that the genome was
more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

> Hi Dhivya, 
> 
> I think there are a few numbers that could be helpful to understand what's
> happening here. 
> 
> How many transcripts did Trinity assemble the RNA-seq data into? Also, you had
> 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the
> cufflinks data. How many transcripts did MAKER identify with the cufflinks
> data? Did you still get more than the 10,000 transcripts that you found with
> just the Trinity data?
> 
> A key part of MAKER's approach to genome annotation that might be affecting
> its performance is that it only annotates a gene where there is both evidence
> (like your RNA-seq data) and an ab-initio prediction. If a prediction is
> unsupported by the evidence, then MAKER won't annotate a gene, and if evidence
> aligns where there's no prediction, MAKER won't annotate a gene either. What
> ab-initio predictors are you using, and have they been trained for your
> specific genome?
> 
> You can force MAKER to automatically promote evidence alignments to a gene
> model by setting the est2genome option to 1, but that will usually give you
> many false positives.
> 
> Try rerunning it with either the Trinity data or the Cufflinks data and with
> est2genome set to 1, and let us know how that affects the MAKER results.
> 
> Thanks,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
> arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
> 
> Hello,
> 
> I am trying to annotate a 200 Mb plant genome for which I have a very
> good assembly.
> 
> I tried to de novo assemble RNA-seq data using trinity and ran maker
> using my genome assembly and the trinity results.  I did not get as
> many transcripts as expected, only around 10,000 transcripts.
> 
> So, I decided to try a different approach.  I did a genome-assisted
> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
> generated 21,000 genes and 29,000 transcripts.  I then ran maker using my
> genome assembly and the cufflinks result.  I get far fewer
> transcripts as a result.
> 
> If cufflinks found 29000 transcripts by mapping to the genome, I'm
> confused as to why maker is not finding the same.
> 
> Any suggestions would be appreciated.
> 
> Thanks
> Dhivya
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Wed Feb  5 12:28:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 19:28:48 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>

Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Carson Holt [carsonhh at gmail.com]
Sent: Wednesday, February 05, 2014 11:38 AM
To: dhivya arasappan; Daniel Ence
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output

Do you have any features of type snap in your results from step 3?  We've had a couple of recent posts where SNAP gave no results after training, and as a result MAKER couldn't produce any genes.  One cause of something like that may be your step 2.  Make sure the ZFF file you used to train with wasn't empty.  The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering then you are training with an empty ZFF.

Also you should use proteins from a related species as your protein file.  I see that your protein matches are varying wildly from run to run, and so is your contig count.  Were the contigs you have results for long enough to contain genes?

--Carson

From: dhivya arasappan <darasappan at gmail.com<mailto:darasappan at gmail.com>>
Date: Monday, February 3, 2014 at 9:31 AM
To: Daniel Ence <dence at genetics.utah.edu<mailto:dence at genetics.utah.edu>>
Cc: "maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Re: [maker-devel] maker annotation with cufflinks output

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top):

    6653 gene
   46675 exon
 280534 protein_match
59934 CDS
    969 contig
 105388 expressed_sequence_match
  12584 five_prime_UTR
  78565 match
1401369 match_part
  10180 mRNA
  11545 three_prime_UTR

2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts).  I used the protein sequences from trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks like this:
     29 gene
     75 exon
 573659 protein_match
67 CDS
   1099 contig
 269298 expressed_sequence_match
     23 five_prime_UTR
 173844 match
2221846 match_part
     29 mRNA
     23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I don't understand why even fewer genes are being found when using the cufflinks transcripts.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
957 contig
  46555 expressed_sequence_match
  43651 match
 553633 match_part
 113738 protein_match

As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

Hi Dhivya,

I think there are a few numbers that could be helpful to understand what's happening here.

How many transcripts did Trinity assemble the RNA-seq data into? Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data?

A key part of MAKER's approach to genome annotation that might be affecting its performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab-initio prediction. If a prediction is unsupported by the evidence, then MAKER won't annotate a gene, and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab-initio predictors are you using, and have they been trained for your specific genome?

You can force MAKER to automatically promote evidence alignments to a gene model by setting the est2genome option to 1, but that will usually give you many false positives.

Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org<mailto:maker-devel-bounces at yandell-lab.org>] on behalf of dhivya arasappan [darasappan at gmail.com<mailto:darasappan at gmail.com>]
Sent: Thursday, January 30, 2014 11:18 AM
To: maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>
Subject: [maker-devel] maker annotation with cufflinks output

Hello,

I am trying to annotate a 200 Mb plant genome for which I have a very
good assembly.

I tried to de novo assemble RNA-seq data using trinity and ran maker
using my genome assembly and the trinity results.  I did not get as
many transcripts as expected, only around 10,000 transcripts.

So, I decided to try a different approach.  I did a genome-assisted
assembly of the RNA-seq data using tophat/cufflinks. This pipeline
generated 21,000 genes and 29,000 transcripts.  I then ran maker using my
genome assembly and the cufflinks result.  I get far fewer
transcripts as a result.

If cufflinks found 29000 transcripts by mapping to the genome, I'm
confused as to why maker is not finding the same.

Any suggestions would be appreciated.

Thanks
Dhivya


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com<mailto:maker-devel at box290.bluehost.com>
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From darasappan at gmail.com  Wed Feb  5 13:13:57 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 14:13:57 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the
trinity assembly (using trinotate).  I'll try using protein sequences
from related species (though there aren't sequences from closely
related organisms).  Could you tell me a little about why protein data
from annotating my RNA-seq data would not work best here?

Thanks
Dhivya

On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

> Hi Dhivya, Are the protein matches in your results coming from your  
> annotations of the transcriptome? You should really use amino-acid  
> sequences from related organisms and some kind of omnibus source  
> like SwissProt.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, February 05, 2014 11:38 AM
> To: dhivya arasappan; Daniel Ence
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Do you have any features of type snap in your results from step 3?
> We've had a couple of recent posts where SNAP gave no results after
> training, and as a result MAKER couldn't produce any genes.
> One cause of something like that may be your step 2.  Make sure the
> ZFF file you used to train with wasn't empty.  The maker2zff script uses
> filters to put only the best genes in the ZFF file, and if all your
> genes fail the filtering then you are training with an empty ZFF.
>
> Also you should use proteins from a related species as your protein
> file.  I see that your protein matches are varying wildly from run to
> run, and so is your contig count.  Were the contigs you have
> results for long enough to contain genes?
>
> --Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Monday, February 3, 2014 at 9:31 AM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hi Daniel,
>
> I was able to check on some of those questions.
>
> 1. From trinity assembly: I started with 102000 contigs. I used  
> trinotate to annotate proteins in this.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this (most important parts on top):
>
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
>
> 2. From cufflinks assembly: I started with 133380 entries (out of  
> which there are 29,000 transcripts).  I used the protein sequences  
> from trinity assembly.
>
> I ran maker on this data with est2genome set to 1. The output looks  
> like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
>
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why even
> fewer genes are being found when using the cufflinks transcripts.
>
> 3. Training SNAP:  I used the results of maker from 1 to train  
> SNAP.  I then used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
> est2genome=0
>
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
>
> As I mentioned in another email, cegma results indicated that the  
> genome was more than 90% complete. Any suggestions would be helpful.
>
> Thank you
> Dhivya
>
>
>
>
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>
>> Hi Dhivya,
>>
>> I think there are a few numbers that could be helpful to understand
>> what's happening here.
>>
>> How many transcripts did Trinity assemble the RNA-seq data into?
>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>> MAKER when you gave it the cufflinks data. How many transcripts did
>> MAKER identify with the cufflinks data? Did you still get more than
>> the 10,000 transcripts that you found with just the Trinity data?
>>
>> A key part of MAKER's approach to genome annotation that might be
>> affecting its performance is that it only annotates a gene where
>> there is both evidence (like your RNA-seq data) and an ab-initio
>> prediction. If a prediction is unsupported by the evidence, then
>> MAKER won't annotate a gene, and if evidence aligns where there's no
>> prediction, MAKER won't annotate a gene either. What ab-initio
>> predictors are you using, and have they been trained for your
>> specific genome?
>>
>> You can force MAKER to automatically promote evidence alignments to  
>> a gene model by setting the est2genome option to 1, but that will  
>> usually give you many false positives.
>>
>> Try rerunning it with either the Trinity data or the Cufflinks data  
>> and with est2genome set to 1, and let us know how that affects the  
>> MAKER results.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>> of dhivya arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>>
>> Hello,
>>
>> I am trying to annotate a 200 Mb plant genome for which I have a very
>> good assembly.
>>
>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, only around 10,000 transcripts.
>>
>> So, I decided to try a different approach.  I did a genome-assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes and 29,000 transcripts.  I then ran maker using
>> my genome assembly and the cufflinks result.  I get far fewer
>> transcripts as a result.
>>
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks
>> Dhivya
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________ maker-devel mailing  
> list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Wed Feb  5 13:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>,
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D43FB4@mxb2.hg.genetics.utah.edu>

Hi Dhivya,

In genome annotation, often you want to use as many sources for evidence as is reasonable, but those sources should be distinct.  It will confuse downstream annotation efforts if your protein evidence is actually based on the RNA-seq data.

Using the trinotate results for protein evidence here restricts you first to the proteins coded by the transcripts in the RNA-seq data, which may be incomplete, and secondly to the proteins that trinotate could annotate from among the transcripts.

The problem that Carson mentioned with the SNAP HMM file is a real possibility also.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: dhivya arasappan [darasappan at gmail.com]
Sent: Wednesday, February 05, 2014 1:13 PM
To: Daniel Ence
Cc: Carson Holt; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the trinity assembly (using trinotate).  I'll try using protein sequences from related species (though there aren't sequences from closely related organisms).  Could you tell me a little about why protein data from annotating my RNA-seq data would not work best here?

Thanks
Dhivya

On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Carson Holt [carsonhh at gmail.com<mailto:carsonhh at gmail.com>]
Sent: Wednesday, February 05, 2014 11:38 AM
To: dhivya arasappan; Daniel Ence
Cc: maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] maker annotation with cufflinks output

Do you have any features of type snap in your results from step 3?  We've had a couple of recent posts where SNAP gave no results after training, and as a result MAKER couldn't produce any genes.  One cause of something like that may be your step 2.  Make sure the ZFF file you used to train with wasn't empty.  The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering then you are training with an empty ZFF.

Also you should use proteins from a related species as your protein file.  I see that your protein matches are varying wildly from run to run, and so is your contig count.  Were the contigs you have results for long enough to contain genes?

--Carson

From: dhivya arasappan <darasappan at gmail.com<mailto:darasappan at gmail.com>>
Date: Monday, February 3, 2014 at 9:31 AM
To: Daniel Ence <dence at genetics.utah.edu<mailto:dence at genetics.utah.edu>>
Cc: "maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Re: [maker-devel] maker annotation with cufflinks output

Hi Daniel,

I was able to check on some of those questions.

1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this.

I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top):

    6653 gene
   46675 exon
 280534 protein_match
59934 CDS
    969 contig
 105388 expressed_sequence_match
  12584 five_prime_UTR
  78565 match
1401369 match_part
  10180 mRNA
  11545 three_prime_UTR

2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts).  I used the protein sequences from trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks like this:
     29 gene
     75 exon
 573659 protein_match
67 CDS
   1099 contig
 269298 expressed_sequence_match
     23 five_prime_UTR
 173844 match
2221846 match_part
     29 mRNA
     23 three_prime_UTR

The number of genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I don't understand why even fewer genes are being found when using the cufflinks transcripts.

3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then used that training set to rerun maker:
snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS etc.
957 contig
  46555 expressed_sequence_match
  43651 match
 553633 match_part
 113738 protein_match

As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful.

Thank you
Dhivya


On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

Hi Dhivya,

I think there are a few numbers that could be helpful to understand what's happening here.

How many transcripts did Trinity assemble the RNA-seq data into? Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data?

A key part of MAKER's approach to genome annotation that might be affecting its performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab-initio prediction. If a prediction is unsupported by the evidence, then MAKER won't annotate a gene, and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab-initio predictors are you using, and have they been trained for your specific genome?

You can force MAKER to automatically promote evidence alignments to a gene model by setting the est2genome option to 1, but that will usually give you many false positives.

Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org<mailto:maker-devel-bounces at yandell-lab.org>] on behalf of dhivya arasappan [darasappan at gmail.com<mailto:darasappan at gmail.com>]
Sent: Thursday, January 30, 2014 11:18 AM
To: maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>
Subject: [maker-devel] maker annotation with cufflinks output

Hello,

I am trying to annotate a 200 Mb plant genome for which I have a very
good assembly.

I tried to de novo assemble RNA-seq data using trinity and ran maker
using my genome assembly and the trinity results.  I did not get as
many transcripts as expected, only around 10,000 transcripts.

So, I decided to try a different approach.  I did a genome-assisted
assembly of the RNA-seq data using tophat/cufflinks. This pipeline
generated 21,000 genes and 29,000 transcripts.  I then ran maker using my
genome assembly and the cufflinks result.  I get far fewer
transcripts as a result.

If cufflinks found 29000 transcripts by mapping to the genome, I'm
confused as to why maker is not finding the same.

Any suggestions would be appreciated.

Thanks
Dhivya


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com<mailto:maker-devel at box290.bluehost.com>
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Wed Feb  5 13:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <CF17E9B9.9892%carsonhh@gmail.com>

Protein data doesn't have to come from a very closely related species.  This
is because genes maintain homology at the amino acid level across even very
large evolutionary distances.  Having a more closely related species just
ensures that genome contents are similar (fewer losses/gains relative to each
other). And use the entire proteome of at least one related species (just
using a database like Swiss-Prot is not sufficient).

Using translated mRNA-seq data will not give you any new information that
was not already available from the untranslated sequence.  Plus it will
introduce the complicating artifacts that mRNA-seq generates into the
protein part of the pipeline (gene merging, incorrect assembly, and false
calls caused by background transcription).  A big gotcha with mRNA-seq is
that all of your genome gets transcribed at a low level, not just the genes,
so you will always have contamination that does not represent real gene
models.  Also, in the end you really only expect to capture about 50% of the
genes with mRNA-seq (maybe 70% if you are fortunate, and most of those will
be partial). So using the proteins from another species is important to
improve sensitivity and to fix many of the issues that arise from the noisy
nature of mRNA-seq.  In fact, if you were forced to use only one (either
protein evidence or mRNA-seq), you would actually get better annotations from
the protein evidence in most cases. You get the best annotations when you use
both, but if using only one of them, the proteins from another species are
better, and noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Wednesday, February 5, 2014 at 1:13 PM
To:  Daniel Ence <dence at genetics.utah.edu>
Cc:  Carson Holt <carsonhh at gmail.com>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from annotation of the trinity
assembly (using trinotate).  I'll try using protein sequences from related
species (though there aren't sequences from closely related organisms).  Could
you tell me a little about why protein data from annotating my rnaseq data
would not work best here?

Thanks
Dhivya
 
On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

> Hi Dhivya, Are the protein matches in your results coming from your
> annotations of the transcriptome? You should really use amino-acid sequences
> from related organisms and some kind of omnibus source like SwissProt.
> 
> Thanks,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, February 05, 2014 11:38 AM
> To: dhivya arasappan; Daniel Ence
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] maker annotation with cufflinks output
> 
> Do you have any features of type snap in your results from step 3?  We've had
> a couple of recent posts where, after training, snap was giving no results, and
> as a result maker couldn't give any genes.  One cause of something like that
> may be your step 2.  Make sure the ZFF you used to train with wasn't empty.
> The maker2zff script uses filters to put only the best genes in the ZFF file,
> and if all your genes fail the filtering then you are training with an empty
> ZFF.
> 
> Also you should use proteins from a related species as your protein file.  I
> see that your protein matches are varying wildly from run to run.  So is your
> contig count.  Were the subset of contigs you have results for long enough to
> contain genes?
> 
> -Carson
> 
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Monday, February 3, 2014 at 9:31 AM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
> 
> Hi Daniel,
> 
> I was able to check on some of those questions.
> 
> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
> annotate proteins in this.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this
> (most important parts on top):
> 
>    6653 gene
>   46675 exon
>  280534 protein_match
>   59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
> 
> 2. From cufflinks assembly: I started with 133380 entries (out of which there
> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this:
>      29 gene
>      75 exon
>  573659 protein_match
>      67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
> 
> The number of genes annotated using the trinity assembly is lower than
> expected, so I went the cufflinks route. I don't understand why, when using
> the cufflinks transcripts, even fewer genes are being found.
> 
> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
> used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap
> /RHA.hmm
> est2genome=0
> 
> And again I got results with no entries for gene, exon, CDS etc.
>     957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
> 
> As I mentioned in another email, cegma results indicated that the genome was
> more than 90% complete. Any suggestions would be helpful.
> 
> Thank you
> Dhivya
> 
> 
> 
> 
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
> 
>> Hi Dhivya, 
>> 
>> I think there are a few numbers that could be helpful to understand what's
>> happening here.
>> 
>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>> the cufflinks data. How many transcripts did MAKER identify with the
>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>> found with just the Trinity data?
>> 
>> A key part of MAKER's approach to genome annotation that might be affecting
>> its performance is that it only annotates a gene where there is both
>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>> prediction is unsupported by the evidence, then MAKER won't annotate a gene,
>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>> gene either. What ab-initio predictors are you using, and have they been
>> trained for your specific genome?
>> 
>> You can force MAKER to automatically promote evidence alignments to a gene
>> model by setting the est2genome option to 1, but that will usually give you
>> many false positives.
>> 
>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>> est2genome set to 1, and let us know how that affects the MAKER results.
>> 
>> Thanks,
>> Daniel
>> 
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>> arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>> 
>> Hello,
>> 
>> I am trying to annotate a 200 mb plant genome for which I have a very
>> good assembly.
>> 
>> I tried to denovo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>> 
>> So, I decided to try a different approach.  I did a genome assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>> genome assembly and the cufflinks result.  I get a much smaller number of
>> transcripts as a result.
>> 
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>> 
>> Any suggestions would be appreciated.
>> 
>> Thanks
>> Dhivya
>> 
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 



From darasappan at gmail.com  Wed Feb  5 22:16:43 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 23:16:43 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17E9B9.9892%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43F95@mxb2.hg.genetics.utah.edu>
	<4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
	<CF17E9B9.9892%carsonhh@gmail.com>
Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com>

Thank you both for those explanations. I'll get back to you after I  
try rerunning maker.

Dhivya


From mikael.durling at slu.se  Thu Feb  6 04:02:37 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Thu, 6 Feb 2014 11:02:37 +0000
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CEF5A8E0.88B4%carsonhh@gmail.com>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
Message-ID: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>

Hi Carson,

It's nice to see all these new features in maker.

I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed though, with this error message:

ERROR: You found a tRNA with an intron! This should not happen
--> rank=12, hostname=my-mgrid6
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scf_013

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf_013

I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me:

scf_013         1       189339  189410  Thr     AGT     0       0       82.83
scf_013         2       510381  510462  Ser     AGA     0       0       67.09
scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
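The intron columns in rows like these can be read off with a few lines of Python. This is only a minimal sketch, assuming the nine-column whitespace-separated layout shown above (sequence name, tRNA number, begin, end, type, anticodon, intron begin, intron end, score); the function name is hypothetical and is not part of MAKER or tRNAscan-SE.

```python
def trnas_with_introns(lines):
    """Return (name, number, type, anticodon, intron_begin, intron_end)
    for every row whose intron bounds are not both zero.

    Assumes nine whitespace-separated columns per data row; header and
    separator lines are skipped because their second field is not a number.
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) != 9 or not fields[1].isdigit():
            continue  # not a data row in the assumed layout
        name, number, _begin, _end, aa, anticodon, i_begin, i_end, _score = fields
        if int(i_begin) != 0 or int(i_end) != 0:
            hits.append((name, int(number), aa, anticodon, int(i_begin), int(i_end)))
    return hits


# The five rows reported for scf_013 above.
SAMPLE = """\
scf_013         1       189339  189410  Thr     AGT     0       0       82.83
scf_013         2       510381  510462  Ser     AGA     0       0       67.09
scf_013         3       586886  587000  Leu     CAA     586924  586956  57.97
scf_013         4       942166  942069  Leu     AAG     942128  942113  57.48
scf_013         5       169102  168993  Leu     TAA     169065  169037  56.49
""".splitlines()

if __name__ == "__main__":
    for hit in trnas_with_introns(SAMPLE):
        print(hit)
```

On the rows above this flags the three Leu predictions (numbers 3, 4 and 5), which all report intron bounds and so are the rows that could trigger the "tRNA with an intron" error.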


Hope this can be of some help while debugging. I'll leave trnascan off for now.

thanks,

Mikael


10 jan 2014 kl. 22:03 skrev Carson Holt <carsonhh at gmail.com>:

> Hi Mikael,
> 
> The options are part of the new MAKER-P integration
> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
> Additional documentation/tutorials will be forthcoming - probably in
> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
> or alternatively with the annual GMOD summer school. The tRNA option is
> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
> 
> Thanks,
> Carson
> 
> 
> 
> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
> wrote:
> 
>> Hi Carson and other maker developers,
>> 
>> I was reading the source code of the latest maker release and noted
>> several references to ncRNAs, snoscan and trnascan. Can these be
>> incorporated into the normal annotation workflow? If so, are there any
>> instructions available for that?
>> 
>> best regards,
>> Mikael Durling
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 


From darasappan at gmail.com  Thu Feb  6 07:52:12 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 08:52:12 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF17D1FC.987A%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com>

Hello,

It does appear that my genome.ann file from the maker2zff script has data
in it. However, the SNAP steps after that have created empty files.
The following are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors  
like this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds  
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds  
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds  
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds  
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
exon-20:out_of_bounds exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all
showing up as errors. I realize this may be an issue with snap, but
are you familiar with anything like this? A snippet of my genome.ann
file is attached for reference (the full file is too big for the list).
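
One way to narrow down out_of_bounds errors like these is to compare every exon coordinate in the .ann file against the length of the matching sequence in the .dna file. The following is a rough sketch only, assuming ZFF records of the form "label begin end ... group" under ">sequence" header lines and a plain FASTA .dna file; the function names are made up for illustration and are not part of SNAP or fathom.

```python
def read_fasta_lengths(lines):
    """Map each FASTA record name to the number of bases that follow it."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds_features(ann_lines, lengths):
    """List (sequence, group, begin, end, sequence_length) for every ZFF
    feature whose coordinates fall outside the sequence it belongs to.

    Assumes begin and end are the second and third fields and the group
    (model) name is the last field of each feature line.
    """
    problems = []
    seq = None
    for line in ann_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            seq = fields[0][1:]
            continue
        if len(fields) < 4:
            continue  # not a feature line in the assumed layout
        begin, end = int(fields[1]), int(fields[2])
        seq_len = lengths.get(seq, 0)  # missing or empty sequence counts as length 0
        if min(begin, end) < 1 or max(begin, end) > seq_len:
            problems.append((seq, fields[-1], begin, end, seq_len))
    return problems


if __name__ == "__main__":
    with open("genome.dna") as dna, open("genome.ann") as ann:
        for problem in out_of_bounds_features(ann, read_fasta_lengths(dna)):
            print(problem)
```

If the .dna file has headers but no sequence lines, every length is 0 and every exon is reported, which would match the pattern of errors shown above.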

Thanks
Dhivya


-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.ann
Type: application/octet-stream
Size: 15761 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.dna
Type: application/octet-stream
Size: 3075 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a6912d46/attachment-0007.obj>

From carsonhh at gmail.com  Thu Feb  6 09:01:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:01:04 -0700
Subject: [maker-devel] ncRNA support in maker
In-Reply-To: <CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
References: <CEF5A8E0.88B4%carsonhh@gmail.com>
	<CCBE48F7-81F1-42E3-87A3-B251EE03140C@slu.se>
Message-ID: <CF18FE86.9903%carsonhh@gmail.com>

I'm making a new release this weekend, but if you have access to the devel
version, you can test now.  All changes have been committed to the
subversion repository.

Thanks,
Carson


On 2/6/14, 4:02 AM, "Mikael Brandstr?m Durling" <mikael.durling at slu.se>
wrote:

>Hi Carson,
>
>it?s nice to see all these new features in maker.
>
>I gave the trnascan option a try by enabling it in the config file for
>one of my fungal genomes. It failed though, with this error message:
>
>ERROR: You found a tRNA with an intron! This should not happen
>--> rank=12, hostname=my-mgrid6
>ERROR: Failed while gathering ab-init output files
>ERROR: Chunk failed at level:1, tier_type:2
>FAILED CONTIG:scf_013
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf_013
>
>I checked the trnascan output
>(scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig,
>and the output seems valid to me:
>
>scf_013         1       189339  189410  Thr     AGT     0       0
>82.83
>scf_013         2       510381  510462  Ser     AGA     0       0
>67.09
>scf_013         3       586886  587000  Leu     CAA     586924  586956
>57.97
>scf_013         4       942166  942069  Leu     AAG     942128  942113
>57.48
>scf_013         5       169102  168993  Leu     TAA     169065  169037
>56.49
>
>
>Hope this can be of some help while debugging. I?ll leave trnascan off
>for now.
>
>thanks,
>
>Mikael
>
>
>10 jan 2014 kl. 22:03 skrev Carson Holt <carsonhh at gmail.com>:
>
>> Hi Mikael,
>> 
>> The options are part of the new MAKER-P integration
>> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
>> Additional documentation/tutorials will be forthcoming, probably in
>> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
>> or alternatively with the annual GMOD summer school. The tRNA option is
>> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
>> wrote:
>> 
>>> Hi Carson and other maker developers,
>>> 
>>> I was reading the source code of the latest maker release and noted
>>> several references to ncRNAs, snoscan and trnascan. Can these be
>>> incorporated into the normal annotation workflow? If so, are there any
>>> instructions available for that?
>>> 
>>> best regards,
>>> Mikael Durling
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
>> 
>
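
In the tRNAscan output quoted above, the three Leu predictions carry non-zero
intron bounds, which is presumably what triggered the "tRNA with an intron"
error. Below is a minimal sketch for listing such rows. It assumes nine
whitespace-separated columns (sequence name, tRNA number, begin, end, type,
anticodon, intron begin, intron end, score); the function and class names are
hypothetical and not part of MAKER or tRNAscan-SE.

```python
from typing import List, NamedTuple


class TRNAHit(NamedTuple):
    seqid: str
    number: int
    begin: int
    end: int
    aa: str
    anticodon: str
    intron_begin: int
    intron_end: int
    score: float


def parse_trnascan(text: str) -> List[TRNAHit]:
    """Parse tabular tRNAscan rows, skipping anything that is not a data row."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue
        try:
            hits.append(TRNAHit(
                fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
                fields[4], fields[5], int(fields[6]), int(fields[7]),
                float(fields[8]),
            ))
        except ValueError:
            # Header or separator lines do not convert to numbers.
            continue
    return hits


def with_introns(hits: List[TRNAHit]) -> List[TRNAHit]:
    """Return hits whose intron bounds are not both zero."""
    return [h for h in hits if h.intron_begin != 0 or h.intron_end != 0]


# Rows copied from the message above.
SAMPLE = """\
scf_013 1 189339 189410 Thr AGT 0 0 82.83
scf_013 2 510381 510462 Ser AGA 0 0 67.09
scf_013 3 586886 587000 Leu CAA 586924 586956 57.97
scf_013 4 942166 942069 Leu AAG 942128 942113 57.48
scf_013 5 169102 168993 Leu TAA 169065 169037 56.49
"""

if __name__ == "__main__":
    for hit in with_introns(parse_trnascan(SAMPLE)):
        print(hit.seqid, hit.number, hit.aa, hit.intron_begin, hit.intron_end)
```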


From carsonhh at gmail.com  Thu Feb  6 09:05:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:05:05 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
Message-ID: <CF19004A.9913%carsonhh@gmail.com>

Your genome.dna file has no sequence?  Did you by any chance strip the FASTA
sequence from the GFF3 you are using as input to maker2zff?  There should be
FASTA sequence at the end of that file.  Also, can I see the GFF3 file you
are using as input to maker2zff?
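
A quick way to check this before running maker2zff is to look for the
##FASTA directive in the GFF3 and count what follows it. The following is a
minimal sketch only; the function name is hypothetical and the script is not
part of MAKER.

```python
import sys


def count_embedded_fasta(gff3_lines):
    """Return (records, bases) found after the ##FASTA directive of a GFF3.

    (0, 0) means the file carries no embedded sequence at all.
    """
    in_fasta = False
    records = 0
    bases = 0
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not in_fasta:
            # Everything before the directive is annotation, not sequence.
            if line.startswith("##FASTA"):
                in_fasta = True
            continue
        if line.startswith(">"):
            records += 1
        else:
            bases += len(line.strip())
    return records, bases


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        n_records, n_bases = count_embedded_fasta(handle)
    print(f"{n_records} FASTA records, {n_bases} bases")
```

If this reports 0 records, the sequence section was most likely stripped and
genome.dna will come out empty.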

Thanks,
Carson

From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 7:47 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Hello,

It does appear that my genome.ann file from the maker2zff script has data in it.
However, the SNAP steps after that have created empty files.  The following
are all empty:

alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors like
this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
exon-21:out_of_bounds

I'm not sure why the annotation I'm seeing in genome.ann are all showing up
as errors. I realize this may be an issue with snap, but are you familiar
with anything like this? My genome.ann file is attached for reference.
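
For reference, errors like these can be reproduced outside of fathom with a
small check. This is a sketch under two assumptions that are not confirmed
in this thread: that genome.ann uses the short ZFF layout (a ">sequence"
header followed by "label begin end model" rows), and that out_of_bounds
means an exon coordinate falls outside the matching sequence in genome.dna.
All function names are hypothetical.

```python
def fasta_lengths(fasta_lines):
    """Map each FASTA record name to its sequence length."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def out_of_bounds(zff_lines, lengths):
    """Return (seqid, model, begin, end) for exons outside their sequence.

    A sequence that is missing from `lengths` is treated as having length 0,
    so every exon on it is reported.
    """
    problems = []
    seqid = None
    for line in zff_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seqid = line[1:].split()[0]
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        begin, end, model = int(fields[1]), int(fields[2]), fields[3]
        limit = lengths.get(seqid, 0)
        # Minus-strand exons are written with begin > end, so test both ends.
        if min(begin, end) < 1 or max(begin, end) > limit:
            problems.append((seqid, model, begin, end))
    return problems
```

With an empty genome.dna, fasta_lengths returns an empty mapping and every
exon in genome.ann is reported, which would match the output above.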

Thanks
Dhivya

On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:

> Do you have any features of type snap in your results from step 3?  We've had
> a couple of recent posts where after training snap was giving no results, and
> as a result maker couldn't give any genes.  One cause of something like that
> may be your step 2.  Make sure the ZFF you used to train with wasn't empty.
> The maker2zff script uses filters to only put the best genes in the ZFF file,
> and if all your genes fail the filtering then you are training with an empty
> ZFF.
> 
> Also you should use proteins from a related species as your protein file.  I
> see that your protein matches are varying wildly from run to run.  So is your
> contig count.  Were the subset of contigs you have results for long enough to
> contain genes?
> 
> --Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Monday, February 3, 2014 at 9:31 AM
> To:  Daniel Ence <dence at genetics.utah.edu>
> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Hi Daniel,
> 
> I was able to check on some of those questions.
> 
> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
> annotate proteins in this.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this
> (most important parts on top):
> 
>     6653 gene
>    46675 exon
>  280534 protein_match
> 59934 CDS
>     969 contig
>  105388 expressed_sequence_match
>   12584 five_prime_UTR
>   78565 match
> 1401369 match_part
>   10180 mRNA
>   11545 three_prime_UTR
> 
> 2. From cufflinks assembly: I started with 133380 entries (out of which there
> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
> 
> I ran maker on this data with est2genome set to 1. The output looks like this:
>      29 gene
>      75 exon
>  573659 protein_match
> 67 CDS
>    1099 contig
>  269298 expressed_sequence_match
>      23 five_prime_UTR
>  173844 match
> 2221846 match_part
>      29 mRNA
>      23 three_prime_UTR
> 
> The genes annotated using the trinity assembly is lower than expected, so I
> went the cufflinks route. I dont understand why when using the cufflinks
> transcripts, even less genes are being found.
> 
> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
> used that training set to rerun maker:
> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap
> /RHA.hmm
> est2genome=0
> 
> And again I got results with no entries for gene, exon, CDS etc.
> 957 contig
>   46555 expressed_sequence_match
>   43651 match
>  553633 match_part
>  113738 protein_match
> 
> As I mentioned in another email, cegma results indicated that the genome was
> more than 90% complete. Any suggestions would be helpful.
> 
> Thank you
> Dhivya
> 
> 
> 
> 
> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
> 
>> Hi Dhivya, 
>> 
>> I think there are a few numbers that could be helpful to understand what's
>> happening here. 
>> 
>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>> the cufflinks data. How many transcripts did MAKER identify with the
>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>> found with just the Trinity data?
>> 
>> A key part of MAKER's approach to genome annotation that might be affecting
>> its performance is that it only annotates a gene where there is both
>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>> prediction is unsupported by the evidence, then MAKER won't annotate a gene
>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>> gene either. What ab-initio predictors are you using and have they been
>> trained for your specific genome?
>> 
>> You can force MAKER to automatically promote evidence alignments to a gene
>> model by setting the est2genome option to 1, but that will usually give you
>> many false positives.
>> 
>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>> est2genome set to 1, and let us know how that affects the MAKER results.
>> 
>> Thanks,
>> Daniel
>> 
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> ________________________________________
>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>> arasappan [darasappan at gmail.com]
>> Sent: Thursday, January 30, 2014 11:18 AM
>> To: maker-devel at yandell-lab.org
>> Subject: [maker-devel] maker annotation with cufflinks output
>> 
>> Hello,
>> 
>> I am trying to annotate a 200 mb plant genome for which I have a very
>> good assembly.
>> 
>> I tried to denovo assemble RNA-seq data using trinity and ran maker
>> using my genome assembly and the trinity results.  I did not get as
>> many transcripts as expected, around 10,000 transcripts.
>> 
>> So, I decided to try a different approach.  I did a genome assisted
>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>> genome assembly and the cufflinks result.  I get much less number of
>> transcripts as a result.
>> 
>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>> confused as to why maker is not finding the same.
>> 
>> Any suggestions would be appreciated.
>> 
>> Thanks
>> Dhivya
>> 
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



From carsonhh at gmail.com  Thu Feb  6 10:04:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 10:04:25 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID: <CF190E83.9927%carsonhh@gmail.com>

Could you give me the file without using 'head' to trim it?  It's cutting it
off before it reaches the part I'm interested in.

--Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Thursday, February 6, 2014 at 10:01 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

Oh yes I did- I took just the non sequence entries in the gff file and used
that as my input.  I will rerun snap with the gff file containing the
sequences as well. 

I'm attaching a snippet of the gff file that I used as input to maker2zff.

Thanks for your help
Dhivya


On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
> sequence from the GFF3 you are using as input to maker2zff?  There should be
> fasta sequence at the end of that file.  Also can I see the GFF3 file you are
> using as input to maker2zff.
> 
> Thanks,
> Carson
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 7:47 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
> <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Hello,
> 
> It does appear that my genome.ann file from the maker2zff script has data in it.
> However, the SNAP steps after that have created empty files.  The following
> are all empty:
> 
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
> 
> When I tried to get gene stats or validate genome.ann, I get errors like this
> for all of them:
> 
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
> exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
> exon-21:out_of_bounds
> 
> I'm not sure why the annotation I'm seeing in genome.ann are all showing up as
> errors. I realize this may be an issue with snap, but are you familiar with
> anything like this? My genome.ann file is attached for reference.
> 
> Thanks
> Dhivya
> 
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
> 
>> Do you have any features of type snap in your results from step 3?  We've had
>> a couple of recent posts where after training snap was giving no results, and
>> as a result maker couldn't give any genes.  One cause of something like that
>> may be your step 2.  Make sure the ZFF you used to train with wasn't empty.
>> The maker2zff script uses filters to only put the best genes in the ZFF file,
>> and if all your genes fail the filtering then you are training with an empty
>> ZFF.
>> 
>> Also you should use proteins from a related species as your protein file.  I
>> see that your protein matches are varying wildly from run to run.  So is your
>> contig count.  Were the subset of contigs you have results for long enough to
>> contain genes?
>> 
>> --Carson
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Monday, February 3, 2014 at 9:31 AM
>> To:  Daniel Ence <dence at genetics.utah.edu>
>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Hi Daniel,
>> 
>> I was able to check on some of those questions.
>> 
>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to
>> annotate proteins in this.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like this
>> (most important parts on top):
>> 
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>> 
>> 2. From cufflinks assembly: I started with 133380 entries (out of which there
>> are 29,000 transcripts).  I used the protein sequences from trinity assembly.
>> 
>> I ran maker on this data with est2genome set to 1. The output looks like
>> this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>> 
>> The genes annotated using the trinity assembly is lower than expected, so I
>> went the cufflinks route. I dont understand why when using the cufflinks
>> transcripts, even less genes are being found.
>> 
>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I then
>> used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/sna
>> p/RHA.hmm
>> est2genome=0
>> 
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>> 
>> As I mentioned in another email, cegma results indicated that the genome was
>> more than 90% complete. Any suggestions would be helpful.
>> 
>> Thank you
>> Dhivya
>> 
>> 
>> 
>> 
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>> 
>>> Hi Dhivya, 
>>> 
>>> I think there are a few numbers that could be helpful to understand what's
>>> happening here.
>>> 
>>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you
>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it
>>> the cufflinks data. How many transcripts did MAKER identify with the
>>> cufflinks data? Did you still get more than the 10,000 transcripts that you
>>> found with just the Trinity data?
>>> 
>>> A key part of MAKER's approach to genome annotation that might be affecting
>>> its performance is that it only annotates a gene where there is both
>>> evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>> prediction is unsupported by the evidence, then MAKER won't annotate a gene
>>> and if evidence aligns where there's no prediction, MAKER won't annotate a
>>> gene either. What ab-initio predictors are you using and have they been
>>> trained for your specific genome?
>>> 
>>> You can force MAKER to automatically promote evidence alignments to a gene
>>> model by setting the est2genome option to 1, but that will usually give you
>>> many false positives.
>>> 
>>> Try rerunning it with either the Trinity data or the Cufflinks data and with
>>> est2genome set to 1, and let us know how that affects the MAKER results.
>>> 
>>> Thanks,
>>> Daniel
>>> 
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya
>>> arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> I am trying to annotate a 200 mb plant genome for which I have a very
>>> good assembly.
>>> 
>>> I tried to denovo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>> 
>>> So, I decided to try a different approach.  I did a genome assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>> genome assembly and the cufflinks result.  I get much less number of
>>> transcripts as a result.
>>> 
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>> 
>>> Any suggestions would be appreciated.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> 
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
>> _______________________________________________ maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 



From darasappan at gmail.com  Thu Feb  6 10:01:44 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 11:01:44 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19004A.9913%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>

Oh yes I did- I took just the non sequence entries in the gff file and  
used that as my input.  I will rerun snap with the gff file containing  
the sequences as well.

I'm attaching a snippet of the gff file that I used as input to  
maker2zff.

Thanks for your help
Dhivya


On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence?  Did you by any chance strip  
> the fasta sequence from the GFF3 you are using as input to  
> maker2zff?  There should be fasta sequence at the end of that file.   
> Also can I see the GFF3 file you are using as input to maker2zff.
>
> Thanks,
> Carson
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 7:47 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org 
> " <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> It does appear that my genome.ann file from the maker2zff script has data
> in it. However, the SNAP steps after that have created empty files.   
> The following are all empty:
>
> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>
> When I tried to get gene stats or validate genome.ann, I get errors  
> like this for all of them:
>
> fathom genome.ann genome.dna -gene-stats |more
> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds
> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
> exon-2:out_of_bounds exon-1:out_of_bounds
> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds
> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
> exon-20:out_of_bounds exon-21:out_of_bounds
>
> I'm not sure why the annotation I'm seeing in genome.ann are all  
> showing up as errors. I realize this may be an issue with snap, but  
> are you familiar with anything like this? My genome.ann file is  
> attached for reference.
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>
>> Do you have any features of type snap in your results from step 3?
>> We've had a couple of recent posts where after training snap was
>> giving no results, and as a result maker couldn't give any genes.
>> One cause of something like that may be your step 2.  Make sure the
>> ZFF you used to train with wasn't empty.  The maker2zff script uses
>> filters to only put the best genes in the ZFF file, and if all your
>> genes fail the filtering then you are training with an empty ZFF.
>>
>> Also you should use proteins from a related species as your protein
>> file.  I see that your protein matches are varying wildly from run
>> to run.  So is your contig count.  Were the subset of contigs you
>> have results for long enough to contain genes?
>>
>> --Carson
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence <dence at genetics.utah.edu>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From trinity assembly: I started with 102000 contigs. I used  
>> trinotate to annotate proteins in this.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this (most important parts on top):
>>
>>     6653 gene
>>    46675 exon
>>  280534 protein_match
>> 59934 CDS
>>     969 contig
>>  105388 expressed_sequence_match
>>   12584 five_prime_UTR
>>   78565 match
>> 1401369 match_part
>>   10180 mRNA
>>   11545 three_prime_UTR
>>
>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>> which there are 29,000 transcripts).  I used the protein sequences  
>> from trinity assembly.
>>
>> I ran maker on this data with est2genome set to 1. The output looks  
>> like this:
>>      29 gene
>>      75 exon
>>  573659 protein_match
>> 67 CDS
>>    1099 contig
>>  269298 expressed_sequence_match
>>      23 five_prime_UTR
>>  173844 match
>> 2221846 match_part
>>      29 mRNA
>>      23 three_prime_UTR
>>
>> The genes annotated using the trinity assembly is lower than  
>> expected, so I went the cufflinks route. I dont understand why when  
>> using the cufflinks transcripts, even less genes are being found.
>>
>> 3. Training SNAP:  I used the results of maker from 1 to train  
>> SNAP.  I then used that training set to rerun maker:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ 
>> maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS etc.
>> 957 contig
>>   46555 expressed_sequence_match
>>   43651 match
>>  553633 match_part
>>  113738 protein_match
>>
>> As I mentioned in another email, cegma results indicated that the  
>> genome was more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there are a few numbers that could be helpful to understand
>>> what's happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>>> MAKER when you gave it the cufflinks data. How many transcripts
>>> did MAKER identify with the cufflinks data? Did you still get more
>>> than the 10,000 transcripts that you found with just the Trinity
>>> data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be
>>> affecting its performance is that it only annotates a gene where
>>> there is both evidence (like your RNA-seq data) and an ab-initio
>>> prediction. If a prediction is unsupported by the evidence, then
>>> MAKER won't annotate a gene and if evidence aligns where there's
>>> no prediction, MAKER won't annotate a gene either. What ab-initio
>>> predictors are you using and have they been trained for your
>>> specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments  
>>> to a gene model by setting the est2genome option to 1, but that  
>>> will usually give you many false positives.
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>> data and with est2genome set to 1, and let us know how that  
>>> affects the MAKER results.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf  
>>> of dhivya arasappan [darasappan at gmail.com]
>>> Sent: Thursday, January 30, 2014 11:18 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> I am trying to annotate a 200 mb plant genome for which I have a  
>>> very
>>> good assembly.
>>>
>>> I tried to denovo assemble RNA-seq data using trinity and ran maker
>>> using my genome assembly and the trinity results.  I did not get as
>>> many transcripts as expected, around 10,000 transcripts.
>>>
>>> So, I decided to try a different approach.  I did a genome assisted
>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker  
>>> using my
>>> genome assembly and the cufflinks result.  I get much less number of
>>> transcripts as a result.
>>>
>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>> confused as to why maker is not finding the same.
>>>
>>> Any suggestions would be appreciated.
>>>
>>> Thanks
>>> Dhivya
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> _______________________________________________ maker-devel mailing  
>> list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.cat.formatted.gff
Type: application/octet-stream
Size: 19905 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140206/a662c5a7/attachment-0003.obj>

From sjackman at gmail.com  Thu Feb  6 17:22:57 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Feb 2014 16:22:57 -0800
Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation
Message-ID: <CADX6M3r=29brfAzzjPr22mAGW28VUb7np5MJz5bEjsAL-o2r-w@mail.gmail.com>

Hi MAKER developers,

I'd like to add MAKER to Homebrew <http://brew.sh> to make the installation
of MAKER and its dependencies as straightforward as brew install maker.
Homebrew is a system for installing software, originally developed for Mac
OS, and now also available for Linux through
Linuxbrew <https://github.com/Homebrew/linuxbrew>.
Homebrew/science <https://github.com/Homebrew/homebrew-science> is a
collection of scientific software, which includes a lot of bioinformatics
software.

I've created a prototype for the MAKER installation
script <https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb> (called
a formula, in Homebrew parlance). Is there a static URL for the
source code of MAKER? The current formula won't work out of the box,
because part of the
URL <https://github.com/Homebrew/homebrew-science/blob/maker/maker.rb#L7> depends
on the user's unique ID:
http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz.

Would you be interested in adding MAKER to Homebrew? I know MAKER must be
licensed for commercial use. It is possible for Homebrew to display a
notice of the MAKER license when it's installed.

MAKER is not available for commercial use without a license. Those wishing
to license MAKER for commercial use should contact Beth Drees at the
University of Utah TCO to discuss your needs.

Cheers,
Shaun

From bioinformatics.umd at gmail.com  Fri Feb  7 06:29:27 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 7 Feb 2014 08:29:27 -0500
Subject: [maker-devel] NCBI feature table
Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>

Hello Maker Developers,

I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes I haven't found a straightforward method of turning the MAKER produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.

Cheers
Ian


From mike.thon at gmail.com  Fri Feb  7 07:14:06 2014
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 7 Feb 2014 15:14:06 +0100
Subject: [maker-devel] NCBI feature table
In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>

Hi Ian -

We've been struggling with this too and I started developing a script to convert the MAKER GFF into NCBI's .tbl format.  However, we found that some of the gene models required manual editing, so what we do is import the GFF into a commercial application called Geneious where we do the edits.  From there we export the data in GenBank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess.  The only advice I can offer is to say that you'll probably need some manual editing in your workflow, if not Apollo, then some other app.  In that case you'll need to convert the output of that app into .tbl format.

> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics <bioinformatics.umd at gmail.com> wrote:
> 
> Hello Maker Developers,
> 
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes I haven't found a straightforward method of turning the MAKER produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.
> 
> Cheers
> Ian
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From cexzurjimenezjr at gmail.com  Thu Feb  6 22:27:13 2014
From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.)
Date: Fri, 7 Feb 2014 13:27:13 +0800
Subject: [maker-devel] Testing MAKER After Installation
Message-ID: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>

Hello,

I have finished installing MAKER marked by "PERL Dependencies: INSTALLED,
External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED,
MAKER: INSTALLED" and it seems everything's fine. I'm using MAKER 2.10 and
I have followed the installation instructions both in its corresponding
"README" and "INSTALL" files and the 2012 GMOD MAKER Tutorial. After
editing the three configuration files and running "maker", I saw the
following error in my terminal. I have searched Google and tried the
solutions offered there but the error is still showing. Below is the error
I got:


Can't locate package GDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at
/usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb
-species all -dir
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500
-pa 1
#-------------------------------#
Building general libraries in:
/usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb
on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file
dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased


Maker is now finished!!!


Can you tell me what the error is and where in the installation I went
wrong? Your help will be very much appreciated. Thank you.

Attached herein are configuration files I used for MAKER.


Sincerely,

CJ
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1502 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1320 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0010.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4541 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/b2025b2a/attachment-0011.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 11:11:44 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:11:44 +0000
Subject: [maker-devel] Maker installation
In-Reply-To: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
References: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>
Message-ID: <CF1A6E45.99DF%carson.holt@genetics.utah.edu>

Hi Tracy,

The older Apollo is pretty much deprecated.  There are still people who like to use it though (myself among them).  You can download and install it manually from here -> http://sourceforge.net/projects/gmod/files/Apollo/.

If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this -> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz

You can also use Web-Apollo for your data if you want, and that is what I would recommend.

On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that.  The GUI is really only for demonstration purposes or very small datasets.  It is not for production (that is why it is off by default).

Thanks,
Carson

From: Tracy Smith <tmsmith23 at wisc.edu>
Date: Friday, February 7, 2014 at 10:48 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: <maker-devel at yandell-lab.org>
Subject: Maker installation

Hi,

I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

--
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carson.holt at genetics.utah.edu  Fri Feb  7 11:28:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:28:29 +0000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
	<7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
Message-ID: <CF1A7331.9A09%carson.holt@genetics.utah.edu>

Yes.  The non-web version of Apollo can open GFF3 and then save to table
format -> http://sourceforge.net/projects/gmod/files/Apollo/

I've also attached a script made by a lab member that can convert MAKER
derived GFF3 gene entries into raw table format, and I've CC'd the script's
author (Michael Campbell) in case you have any questions.

Thanks,
Carson


On 2/7/14, 7:14 AM, "Michael Thon" <mike.thon at gmail.com> wrote:

>Hi Ian -
>
>We've been struggling with this too and I started developing a script to
>convert the maker gff into ncbi's .tbl format.  However we found that
>some of the gene models required manual editing so what we do is import
>the gff into a commercial application called Geneious where we do the
>edits.  From there we export the data in genbank format and then convert
>it to .tbl format with a script. Our submission just passed the automated
>checks and we're waiting for the manual review. Probably none of my code
>will help you, and in any case it's kind of a mess.  The only advice I can
>offer is to say that you'll probably need some manual editing in your
>workflow, if not Apollo, then some other app.  In that case you'll need
>to convert the output of that app into .tbl format.
>
>> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics
>><bioinformatics.umd at gmail.com> wrote:
>> 
>> Hello Maker Developers,
>> 
>> I have used this software with great success and I continue to look to
>>it going forward. However, as I'm getting ready to submit my annotations
>>to NCBI with the genomes I haven't found a straightforward method of
>>turning the MAKER produced GFF files into an NCBI feature table. What is
>>the process for creating this table? It seems that the format NCBI is
>>looking for is unique and I haven't uncovered any scripts or tools to
>>assist in the creation of this table from my annotation files. If anyone
>>has any insight on this issue it would be greatly appreciated.
>> 
>> Cheers
>> Ian
>> 
>> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff32table
Type: application/octet-stream
Size: 7511 bytes
Desc: gff32table
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140207/2e51f964/attachment-0003.obj>

From carson.holt at genetics.utah.edu  Fri Feb  7 11:31:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:31:17 +0000
Subject: [maker-devel] Testing MAKER After Installation
In-Reply-To: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
References: <CABb+y6SiT7D8ZLZGLXNdBORAW5ks_GdRdvMhfb0co+kp1N1_2Q@mail.gmail.com>
Message-ID: <CF1A7417.9A11%carson.holt@genetics.utah.edu>

That can happen on some systems with that very old version of MAKER.  Use MAKER 2.28 or 2.30 instead -> http://www.yandell-lab.org/software/maker.html
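
Before reinstalling, it may also be worth confirming that the external programs named in the log exist and are executable. The following is only a minimal sketch: the two paths are copied from the log quoted below, and check_executable is a hypothetical helper, not part of MAKER or RepeatMasker.

```python
import os
import shutil


def check_executable(path_or_name):
    """Return 'ok', 'missing' or 'not executable' for one program."""
    # Accept an absolute path (as printed in the log) or a bare name on PATH.
    if os.path.isabs(path_or_name):
        resolved = path_or_name
    else:
        resolved = shutil.which(path_or_name)
    if resolved is None or not os.path.isfile(resolved):
        return "missing"
    if not os.access(resolved, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Paths taken from the error log; adjust them for your own installation.
    for program in ("/usr/local/blast/bin/makeblastdb",
                    "/usr/local/maker/exe/RepeatMasker/RepeatMasker"):
        print(program, "->", check_executable(program))
```

A result of "ok" only shows that the file can be launched; it does not rule out a version mismatch between RepeatMasker and BLAST, which would still need to be checked by running makeblastdb by hand.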

Thanks,
Carson


From: "Cexzur Jimenez Jr." <cexzurjimenezjr at gmail.com>
Date: Thursday, February 6, 2014 at 10:27 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Testing MAKER After Installation

Hello,

I have finished installing MAKER marked by "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED,
MAKER: INSTALLED" and it seems everything's fine. I'm using MAKER 2.10 and I have followed the installation instructions both in its corresponding "README" and "INSTALL" files and the 2012 GMOD MAKER Tutorial. After editing the three configuration files and running "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there but the error is still showing. Below is the error I got:


Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------


running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed

FATAL ERROR
ERROR: Failed while doing repeat masking!!

ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500


--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out
did not finish on the last run and must be erased


Maker is now finished!!!


Can you tell me what the error is and where in the installation I went wrong? Your help will be very much appreciated. Thank you.

Attached herein are configuration files I used for MAKER.


Sincerely,

CJ


From bhall7 at hawaii.edu  Fri Feb  7 17:31:36 2014
From: bhall7 at hawaii.edu (Brian Hall)
Date: Fri, 07 Feb 2014 14:31:36 -1000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
References: <mailman.61.1391786169.26968.maker-devel_yandell-lab.org@box290.bluehost.com>
Message-ID: <52F57AE8.5090002@hawaii.edu>

Hi Ian,

My colleagues are also working on preparing a genome for submission to 
the NCBI. The software we are developing for this task is still a work 
in progress, but you are welcome to give it a try:

https://github.com/tedsta/GAG

It's a console-based application and it requires Python 2.6. Its 
strength is in filtering and modifying large segments of the genome at 
once -- where Apollo is good for removing a few erroneous exons, we are 
dealing with lists of dozens or more. This program seeks to make such 
changes as painless as possible.

My advice is to try the simplest gff3-to-tbl script you can find and 
then run tbl2asn. If it works out okay, great! If you get a massive 
error report, get in touch and we'll help you out if we can :)
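
To give a sense of what a "simplest" gff3-to-tbl script involves, here is a minimal sketch. It is not the gff32table script attached earlier in the thread and not GAG; it only emits gene and CDS features, reuses the GFF3 gene ID as the locus_tag, assumes each CDS row has a single Parent, and writes a placeholder product. A real submission needs more (mRNA features, protein_id and transcript_id qualifiers, a registered locus_tag prefix), so treat it as a starting point before running tbl2asn.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def format_intervals(intervals, strand, key):
    """Yield table lines for one feature; minus-strand coordinates are flipped."""
    ordered = sorted(intervals)
    if strand == "-":
        # Feature tables list minus-strand intervals 5' to 3' with start > stop.
        ordered = [(end, start) for start, end in reversed(ordered)]
    for i, (left, right) in enumerate(ordered):
        yield f"{left}\t{right}\t{key}" if i == 0 else f"{left}\t{right}"


def gff3_to_tbl(lines):
    """Convert the gene and CDS rows of a GFF3 file into feature-table text."""
    genes = defaultdict(list)  # seqid -> [(start, end, strand, gene_id)]
    cds = defaultdict(list)    # (seqid, parent_id, strand) -> [(start, end)]
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _, ftype, start, end, _, strand, _, attr = cols
        attrs = parse_attributes(attr)
        if ftype == "gene":
            genes[seqid].append((int(start), int(end), strand, attrs.get("ID", "")))
        elif ftype == "CDS":
            key = (seqid, attrs.get("Parent", ""), strand)
            cds[key].append((int(start), int(end)))

    out = []
    for seqid, gene_rows in genes.items():
        out.append(f">Feature {seqid}")
        for start, end, strand, gene_id in sorted(gene_rows):
            out.extend(format_intervals([(start, end)], strand, "gene"))
            out.append(f"\t\t\tlocus_tag\t{gene_id}")
        for (cds_seqid, _parent, strand), intervals in cds.items():
            if cds_seqid == seqid:
                out.extend(format_intervals(intervals, strand, "CDS"))
                # Placeholder product name; replace with real annotation.
                out.append("\t\t\tproduct\thypothetical protein")
    return "\n".join(out) + "\n"
```

Calling gff3_to_tbl(open("genome.all.gff")) returns the table as a string; the file name here is only an example.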

--Brian

On 02/07/2014 05:16 AM, maker-devel-request at yandell-lab.org wrote:
> Date: Fri, 7 Feb 2014 08:29:27 -0500
> From: UMD Bioinformatics <bioinformatics.umd at gmail.com>
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] NCBI feature table
> Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8 at gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> Hello Maker Developers,
>
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes I haven't found a straightforward method of turning the MAKER produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.
>
> Cheers
> Ian
>


From tmsmith23 at wisc.edu  Fri Feb  7 10:48:13 2014
From: tmsmith23 at wisc.edu (Tracy Smith)
Date: Fri, 7 Feb 2014 11:48:13 -0600
Subject: [maker-devel] Maker installation
Message-ID: <CAEpzfGCB9HFkj+Kd2suNTRN_prriqipM26kdj=3gW=QygmXjmw@mail.gmail.com>

Hi,

I am trying to install Maker and am running into the same problem noted on
this page, namely I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new url you provided, "Here is a new location for the
source -->
http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip
"
but that url now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location
where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

-- 
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carsonhh at gmail.com  Mon Feb 10 08:34:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 08:34:58 -0700
Subject: [maker-devel] MAKER presentation at PAG
In-Reply-To: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
References: <CAAer89Z=ivW==Pv0eSA+RQtPg1r9JoLHv7hH+TP2c4=DUwh8tg@mail.gmail.com>
Message-ID: <CF1E3E65.9B13%carsonhh@gmail.com>

* maker_map_ids - Builds shorter IDs/Names for MAKER genes and transcripts
following the NCBI suggested naming format.
* map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER
fasta files.
* map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3
files; old IDs/Names are mapped to the Alias attribute.
* maker_functional_fasta - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER produced transcript and protein fasta
files.
* maker_functional_gff - Maps putative functions identified from BLASTP
against UniProt/SwissProt to the MAKER produced GFF3 files in the Note
attribute.
* ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs
and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file.
This is metadata that shows up when you click on an annotation in
JBrowse/GBrowse.
* iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3
features representing domains. Interesting tier for GBrowse. These are
visible feature tracks that can be seen in JBrowse/GBrowse.
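
The renaming idea behind the first three scripts can be illustrated with a short sketch. This is not the code of maker_map_ids or map_gff_ids; the prefix "ABC_", the padding width and both function names are hypothetical, chosen only to show short sequential IDs being assigned and the old ID being kept under Alias.

```python
def build_id_map(old_ids, prefix="ABC_", width=6):
    """Assign short sequential names such as ABC_000001 to long IDs."""
    return {old: f"{prefix}{i:0{width}d}" for i, old in enumerate(old_ids, start=1)}


def rename_attributes(attr_field, id_map):
    """Swap ID/Name/Parent values and record the old ID under Alias."""
    parts = []
    alias = None
    for pair in attr_field.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        if key in ("ID", "Name", "Parent") and value in id_map:
            if key == "ID":
                alias = value  # remember the original ID for the Alias attribute
            value = id_map[value]
        parts.append(f"{key}={value}")
    if alias is not None:
        parts.append(f"Alias={alias}")
    return ";".join(parts)
```

Applied to column 9 of each GFF3 row, rename_attributes leaves attributes it does not recognise untouched, so functional notes added later are not disturbed.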
Thanks,
Carson

From:  Kevin Dorn <dorn at umn.edu>
Date:  Sunday, February 9, 2014 at 9:23 PM
To:  <carson.holt at utah.edu>
Subject:  MAKER presentation at PAG

Hi Carson, 

I saw your MAKER presentation at PAG this year and have a quick question.
I've used MAKER to annotate the plant genome we're working on, and am mostly
done. I had to step out for a second during your talk, and when I came back,
you were talking about how you can transfer meaningful annotations (getting
rid of the 'ugly MAKER names' for genes). Is there an accessory script to do
this? 

Thanks, 
Kevin Dorn 



From amitha at ccmb.res.in  Mon Feb 10 00:04:37 2014
From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR)
Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST)
Subject: [maker-devel] Falied to create new account
In-Reply-To: <bea52988-c660-488d-aae4-196364348cea@node1>
Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hi,

I am interested in using the online version of MAKER, for which I tried to create a profile using the email ID 'amitha at ccmb.res.in', but unfortunately I could not log in successfully.
I am also pasting a link of the error here, http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.

The error mentioned is:
Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
 at /var/www/cgi-bin/mwas/maker.cgi line 21.

Kindly help me through the registration asap.

Thanks
Amitha.


From listona at science.oregonstate.edu  Sat Feb  8 19:08:42 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Sat, 08 Feb 2014 18:08:42 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>

I am following the tutorial for training SNAP, and it works fine.  
However, the tutorial instructions have MAKER repeat the repeat  
masking. To avoid this, I concatenated my gff files from the first  
round of annotation and used maker_gff=round1.gff and rm_pass=1  but  
at the end of the process, the repeat annotations were not there. Any  
suggestions?  Thanks, Aaron
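
One way to check whether the concatenated GFF3 actually carries repeat evidence before handing it to maker_gff is to tally features by source and type. The sketch below assumes that repeat features are labelled with a source beginning with "repeat" (for example repeatmasker) in column 2; that is an assumption about the files, so compare it against a few lines of your own output. The file name round1.gff is the one from the message above.

```python
from collections import Counter


def count_by_source(lines):
    """Count GFF3 rows per (source, type) pair."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Embedded FASTA lines have no tabs, so they are skipped here even
        # when several per-contig files were concatenated.
        if len(cols) == 9:
            counts[(cols[1], cols[2])] += 1
    return counts


def has_repeats(counts):
    """True if any feature source looks like a repeat-masking program."""
    return any(source.lower().startswith("repeat") for source, _ in counts)


if __name__ == "__main__":
    with open("round1.gff") as handle:
        totals = count_by_source(handle)
    for (source, ftype), n in sorted(totals.items()):
        print(f"{n:>8} {source} {ftype}")
    print("repeat features present:", has_repeats(totals))
```

If no repeat rows show up in the tally, the input file never contained them, and rm_pass=1 would have nothing to pass through.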


From caigh02 at gmail.com  Sun Feb  9 20:26:57 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Sun, 9 Feb 2014 21:26:57 -0600
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
Message-ID: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>

I sent the following message to Carson but forgot to send to the
maker-devel list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or InterProScan. I dug out some old messages in
maker-devel google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the Uniprot/Swiss_prot database, or can I use something
else? For example, may I add a few high-quality genome annotations of
related species to the swiss_prot database? Or may I use Uniref90 or nr
database instead of swiss_prot?

3. Do you have a script to integrate blast2go results to the maker
gff/fasta?

Thanks.

Guohong

Rutgers University

From carsonhh at gmail.com  Mon Feb 10 10:25:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:25:06 -0700
Subject: [maker-devel] Falied to create new account
In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
Message-ID: <CF1E5936.9B37%carsonhh@gmail.com>

The SMTP server that sends e-mails out is just down.  So when you said you
forgot your login, it couldn't e-mail you.  I switched to a different
server for the time being.

--Carson


On 2/10/14, 12:04 AM, "AMITHA SAMPATH KUMAR" <amitha at ccmb.res.in> wrote:

>Hi,
>
>I am interested in using the online version of MAKER, for which I tried to
>create a profile using the email ID 'amitha at ccmb.res.in', but
>unfortunately I could not log in successfully.
>I am also pasting a link of the error here,
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.
>
>The error mentioned is:
>Error executing run mode 'forgot_login': Can't call method "MailMsg"
>without a package or object reference at
>/var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
> at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>Kindly help me through the registration asap.
>
>Thanks
>Amitha.
>


From carsonhh at gmail.com  Mon Feb 10 10:26:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:26:06 -0700
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To: <CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
References: <CAOcLemT5qaFvSRfjQ1QrObr9WCLh915aJ14a7ZbSemcuOBypfQ@mail.gmail.com>
	<CAOcLemT3CCPmWMpwoZr_w322Gv9ZXFrmD70t7ygZWOk1Kq9TMg@mail.gmail.com>
Message-ID: <CF1E59B4.9B3B%carsonhh@gmail.com>

1. Yes. It should take NCBI BLAST+ results.
2. It has to be UniProt/Swissprot, or you can modify the comments of another
database to look like UniProt/Swissprot.
3. ipr_update_gff can also take BLAST2GO results as an undocumented feature
(or at least it could last time I tested it, which was quite a long time
ago).
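
For point 2, rewriting the header lines of another protein database might look like the sketch below. The header layout follows the general UniProt FASTA convention (db|accession|entry name, description, then OS=, PE= and SV= fields); exactly which of these fields maker_functional_gff and maker_functional_fasta read is not stated in this thread, so the choice of fields, the "sp" tag and the PE=4/SV=1 placeholders are assumptions to verify against a real Swiss-Prot file.

```python
import re


def to_swissprot_header(header, organism="Unknown organism"):
    """Rewrite '>id description' as a UniProt/Swiss-Prot style header line."""
    body = header.lstrip(">").strip()
    seq_id, _, description = body.partition(" ")
    # Keep the accession free of characters that would break the '|' layout.
    accession = re.sub(r"[^A-Za-z0-9_]", "_", seq_id)
    description = description or "Uncharacterized protein"
    return f">sp|{accession}|{accession} {description} OS={organism} PE=4 SV=1"


def rewrite_fasta(lines, organism):
    """Yield FASTA lines with every header converted; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield to_swissprot_header(line, organism) + "\n"
        else:
            yield line
```

The converted file can then be formatted as a BLAST database and searched with BLASTP in the same way as Swiss-Prot itself.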

Thanks,
Carson

From:  Guohong Cai <caigh02 at gmail.com>
Date:  Sunday, February 9, 2014 at 8:26 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Fwd: Functional annotation of MAKER gene models

I sent the following message to Carson but forgot to send to the maker-devel
list

Hi Carson,

Again need your help!

With your guidance, I have the gene models for my genomes. Now I am trying
to assign functions to the gene models. I noticed that I can use
maker_functional_gff/fasta or interproScan. I dig out some old messages in
maker-devel google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only
wu-blast results? I do not have wu-blast.

2. Do I have to use the Uniprot/Swiss_prot database, or can I use something else?
For example, may I add a few high-quality genome annotations of related
species to the swiss_prot database? Or may I use Uniref90 or nr database
instead of swiss_prot?

3. Do you have a script to integrate blast2go results to the maker
gff/fasta?  

Thanks.

Guohong

Rutgers University 


From barry.utah at gmail.com  Mon Feb 10 12:21:31 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:21:31 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>

Hi Aaron,

If you re-run MAKER and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any work with repeat masking; it should reuse the work it has already done.  Is this not what you are seeing?

Barry


On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:

> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1  but at the end of the process, the repeat annotations were not there. Any suggestions?  Thanks, Aaron

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From listona at science.oregonstate.edu  Mon Feb 10 12:46:06 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Mon, 10 Feb 2014 11:46:06 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>

Hi Barry:   I changed the name of the genome file, so that I could see the
results at each step. However, it sounds like if I had kept the same name,
MAKER would use the info from the previous run.  Is that correct?  Aaron

 

 

From barry.utah at gmail.com  Mon Feb 10 12:56:26 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:56:26 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
	<78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
	<02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com>

Yep.  If you want to keep the results from each step just copy the GFF3 file from your first run to a new name and then redo your run.

B

>  

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From dence at genetics.utah.edu  Tue Feb 11 11:37:36 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 18:37:36 +0000
Subject: [maker-devel] Falied to create new account
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
References: <bea52988-c660-488d-aae4-196364348cea@node1>
	<11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
	<CF1E5936.9B37%carsonhh@gmail.com>
	<E8EDFB90D92694478065C37017B3A3A6A8908910@SKREGIXES2.AGR.GC.CA>
	<CF1FA919.9BBB%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D445A3@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89089AF@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D445B3@mxb2.hg.genetics.utah.edu>

Hossein, 

Ok. So since this error came up on a local install, I'm going to need some more information to understand what went wrong. Is it the same contig that always causes this error? If it is, then is this the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, then is it always the same error?

If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong.
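
One way to get to such a minimal dataset is to pull the failing contig out of the genome FASTA and rerun MAKER on that sequence alone. A rough sketch in Python, assuming the contig ID from the error message (PbPT3Sc00006) is the first word of a FASTA header line; the function name and the file name in the comment are placeholders, not part of MAKER:

```python
def extract_contig(fasta_lines, contig_id):
    """Return the lines (header plus sequence) of one FASTA record.

    The record is matched on the first whitespace-delimited word of
    its header line, e.g. ">PbPT3Sc00006 some description".
    """
    record = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # New header: decide whether this is the record we want.
            words = line[1:].split()
            keep = bool(words) and words[0] == contig_id
        if keep:
            record.append(line.rstrip("\n"))
    return record


# Example use (hypothetical file name):
#   with open("genome.fasta") as handle:
#       print("\n".join(extract_contig(handle, "PbPT3Sc00006")))
```

If the single-contig run fails the same way, that file plus the control files should be enough to reproduce the problem.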

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 11:20 AM
To: Daniel Ence
Subject: Re: [maker-devel] Falied to create new account

Hi Daniel

I am running it through the local server at my work.


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>Did you encounter this error while you were running MAKER on your local
>machine or through the MAKER web annotation service?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Carson Holt [carsonhh at gmail.com]
>Sent: Tuesday, February 11, 2014 10:18 AM
>To: Daniel Ence
>Cc: Mark Yandell
>Subject: FW: [maker-devel] Falied to create new account
>
>Hey Daniel could you download his dataset, and see if you can replicate
>the error.  Also check if this was an MWAS job or a local maker run (his
>dataset will already be there for MWAS, you just need the job ID).
>
>Thanks,
>Carson
>
>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>
>>Hi Carson
>>
>>
>>I encountered this error while running maker
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00006
>>
>>
>>
>>
>>
>>HB


From darasappan at gmail.com  Tue Feb 11 11:48:23 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 11 Feb 2014 12:48:23 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <CF19187C.994D%carsonhh@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>

With your suggested changes (using a protein file not derived from the  
RNA-seq data and fixing the gff file for training SNAP), I was able to  
increase the number of genes from 6000+ to 18116.
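
The GFF fix here means keeping the FASTA section at the end of the merged GFF3, since maker2zff needs the sequence to write genome.dna. A minimal sketch for checking that before training SNAP, assuming the standard `##FASTA` directive separates the feature lines from the sequence; the function name is only illustrative:

```python
def summarize_gff3(lines):
    """Count feature lines and embedded FASTA records in a GFF3 stream.

    Returns (n_features, n_sequences). A GFF3 whose sequence section
    was stripped gives n_sequences == 0.
    """
    n_features = 0
    n_sequences = 0
    in_fasta = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            # Everything after this directive is sequence, not features.
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                n_sequences += 1
        elif line and not line.startswith("#"):
            n_features += 1
    return n_features, n_sequences


# Example use (hypothetical file name):
#   with open("merged.gff") as handle:
#       print(summarize_gff3(handle))
```

A result with features but zero sequences would match the symptom earlier in this thread, where genome.dna came out empty.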

I'm now trying to evaluate the quality of the annotation.  I have a  
question about the usage for mpi_evaluator.

In the MAKER tutorial, the usage is given as:

  mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>

What files do the input parameters eval_opts, eval_bopts and eval_exe
refer to?

Thanks
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join
> the GFF3s without stripping out the fasta sequence at the end when
> training SNAP.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan <darasappan at gmail.com>
> Date: Thursday, February 6, 2014 at 10:29 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Sorry I was just trying to make it small enough to be approved by  
> the mailing list.
>
> Here is the whole file:
>
>
>  cat.formatted.gff.tgz
>
>
>
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com>  
> wrote:
>> Could you give me the file without using 'head' to trim it? It's
>> cutting it before it reaches the part I'm interested in.
>>
>> --Carson
>>
>>
>> From: dhivya arasappan <darasappan at gmail.com>
>> Date: Thursday, February 6, 2014 at 10:01 AM
>>
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>> <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Oh yes I did- I took just the non sequence entries in the gff file  
>> and used that as my input.  I will rerun snap with the gff file  
>> containing the sequences as well.
>>
>> I'm attaching a snippet of the gff file that I used as input to  
>> maker2zff.
>>
>> Thanks for your help
>> Dhivya
>>
>>
>>
>>
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>>
>>> Your genome.dna file has no sequence?  Did you by any chance strip  
>>> the fasta sequence from the GFF3 you are using as input to  
>>> maker2zff?  There should be fasta sequence at the end of that  
>>> file.  Also can I see the GFF3 file you are using as input to  
>>> maker2zff.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: dhivya arasappan <darasappan at gmail.com>
>>> Date: Thursday, February 6, 2014 at 7:47 AM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> It does appear that my genome.ann file from the maker2zff script has
>>> data in it. However, the SNAP steps after that have created empty
>>> files.  The following are all empty:
>>>
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>>
>>> When I tried to get gene stats or validate genome.ann, I get  
>>> errors like this for all of them:
>>>
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds  
>>> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds  
>>> exon-2:out_of_bounds exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds  
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds  
>>> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds  
>>> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds  
>>> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds  
>>> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds  
>>> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds  
>>> exon-20:out_of_bounds exon-21:out_of_bounds
>>>
>>> I'm not sure why the annotations I'm seeing in genome.ann are all
>>> showing up as errors. I realize this may be an issue with snap,
>>> but are you familiar with anything like this? My genome.ann file
>>> is attached for reference.
>>>
>>> Thanks
>>> Dhivya
>>>
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>>
>>>> Do you have any features of type snap in your results from step
>>>> 3?  We've had a couple of recent posts where, after training, snap
>>>> was giving no results, and as a result maker couldn't give any
>>>> genes.  One cause of something like that may be your step 2.
>>>> Make sure the ZFF you used to train with wasn't empty.  The
>>>> maker2zff script uses filters to only put the best genes in the
>>>> ZFF file, and if all your genes fail the filtering then you are
>>>> training with an empty ZFF.
>>>>
>>>> Also you should use proteins from a related species as your
>>>> protein file.  I see that your protein matches are varying wildly
>>>> from run to run. So is your contig count.  Was the subset of
>>>> contigs you have results for long enough to contain genes?
>>>>
>>>> --Carson
>>>>
>>>> From: dhivya arasappan <darasappan at gmail.com>
>>>> Date: Monday, February 3, 2014 at 9:31 AM
>>>> To: Daniel Ence <dence at genetics.utah.edu>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>>
>>>> Hi Daniel,
>>>>
>>>> I was able to check on some of those questions.
>>>>
>>>> 1. From trinity assembly: I started with 102000 contigs. I used  
>>>> trinotate to annotate proteins in this.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this (most important parts on top):
>>>>
>>>>    6653 gene
>>>>   46675 exon
>>>>  280534 protein_match
>>>>   59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>>
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of  
>>>> which there are 29,000 transcripts).  I used the protein  
>>>> sequences from trinity assembly.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output  
>>>> looks like this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>>      67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>>
>>>> The number of genes annotated using the trinity assembly is lower
>>>> than expected, so I went the cufflinks route. I don't understand
>>>> why even fewer genes are found when using the cufflinks
>>>> transcripts.
>>>>
>>>> 3. Training SNAP:  I used the results of maker from 1 to train
>>>> SNAP.  I then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>>>> est2genome=0
>>>>
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>>     957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>>
>>>> As I mentioned in another email, cegma results indicated that the  
>>>> genome was more than 90% complete. Any suggestions would be  
>>>> helpful.
>>>>
>>>> Thank you
>>>> Dhivya
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>>
>>>>> Hi Dhivya,
>>>>>
>>>>> I think there are a few numbers that could be helpful to understand
>>>>> what's happening here.
>>>>>
>>>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>>>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>>>>> MAKER when you gave it the cufflinks data. How many transcripts
>>>>> did MAKER identify with the cufflinks data? Did you still get
>>>>> more than the 10,000 transcripts that you found with just the
>>>>> Trinity data?
>>>>>
>>>>> A key part of MAKER's approach to genome annotation that might
>>>>> be affecting its performance is that it only annotates a gene
>>>>> where there is both evidence (like your RNA-seq data) and an
>>>>> ab-initio prediction. If a prediction is unsupported by the
>>>>> evidence, then MAKER won't annotate a gene, and if evidence
>>>>> aligns where there's no prediction, MAKER won't annotate a gene
>>>>> either. What ab-initio predictors are you using, and have they
>>>>> been trained for your specific genome?
>>>>>
>>>>> You can force MAKER to automatically promote evidence alignments  
>>>>> to a gene model by setting the est2genome option to 1, but that  
>>>>> will usually give you many false positives.
>>>>>
>>>>> Try rerunning it with either the Trinity data or the Cufflinks  
>>>>> data and with est2genome set to 1, and let us know how that  
>>>>> affects the MAKER results.
>>>>>
>>>>> Thanks,
>>>>> Daniel
>>>>>
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on  
>>>>> behalf of dhivya arasappan [darasappan at gmail.com]
>>>>> Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>>>> very good assembly.
>>>>>
>>>>> I tried to de novo assemble RNA-seq data using trinity and ran
>>>>> maker using my genome assembly and the trinity results.  I did not
>>>>> get as many transcripts as expected, around 10,000 transcripts.
>>>>>
>>>>> So, I decided to try a different approach.  I did a genome-assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker
>>>>> using my genome assembly and the cufflinks result.  I get far
>>>>> fewer transcripts as a result.
>>>>>
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>>
>>>>> Any suggestions would be appreciated.
>>>>>
>>>>> Thanks
>>>>> Dhivya
>>>>>
>>>>>


From carsonhh at gmail.com  Tue Feb 11 11:55:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Feb 2014 11:55:38 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D43654@mxb2.hg.genetics.utah.edu>
	<0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
	<CF17D1FC.987A%carsonhh@gmail.com>
	<C375C3D8-1B13-4685-9E90-AAF710CADCDD@gmail.com>
	<CF19004A.9913%carsonhh@gmail.com>
	<02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
	<CF190E83.9927%carsonhh@gmail.com>
	<CAGWaY_4mGU2DLWwcQ=_F3-O+YE1ZmDtE=zgdi6cVouhkH=N5HQ@mail.gmail.com>
	<CF19187C.994D%carsonhh@gmail.com>
	<0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
Message-ID: <CF1FBEEF.9BF5%carsonhh@gmail.com>

I wouldn't use mpi_evaluator.  It is buggy and has virtually no
documentation.  The AED values are the best way to identify which genes are
higher and lower quality.  You can also run interproscan to identify protein
domain content as an independent evaluation. Look at this paper here ->
http://www.biomedcentral.com/1471-2105/12/491

Figure 4 has a nice example of how AED, domain content, and gene orthology
correlate to show the quality of different subsets of genes in seven ant
genomes.
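
To look at AED across a whole annotation, the values can be pulled from the mRNA lines of the MAKER GFF3 and summarized as the fraction of transcripts at or below a cutoff (0 means full agreement with the evidence, 1 means no support). A minimal sketch, assuming the score is stored as an `_AED=` attribute in column 9 of each mRNA feature; check that against your own GFF3, and treat the function names as illustrative:

```python
def aed_values(gff3_lines):
    """Collect the _AED score of every mRNA feature in a GFF3 stream."""
    values = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section: no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 is a ';'-separated list of key=value attributes.
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "_AED":
                values.append(float(value))
    return values


def fraction_at_or_below(values, cutoff):
    """Fraction of transcripts with AED <= cutoff (0.0 if there are none)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v <= cutoff) / len(values)
```

Calling `fraction_at_or_below` over a range of cutoffs (0.0, 0.1, ..., 1.0) gives a cumulative curve that can be compared between annotation runs.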

If you choose to try mpi_evaluator it uses the -CTL option to generate empty
files that you then fill in.

Thanks,
Carson


From:  dhivya arasappan <darasappan at gmail.com>
Date:  Tuesday, February 11, 2014 at 11:48 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] maker annotation with cufflinks output

With your suggested changes (using a protein file not derived from the
RNA-seq data and fixing the gff file for training SNAP), I was able to
increase the number of genes from 6000+ to 18116.

I'm now trying to evaluate the quality of the annotation.  I have a question
about the usage for mpi_evaluator.

In the maker tutorial,  the usage is given as:

 mpi_evaluator [options] <eval_opts> <eval_bopts> <eval_exe>
What files are being referred to in the input parameters: eval_opts,
eval_bopts and eval_exe?

Thanks 
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok.  Content looks good.  Just make sure to use gff3_merge to join the GFF3?s
> without stripping out the fasta sequence at the end when training SNAP.
> 
> Thanks,
> Carson
> 
> 
> From:  dhivya arasappan <darasappan at gmail.com>
> Date:  Thursday, February 6, 2014 at 10:29 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>
> Subject:  Re: [maker-devel] maker annotation with cufflinks output
> 
> Sorry I was just trying to make it small enough to be approved by the mailing
> list.
> 
> Here is the whole file:
> 
> 
>  cat.formatted.gff.tgz
> <https://docs.google.com/file/d/0B3fACsJDXQi6VEE1VG5tWEh5M1U/edit?usp=drive_we
> b> 
> 
> 
> 
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Could you give me the file without using 'head? to trim it, its cutting it
>> before it reaches the part I?m interested in.
>> 
>> ?Carson
>> 
>> 
>> From:  dhivya arasappan <darasappan at gmail.com>
>> Date:  Thursday, February 6, 2014 at 10:01 AM
>> 
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>> <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>> 
>> Oh yes I did- I took just the non sequence entries in the gff file and used
>> that as my input.  I will rerun snap with the gff file containing the
>> sequences as well.
>> 
>> I'm attaching a snippet of the gff file that I used as input to maker2zff.
>> 
>> Thanks for your help
>> Dhivya
>> 
>> 
>> 
>> 
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>> 
>>> Your genome.dna file has no sequence?  Did you by any chance strip the fasta
>>> sequence from the GFF3 you are using as input to maker2zff?  There should be
>>> fasta sequence at the end of that file.  Also can I see the GFF3 file you
>>> are using as input to maker2zff.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> From:  dhivya arasappan <darasappan at gmail.com>
>>> Date:  Thursday, February 6, 2014 at 7:47 AM
>>> To:  Carson Holt <carsonhh at gmail.com>
>>> Cc:  Daniel Ence <dence at genetics.utah.edu>, "maker-devel at yandell-lab.org"
>>> <maker-devel at yandell-lab.org>
>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>> 
>>> Hello,
>>> 
>>> I does appear than my genome.ann file from maker2zff script has data in it.
>>> However, the SNAP steps after that have created empty files.  The following
>>> are all empty:
>>> 
>>> alt.dna  err.dna  export.dna  genome.dna  olp.dna  uni.dna  wrn.dna
>>> alt.ann  err.ann  export.ann  genome.ann  olp.ann  uni.ann  wrn.ann
>>> 
>>> When I tried to get gene stats or validate genome.ann, I get errors like
>>> this for all of them:
>>> 
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
>>> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
>>> exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
>>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
>>> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
>>> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
>>> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
>>> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
>>> exon-21:out_of_bounds
>>> 
>>> I'm not sure why the annotation I'm seeing in genome.ann are all showing up
>>> as errors. I realize this may be an issue with snap, but are you familiar
>>> with anything like this? My genome.ann file is attached for reference.
>>> 
>>> Thanks
>>> Dhivya
>>> 
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>> 
>>>> Do you have any features of type snap in your results from step 3?  We?ve
>>>> had a couple of recent posts where after training snap was giving no
>>>> results, and as a result maker couldn?t give any genes.  One cause of
>>>> something like that may be your step 2.  Make sure the ZFF wasn?t empty you
>>>> used to train with.  The maker2zff script uses filters to only put the best
>>>> genes in the off file, and if all your genes fail the filtering then you
>>>> are training with an empty ZFF.
>>>> 
>>>> Also you should use proteins from a related species as your protein file.
>>>> I see that you protein marches are varying wildly from run to run? So is
>>>> your contig count?  Were the subset of contigs you have results for long
>>>> enough to contain genes?
>>>> 
>>>> ?Carson
>>>> 
>>>> From:  dhivya arasappan <darasappan at gmail.com>
>>>> Date:  Monday, February 3, 2014 at 9:31 AM
>>>> To:  Daniel Ence <dence at genetics.utah.edu>
>>>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject:  Re: [maker-devel] maker annotation with cufflinks output
>>>> 
>>>> Hi Daniel,
>>>> 
>>>> I was able to check on some of those questions.
>>>> 
>>>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate
>>>> to annotate proteins in this.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this (most important parts on top):
>>>> 
>>>>     6653 gene
>>>>    46675 exon
>>>>  280534 protein_match
>>>> 59934 CDS
>>>>     969 contig
>>>>  105388 expressed_sequence_match
>>>>   12584 five_prime_UTR
>>>>   78565 match
>>>> 1401369 match_part
>>>>   10180 mRNA
>>>>   11545 three_prime_UTR
>>>> 
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of which
>>>> there are 29,000 transcripts).  I used the protein sequences from trinity
>>>> assembly.
>>>> 
>>>> I ran maker on this data with est2genome set to 1. The output looks like
>>>> this:
>>>>      29 gene
>>>>      75 exon
>>>>  573659 protein_match
>>>> 67 CDS
>>>>    1099 contig
>>>>  269298 expressed_sequence_match
>>>>      23 five_prime_UTR
>>>>  173844 match
>>>> 2221846 match_part
>>>>      29 mRNA
>>>>      23 three_prime_UTR
>>>> 
>>>> The genes annotated using the trinity assembly is lower than expected, so I
>>>> went the cufflinks route. I dont understand why when using the cufflinks
>>>> transcripts, even less genes are being found.
>>>> 
>>>> 3. Training SNAP:  I used the results of maker from 1 to train SNAP.  I
>>>> then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/s
>>>> nap/RHA.hmm
>>>> est2genome=0
>>>> 
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>> 957 contig
>>>>   46555 expressed_sequence_match
>>>>   43651 match
>>>>  553633 match_part
>>>>  113738 protein_match
>>>> 
>>>> As I mentioned in another email, cegma results indicated that the genome
>>>> was more than 90% complete. Any suggestions would be helpful.
>>>> 
>>>> Thank you
>>>> Dhivya
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>> 
>>>>> Hi Dhivya, 
>>>>> 
>>>>> I think there a few numbers that could be helpful to understand what's
>>>>> happening here.
>>>>> 
>>>>> How many transcripts did Trinity assembly the RNA-seq data into? Also, you
>>>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave
>>>>> it the cufflinks data. How many transcripts did MAKER identify with the
>>>>> cufflinks data? Did you still get more than the 10,000 transcripts that
>>>>> you found with just the Trinity data?
>>>>> 
>>>>> A key part of MAKER's approach to genome annotation that might be
>>>>> affecting it's performance is that it only annotates a gene where there is
>>>>> both evidence (like your RNA-seq data) and an ab-initio prediction. If a
>>>>> prediction is unsupported by the evidence, then MAKER won't annotate a
>>>>> gene and if evidence aligns where there's no prediction, MAKER won't
>>>>> annotate a gene either. What ab-initio predictors are you using and have
>>>>> they been trained specific genome?
>>>>> 
>>>>> You can force MAKER to automatically promote evidence alignments to a gene
>>>>> model by setting the est2genome option to 1, but that will usually give
>>>>> you many false positives.
>>>>> 
>>>>> Try rerunning it with either the Trinity data or the Cufflinks data and
>>>>> with est2genome set to 1, and let us know how that affects the MAKER
>>>>> results. 
>>>>> 
>>>>> Thanks,
>>>>> Daniel
>>>>> 
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>>  From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>>>> dhivya arasappan [darasappan at gmail.com]
>>>>>  Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am trying to annotate a 200 Mb plant genome for which I have a very
>>>>> good assembly.
>>>>> 
>>>>> I tried to de novo assemble RNA-seq data using trinity and ran maker
>>>>> using my genome assembly and the trinity results.  I did not get as
>>>>> many transcripts as expected, only around 10,000 transcripts.
>>>>> 
>>>>> So, I decided to try a different approach.  I did a genome-assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts.  I then ran maker using my
>>>>> genome assembly and the cufflinks result.  I get far fewer
>>>>> transcripts as a result.
>>>>> 
>>>>> If cufflinks found 29,000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>> 
>>>>> Any suggestions would be appreciated.
>>>>> 
>>>>> Thanks
>>>>> Dhivya
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>> 
>>> 
>> 
> 



From carson.holt at genetics.utah.edu  Tue Feb 11 13:52:05 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 20:52:05 +0000
Subject: [maker-devel] New MAKER release
Message-ID: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>

Hello all,

MAKER has been updated to 2.31.

There are no major new features over 2.30.  It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support.  I also was able to remove the seg faults that sometimes happened on exit under OpenMPI.

Thanks,
Carson


From carson.holt at genetics.utah.edu  Tue Feb 11 14:19:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 21:19:17 +0000
Subject: [maker-devel] New MAKER release
In-Reply-To: <CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
References: <CF1FDB84.9C17%carson.holt@genetics.utah.edu>
	<CA+77YqG+FiWr+HvSNYY6R6UOBCtcejA1wCLCXvzQr_Top5Eemw@mail.gmail.com>
Message-ID: <CF1FDDCC.9C1B%carson.holt@genetics.utah.edu>

URLs can be manually edited in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL.

Thanks,
Carson

From: Joanna Kelley <jokelley at stanford.edu>
Date: Tuesday, February 11, 2014 at 2:00 PM
To: Carson Holt <carson.holt at genetics.utah.edu>
Subject: Re: [maker-devel] New MAKER release

Hi Carson,

The RepBase step is failing. It seems to be looking for the wrong version. Where do I change the code to fix that?

Thanks,
Joanna

 Downloading RepBase...
--2014-02-11 12:59:38--  http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz
Resolving www.girinst.org... 66.201.49.247
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-02-11 12:59:38 ERROR 404: Not Found.


ERROR: Failed installing RepBase, now cleaning installation path...
You may need to install RepBase manually.




--
Please update your address book; my new email address is joanna.l.kelley at wsu.edu


From dence at genetics.utah.edu  Tue Feb 11 15:59:57 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 22:59:57 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>

Hi Hossein,

I think the most helpful thing right now would be for you to run MAKER on just one of the failing contigs and send me the entire error output along with the MAKER control files that you are using. It looks like the error is coming from the gff3 files that you are using as input.

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 3:51 PM
To: Daniel Ence
Subject: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I re-started maker and it is still running. But in the error output file that has
been generated so far, it seems that smaller contigs are affected. There
are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb
in length with this error.

I was wondering if I need to change the settings in the maker_opt file:

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)


If I understand correctly, max_dna_len divides contigs of over 100 kb into
smaller chunks. However, for the min_contig option it is not clear to me:
if the default contig length is 10 kb or less, why do I get error
messages for 30 kb contigs? Should I change this to 0?
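
For readers following the thread, the two options asked about here can be illustrated with a minimal Python sketch. It follows only the comments in the control file quoted above: min_contig is taken as a lower length cutoff below which a contig is skipped, and max_dna_len as the chunk size for longer contigs. The function name plan_chunks is hypothetical and this is not MAKER's code; MAKER's real chunking presumably handles chunk boundaries with more care.

```python
def plan_chunks(contig_len, max_dna_len=100000, min_contig=1):
    """Return 1-based (start, end) chunks for a contig, or [] if it is skipped.

    Hypothetical helper that mirrors the control-file comments only.
    """
    if contig_len < min_contig:
        return []  # contig is below the cutoff and would be skipped
    chunks = []
    start = 1
    while start <= contig_len:
        # each chunk covers at most max_dna_len bases
        end = min(start + max_dna_len - 1, contig_len)
        chunks.append((start, end))
        start = end + 1
    return chunks


if __name__ == "__main__":
    print(plan_chunks(30000))                   # a 30 kb contig with the settings above
    print(plan_chunks(250000))                  # a contig longer than max_dna_len
    print(plan_chunks(3000, min_contig=10000))  # a short contig under a 10 kb cutoff
```

Under this reading, min_contig=1 skips nothing of practical size, and a 30 kb contig is processed as a single chunk, so neither setting by itself would explain a failure on that contig.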

Here is an example of the error message for one of the contigs


#--------- command -------------#
Widget::exonerate::est2genome:
/usr/local/exonerate-2.2.0-x86_64/bin/exonerate  -q
/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta
-t
/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta
-Q dna -T dna --model est2genome
--minintron 20 --showcigar --percent 20 >
/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
#-------------------------------#
cleaning blastn...
cleaning tblastx...
cleaning blastx...
ERROR: Failed on
PbPT3Sc00001_S_0.8_1-mRNA-1
Check your input GFF3 file for errors!
(from GFFDB)

FATAL ERROR
ERROR: Failed while processing the chunk divide!!

ERROR: Chunk failed at level 17
!!
FAILED CONTIG:PbPT3Sc00001


--Next Contig--


Regards


HB


On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hossein,
>
>Ok. So since this error came up on a local install, I'm going to need
>some more information to understand what went wrong. Is it the same
>contig that always causes this error? If it is, then is this the only
>error or warning that MAKER encounters while running on this contig? Or,
>if multiple contigs fail, then is it always the same error?
>
>If you can narrow it down to the smallest possible dataset that
>consistently gives the same error, then we can begin to understand what's
>wrong.
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 11:20 AM
>To: Daniel Ence
>Subject: Re: [maker-devel] Falied to create new account
>
>Hi Daniel
>
>I am running it through the local server at my work
>
>
>
>
>
>
>M. Hossein Borhan, Ph.D.
>Research Scientist/ Chercheur Scientifique
>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>107 Science Place, Saskatoon, SK.,S7N 0X2
>Telephone/Téléphone: (306) 385-9441
>Facsimile/Télécopieur: (306) 385-9482
>Hossein.borhan at agr.gc.ca
>
>
>
>
>
>
>
>
>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>Did you encounter this error while you were running MAKER on your local
>>machine or through the MAKER web annotation service?
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Carson Holt [carsonhh at gmail.com]
>>Sent: Tuesday, February 11, 2014 10:18 AM
>>To: Daniel Ence
>>Cc: Mark Yandell
>>Subject: FW: [maker-devel] Falied to create new account
>>
>>Hey Daniel could you download his dataset, and see if you can replicate
>>the error.  Also check if this was an MWAS job or a local maker run (his
>>dataset will already be there for MWAS, you just need the job ID).
>>
>>Thanks,
>>Carson
>>
>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA> wrote:
>>
>>>Hi Carson
>>>
>>>
>>>I encountered this error while running maker
>>>
>>>FATAL ERROR
>>>ERROR: Failed while processing the chunk divide!!
>>>
>>>ERROR: Chunk failed at level 17
>>>!!
>>>FAILED CONTIG:PbPT3Sc00006
>>>
>>>
>>>
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>
>>
>>
>


From marc.hoeppner at imbim.uu.se  Wed Feb 12 01:34:12 2014
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Wed, 12 Feb 2014 09:34:12 +0100
Subject: [maker-devel] Annotations from protein alignments
Message-ID: <52FB3204.60606@imbim.uu.se>

Dear list,

I have an annotation project with both protein data (it's a bird, so
I've been using both vertebrates in general and chicken in particular)
and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq
data. The chicken augustus model seems to do a decent job in seeding
gene loci, but it's not quite perfect. I want to use protein alignments
to create a high-confidence set of exons and subsequently a set of gene
loci (to train e.g. snap), but when I test with protein2genome=1 I
never get any annotations. This is also true for the test data set that
is delivered together with Maker (hsap_). Is there anything I should know
about the use of proteins to generate annotations? I left all settings in the
config file at their defaults (except protein2genome=1). I've tried this
with both Maker 2.30 and 2.31.

All the best,

Marc

-- 
-----------
Marc P. Hoeppner, PhD
Group leader
BILS Genome annotation platform

Department of Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoepner at imbim.uu.se


From carsonhh at gmail.com  Wed Feb 12 08:42:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 12 Feb 2014 08:42:36 -0700
Subject: [maker-devel] Annotations from protein alignments
In-Reply-To: <52FB3204.60606@imbim.uu.se>
References: <52FB3204.60606@imbim.uu.se>
Message-ID: <CF20E42A.9C8C%carsonhh@gmail.com>

I updated the 2.31 tarball.  Go ahead and download it again.
protein2genome had been turned off for eukaryotes and was only working for
prokaryotic genomes.

--Carson




From dence at genetics.utah.edu  Wed Feb 12 11:59:11 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 18:59:11 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

After looking at the gff3 and your control files, I had an idea. The control file has a section called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines.

Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file by feature source and passing each part through the "est_gff" etc. options instead of the MAKER passthrough? That will tell us whether the problem is with the gff3 file or with how MAKER is processing it.
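
As a rough illustration of dividing a gff3 by feature source, here is a minimal Python sketch that groups feature lines by column 2 (the source) and writes one file per source. The function names, the output file naming, and the example source labels are assumptions made for illustration, not part of MAKER. Which resulting file belongs with which "*_gff" option depends on the sources actually present in your file.

```python
from collections import defaultdict


def split_gff3_by_source(lines):
    """Group GFF3 feature lines by the value of column 2 (the source)."""
    by_source = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed nine-column feature line
        by_source[cols[1]].append(line)
    return dict(by_source)


def write_sources(by_source, prefix):
    """Write one file per source, named <prefix>.<source>.gff (hypothetical naming)."""
    for source, feats in by_source.items():
        with open(f"{prefix}.{source}.gff", "w") as out:
            out.write("##gff-version 3\n")
            out.write("\n".join(feats) + "\n")


if __name__ == "__main__":
    # Illustrative lines only; the source labels here are assumed examples.
    example = [
        "##gff-version 3\n",
        "Sc00009\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n",
        "Sc00009\test2genome\texpressed_sequence_match\t100\t900\t50\t+\t.\tID=e1\n",
        "Sc00009\tprotein2genome\tprotein_match\t150\t800\t60\t+\t.\tID=p1\n",
    ]
    for source, feats in split_gff3_by_source(example).items():
        print(source, len(feats))
```

Note that a plain line-by-line split like this keeps each feature with its own source but does not check that parent and child features end up in the same file, so the output is worth a quick look before passing it on.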

Another thing to check is that the contig names in the gff3 file match the contig names in the fasta file that you're annotating.
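
The name check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and the fasta ID is taken to be the first whitespace-delimited token after ">", which may or may not be how a given tool reads the header. The scaffold names in the demo are made up to show a case-only mismatch.

```python
def fasta_ids(lines):
    """Collect sequence IDs: first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def gff3_seqids(lines):
    """Collect the seqid (column 1) of every feature line in a GFF3."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_fasta(gff_lines, fasta_lines):
    """Return seqids used in the GFF3 that have no matching fasta record."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(fasta_lines))


if __name__ == "__main__":
    # Made-up names: the GFF3 uses 'SC' where the fasta uses 'Sc'.
    fasta = [">PbPT3Sc00009 length=30000\n", "ACGTACGT\n"]
    gff = [
        "##gff-version 3\n",
        "PbPT3SC00009\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    ]
    print(missing_from_fasta(gff, fasta))
```

An empty list means every contig name in the gff3 has a matching fasta record; anything printed is a name to look at, since the comparison is exact and case-sensitive.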

Thanks,
Daniel


Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 8:49 AM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I have generated the files that you requested. I chose Sc00009 from my
genome, which is 30 kb and was one of the scaffolds coming up with the error.
In addition to the ctl files and error output file, I also attached the part of
the gff file related to Sc00009 that is indicated in the error message.


Thanks for helping with this


Regards


HB




From dence at genetics.utah.edu  Wed Feb 12 12:15:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 19:15:59 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

One more question. How did you make the gff3 that you're passing through here? 

Thanks,
Daniel 


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From dence at genetics.utah.edu  Wed Feb 12 13:42:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 20:42:03 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D44928@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908DE5@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4498A@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A8908E3E@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D44A3B@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

Those problems with passing through MAKER-derived gff3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download now on our website. Try installing that version and rerunning with the same control files you started out with, and let me know if that fixes the problem.

Thanks,
Daniel

 
Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 12:55 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Hi Daniel

I am using maker 2.10
 I also checked the naming of the scaffold in the genome file and the gff
file for the failed example. Naming is the same
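
A name check like this can also be scripted rather than done by eye. The following is a minimal Python sketch, not part of MAKER; the function names are made up for illustration, and it assumes the sequence ID is the first word of each FASTA header and the first tab-separated column of each GFF3 feature line.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') in FASTA lines."""
    return {line[1:].split()[0]
            for line in lines
            if line.startswith(">") and line[1:].strip()}


def gff3_seqids(lines):
    """Return the set of column-1 sequence IDs used by GFF3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def unmatched_seqids(fasta_lines, gff_lines):
    """Sorted list of seqids used in the GFF3 that have no FASTA record."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(fasta_lines))
```

Called as unmatched_seqids(open(genome_fasta), open(evidence_gff3)), an empty list means every contig name in the GFF3 also exists in the genome file. The comparison is case-sensitive on purpose.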

Thanks

Hossein


M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca


On 14-02-12 1:30 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>And which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 12:25 PM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Hi Daniel
>
>Gff file was generated by the 1st run of maker
>
>
>
>HB
>
>
>
>
>
>
>
>On 14-02-12 1:15 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>One more question. How did you make the gff3 that you're passing through
>>here?
>>
>>Thanks,
>>Daniel
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>>Daniel Ence [dence at genetics.utah.edu]
>>Sent: Wednesday, February 12, 2014 11:59 AM
>>To: Borhan, Hossein
>>Cc: maker-devel at yandell-lab.org
>>Subject: Re: [maker-devel] ERROR: Failed while processing the chunk
>>divide!!
>>
>>Hi Hossein,
>>
>>So, after looking at the gff3 and your control files, I had an idea.
>>There's the part of the control file called "Re-annotation Using MAKER
>>Derived GFF3", but you can also passthrough features from a gff3 using
>>the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines.
>>
>>Sometimes we encounter problems with the MAKER passthrough. Could you try
>>dividing the gff3 file into the different feature sources and passing it
>>through the "est_gff" etc options and not with the MAKER passthrough?
>>That will tell us if the problem is with the gff3 file or with how MAKER
>>is processing it.
>>
>>Another thing to check is that the contig names in the gff3
>>file match the contig names in the fasta file that you're annotating.
>>
>>Thanks,
>>Daniel
>>
>>
>>
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Wednesday, February 12, 2014 8:49 AM
>>To: Daniel Ence
>>Subject: Re: ERROR: Failed while processing the chunk divide!!
>>
>>Dear Daniel
>>
>>
>>I have generated the files that you requested. I choose Sc00009 from my
>>genome which is 30 kb and was one of the scaffolds coming up with error.
>>In addition to Ctl files and error output file I also attached a part of
>>the gff file related to SC00009 that is indicated in the error message.
>>
>>
>>Thanks for helping with this
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hi Hossein,
>>>
>>>I think that what would be the most help right now is if you ran MAKER
>>>on
>>>only one of those contigs that are failing and send me the entire error
>>>output along with the maker control files that you are using. It looks
>>>like the error is coming from the gff3 files that you are using as
>>>input.
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>Sent: Tuesday, February 11, 2014 3:51 PM
>>>To: Daniel Ence
>>>Subject: ERROR: Failed while processing the chunk divide!!
>>>
>>>Dear Daniel
>>>
>>>I re-started maker and it is still running. But in the error output file
>>>that has been generated so far, it seems that smaller contigs are
>>>affected. There are contigs of 2-4 kb with this error, but I also noticed
>>>a contig of 30 kb length with this error.
>>>
>>>I was wondering if I need to change the setting in the maker_opt file
>>>
>>>#-----MAKER Behavior Options
>>>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
>>>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>>>
>>>
>>>If I understand correctly, max_dna_len divides contigs of over 100 kb into
>>>smaller chunks. However, for the min_contig option, if the default
>>>cutoff is 10 kb or less, it is not clear to me why I get an error
>>>message for 30 kb contigs. Should I change this to 0?
>>>
>>>Here is an example of the error message for one of the contigs
>>>
>>>
>>>#--------- command -------------#
>>>Widget::exonerate::est2genome:
>>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>>>#-------------------------------#
>>>cleaning blastn...
>>>cleaning tblastx...
>>>cleaning blastx...
>>>ERROR: Failed on
>>>PbPT3Sc00001_S_0.8_1-mRNA-1
>>>Check your input GFF3 file for errors!
>>>(from GFFDB)
>>>
>>>FATAL ERROR
>>>ERROR: Failed while processing the chunk divide!!
>>>
>>>ERROR: Chunk failed at level 17
>>>!!
>>>FAILED CONTIG:PbPT3Sc00001
>>>
>>>
>>>
>>>
>>>--Next Contig--
>>>
>>>
>>>
>>>
>>>
>>>
>>>Regards
>>>
>>>
>>>HB
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hossein,
>>>>
>>>>Ok. Since this error came up on a local install, I'm going to need
>>>>some more information to understand what went wrong. Is it the same
>>>>contig that always causes this error? If it is, then is this the only
>>>>error or warning that MAKER encounters while running on this contig?
>>>>Or, if multiple contigs fail, is it always the same error?
>>>>
>>>>If you can narrow it down to the smallest possible dataset that
>>>>consistently gives the same error, then we can begin to understand
>>>>what's wrong.
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>Daniel Ence
>>>>Graduate Student
>>>>Eccles Institute of Human Genetics
>>>>University of Utah
>>>>15 North 2030 East, Room 2100
>>>>Salt Lake City, UT 84112-5330
>>>>________________________________________
>>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>>Sent: Tuesday, February 11, 2014 11:20 AM
>>>>To: Daniel Ence
>>>>Subject: Re: [maker-devel] Falied to create new account
>>>>
>>>>Hi Daniel
>>>>
>>>>I am running it on the local server at my work
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>M. Hossein Borhan, Ph.D.
>>>>Research Scientist/ Chercheur Scientifique
>>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>>>Telephone/Téléphone: (306) 385-9441
>>>>Facsimile/Télécopieur: (306) 385-9482
>>>>Hossein.borhan at agr.gc.ca
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>>
>>>>>Hi Hossein,
>>>>>
>>>>>Did you encounter this error while you were running MAKER on your
>>>>>local
>>>>>machine or through the MAKER web annotation service?
>>>>>
>>>>>Thanks,
>>>>>Daniel
>>>>>
>>>>>
>>>>>Daniel Ence
>>>>>Graduate Student
>>>>>Eccles Institute of Human Genetics
>>>>>University of Utah
>>>>>15 North 2030 East, Room 2100
>>>>>Salt Lake City, UT 84112-5330
>>>>>________________________________________
>>>>>From: Carson Holt [carsonhh at gmail.com]
>>>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>>>To: Daniel Ence
>>>>>Cc: Mark Yandell
>>>>>Subject: FW: [maker-devel] Falied to create new account
>>>>>
>>>>>Hey Daniel could you download his dataset, and see if you can
>>>>>replicate
>>>>>the error.  Also check if this was an MWAS job or a local maker run
>>>>>(his
>>>>>dataset will already be there for MWAS, you just need the job ID).
>>>>>
>>>>>Thanks,
>>>>>Carson
>>>>>
>>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>>wrote:
>>>>>
>>>>>>Hi Carson
>>>>>>
>>>>>>
>>>>>>I encountered this error while running maker
>>>>>>
>>>>>>FATAL ERROR
>>>>>>ERROR: Failed while processing the chunk divide!!
>>>>>>
>>>>>>ERROR: Chunk failed at level 17
>>>>>>!!
>>>>>>FAILED CONTIG:PbPT3Sc00006
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>HB
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>


From masa at bioinfo.hr  Thu Feb 13 03:17:11 2014
From: masa at bioinfo.hr (Masa Roller)
Date: Thu, 13 Feb 2014 11:17:11 +0100
Subject: [maker-devel] SNAP scores and AED scores
Message-ID: <52FC9BA7.6060505@bioinfo.hr>

Dear all,

I ran snap2 based gene prediction through maker.

In the resulting gff file, in the source "snap_masked" I can find the 
score in the score column of every snap prediction that did not get 
promoted to a maker gene. This would be the score of how well the 
prediction matches the HMM?

It seems to me that those snap models that are given gene status no 
longer appear as snap_masked source but only as source "maker". Maker 
then removes the score column, instead giving AED and eAED scores (which 
are more about how the model corresponds to the evidence). When viewing 
the maker transcripts and SNAP predictions in a browser, they do not 
match (mostly, maker predictions are longer).

I am interested in the scores of the individual gene predictions that 
underlie the maker gene models. Where could I find that information?

Many thanks!


From carsonhh at gmail.com  Thu Feb 13 13:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:11:22 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <52FC9BA7.6060505@bioinfo.hr>
References: <52FC9BA7.6060505@bioinfo.hr>
Message-ID: <CF227374.9D6F%carsonhh@gmail.com>

No.  SNAP genes do not disappear. All SNAP ab initio calls will always be
kept as reference features marked snap_masked (for the repeat-masked genome)
and snap (for the unmasked genome).  MAKER then runs SNAP another time where
it feeds hints to SNAP based on EST and protein alignment evidence.  These
hint-based models can then compete against the ab initio SNAP models to be
promoted to genes if their AED scores are better.  Final models can also
get UTR added based on EST evidence.  That is why you can get models from
MAKER that do not match the original SNAP ab initio calls.

So in summary, all SNAP ab initio models will be in snap_masked.  The
MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio
models processed to add UTR.
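
To see this split in an actual output file, the features can be tallied by source. Below is a rough Python sketch, not a MAKER tool. It assumes that the ab initio calls appear as "match" features with source snap_masked and a numeric score column, and that MAKER mRNA lines carry an "_AED" attribute; check both against your own GFF3 before relying on it.

```python
from collections import Counter


def parse_attributes(column9):
    """Turn 'key=value;key=value' into a dict (no URL-decoding)."""
    pairs = (item.split("=", 1)
             for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def summarize(gff_lines):
    """Count (source, type) pairs and collect per-model scores."""
    counts = Counter()
    snap_scores = {}  # ID -> score column of snap_masked match lines
    maker_aed = {}    # ID -> _AED attribute of maker mRNA lines
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        source, ftype, score = cols[1], cols[2], cols[5]
        attrs = parse_attributes(cols[8])
        counts[(source, ftype)] += 1
        if source == "snap_masked" and ftype == "match" and score != ".":
            snap_scores[attrs.get("ID")] = float(score)
        if source == "maker" and ftype == "mRNA" and "_AED" in attrs:
            maker_aed[attrs.get("ID")] = float(attrs["_AED"])
    return counts, snap_scores, maker_aed
```

The two dictionaries are deliberately kept separate: a SNAP score and an AED measure different things and are not comparable to each other.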

Thanks,
Carson


On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:

>Dear all,
>
>I ran snap2 based gene prediction through maker.
>
>In the resulting gff file, in the source "snap_masked" I can find the
>score in the score column of every snap prediction that did not get
>promoted to a maker gene. This would be the score of how well the
>prediction matches the HMM?
>
>It seems to me that those snap models that are given gene status no
>longer appear as snap_masked source but only as source "maker". Maker
>then removes the score column, instead giving AED and eAED scores (which
>are more about how the model corresponds to the evidence). When viewing
>the maker transcripts and SNAP predictions in a browser, they do not
>match (mostly, maker predictions are longer).
>
>I am interested in the scores of the individual gene predictions that
>underlie the maker gene models. Where could I find that information?
>
>Many thanks!
>


From carsonhh at gmail.com  Thu Feb 13 13:23:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:23:07 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr>
 <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <CF227602.9D7E%carsonhh@gmail.com>

On a side note: because the MAKER models involve either modifying the ab
initio SNAP model or manipulating the underlying scoring scheme using
hints, the SNAP score on those is virtually meaningless.  However, Ian Korf
has developed a tool that can take any gene structure and reverse generate
a score (i.e. what would the score of this gene have been if SNAP had
called it that way in the first place).

I believe the tool is called fathom and is part of the SNAP package.  It
is not well documented, so you might have to contact Ian Korf directly for
that.  You can use the maker2zff tool to generate the input to fathom.

Thanks,
Carson


On 2/13/14, 1:11 PM, "Carson Holt" <carsonhh at gmail.com> wrote:

>No.  SNAP genes do not disappear. All SNAP ab initio calls will always be
>kept as reference features marked snap_masked (for the repeat-masked genome)
>and snap (for the unmasked genome).  MAKER then runs SNAP another time where
>it feeds hints to SNAP based on EST and protein alignment evidence.  These
>hint-based models can then compete against the ab initio SNAP models to be
>promoted to genes if their AED scores are better.  Final models can also
>get UTR added based on EST evidence.  That is why you can get models from
>MAKER that do not match the original SNAP ab initio calls.
>
>So in summary, all SNAP ab initio models will be in snap_masked.  The
>MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio
>models processed to add UTR.
>
>Thanks,
>Carson
>
>
>
>On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:
>
>>Dear all,
>>
>>I ran snap2 based gene prediction through maker.
>>
>>In the resulting gff file, in the source "snap_masked" I can find the
>>score in the score column of every snap prediction that did not get
>>promoted to a maker gene. This would be the score of how well the
>>prediction matches the HMM?
>>
>>It seems to me that those snap models that are given gene status no
>>longer appear as snap_masked source but only as source "maker". Maker
>>then removes the score column, instead giving AED and eAED scores (which
>>are more about how the model corresponds to the evidence). When viewing
>>the maker transcripts and SNAP predictions in a browser, they do not
>>match (mostly, maker predictions are longer).
>>
>>I am interested in the scores of the individual gene predictions that
>>underlie the maker gene models. Where could I find that information?
>>
>>Many thanks!
>>
>
>


From barry.utah at gmail.com  Thu Feb 13 13:27:17 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Thu, 13 Feb 2014 13:27:17 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <CF227374.9D6F%carsonhh@gmail.com>
References: <52FC9BA7.6060505@bioinfo.hr> <CF227374.9D6F%carsonhh@gmail.com>
Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu>

Hi Masa,

Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore.  This is a directory structure created by MAKER to store contig-specific data.  There is a datastore directory (and a corresponding index file) in the MAKER output directory.  The index file provides the path to each contig's directory, and in that contig-specific directory there is a directory called theVoid.  This contains all of the output of each program that MAKER runs.
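
Walking the index by hand gets tedious for many contigs, so here is a hedged Python sketch of the lookup. It is not a MAKER utility. It assumes the index file is tab-separated with the contig name in the first column and the contig's directory (relative to the MAKER output directory) in the second, and that the per-contig directory is named theVoid.<contig>, as in the paths that show up in MAKER's command logs. The function name is hypothetical.

```python
import posixpath


def void_directories(index_lines, maker_output_dir):
    """Map each contig name to the expected path of its theVoid directory."""
    dirs = {}
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        contig, rel_path = fields[0], fields[1]
        # A contig may be listed several times (one line per status);
        # the path is the same each time, so overwriting is harmless.
        dirs[contig] = posixpath.join(
            maker_output_dir, rel_path, "theVoid." + contig)
    return dirs
```

The returned paths are only constructed, not checked; test each one with os.path.isdir before reading the SNAP output inside it.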

B

On Feb 13, 2014, at 1:11 PM, Carson Holt wrote:

> No.  SNAP genes do not disappear. All SNAP ab initio calls will always be
> kept as reference features marked snap_masked (for the repeat-masked genome)
> and snap (for the unmasked genome).  MAKER then runs SNAP another time where
> it feeds hints to SNAP based on EST and protein alignment evidence.  These
> hint-based models can then compete against the ab initio SNAP models to be
> promoted to genes if their AED scores are better.  Final models can also
> get UTR added based on EST evidence.  That is why you can get models from
> MAKER that do not match the original SNAP ab initio calls.
> 
> So in summary, all SNAP ab initio models will be in snap_masked.  The
> MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio
> models processed to add UTR.
> 
> Thanks,
> Carson
> 
> 
> 
> On 2/13/14, 3:17 AM, "Masa Roller" <masa at bioinfo.hr> wrote:
> 
>> Dear all,
>> 
>> I ran snap2 based gene prediction through maker.
>> 
>> In the resulting gff file, in the source "snap_masked" I can find the
>> score in the score column of every snap prediction that did not get
>> promoted to a maker gene. This would be the score of how well the
>> prediction matches the HMM?
>> 
>> It seems to me that those snap models that are given gene status no
>> longer appear as snap_masked source but only as source "maker". Maker
>> then removes the score column, instead giving AED and eAED scores (which
>> are more about how the model corresponds to the evidence). When viewing
>> the maker transcripts and SNAP predictions in a browser, they do not
>> match (mostly, maker predictions are longer).
>> 
>> I am interested in the scores of the individual gene predictions that
>> underlie the maker gene models. Where could I find that information?
>> 
>> Many thanks!
>> 
> 
> 
> 

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From mptrsen at uni-bonn.de  Thu Feb 13 20:00:24 2014
From: mptrsen at uni-bonn.de (Malte Petersen)
Date: Fri, 14 Feb 2014 04:00:24 +0100
Subject: [maker-devel] BLAST options error / should Maker check for file
	format?
Message-ID: <52FD86C8.6040007@uni-bonn.de>

Dear MAKER devs,

I was running Maker version 2.30p-beta on an insect genome, and it
didn't produce any output. I got these error messages:


Widget::formater:
/path/to/makeblastdb -dbtype nucl -in
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
#-------------------------------#
BLAST options error: File
/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
is empty
ERROR: /path/to/makeblastdb failed in Widget::formater
--> rank=NA, hostname=Jeanne-GBR
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:scf7180005143343

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf7180005143343


I figured out that this error is due to a non-Fasta file format being
fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
I got the pipeline running now with the correct file, I think that it
should be complaining (a lot earlier) if any of the input files are of
the wrong format.  More people might run into this problem and have no
idea where to look for a solution.

What do you think?

Best,
Malte


From carsonhh at gmail.com  Thu Feb 13 20:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 20:11:22 -0700
Subject: [maker-devel] BLAST options error / should Maker check for file
 format?
In-Reply-To: <52FD86C8.6040007@uni-bonn.de>
References: <52FD86C8.6040007@uni-bonn.de>
Message-ID: <CF22D59B.9DEB%carsonhh@gmail.com>

Hi Malte,

Actually there already is.  I'm very surprised your file made it that far.
Normally it fails right away.

Example -->

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1
appears to be empty.


Another test file -->


STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The nucleotide sequence file
'/Users/cholt/Developer/maker/trunk/data/test2'
appears to contain protein sequence or unrecognized characters. Note
the following nucleotides may be valid but are unsupported [RYKMSWBDHV]
Please check/fix the file before continuing, or set -fix_nucleotides on
the command line to fix this automatically.
Invalid Character: 'M'


You seem to have found just the right formula of improper input to get
past the filters on your run :-)


Thanks,
Carson


On 2/13/14, 8:00 PM, "Malte Petersen" <mptrsen at uni-bonn.de> wrote:

>Dear MAKER devs,
>
>I was running Maker version 2.30p-beta on an insect genome, and it
>didn't produce any output. I got these error messages:
>
>
>Widget::formater:
>/path/to/makeblastdb -dbtype nucl -in
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
>#-------------------------------#
>BLAST options error: File
>/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
>is empty
>ERROR: /path/to/makeblastdb failed in Widget::formater
>--> rank=NA, hostname=Jeanne-GBR
>ERROR: Failed while doing blastn of ESTs
>ERROR: Chunk failed at level:0, tier_type:3
>FAILED CONTIG:scf7180005143343
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf7180005143343
>
>
>I figured out that this error is due to a non-Fasta file format being
>fed to Maker as extrinsic evidence (I gave it a meta-info file).  While
>I got the pipeline running now with the correct file, I think that it
>should be complaining (a lot earlier) if any of the input files are of
>the wrong format.  More people might run into this problem and have no
>idea where to look for a solution.
>
>What do you think?
>
>Best,
>Malte
>
>


From dence at genetics.utah.edu  Fri Feb 14 12:09:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Feb 2014 19:09:08 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To: <E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
References: <E8EDFB90D92694478065C37017B3A3A6A8908ADE@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D4462B@mxb2.hg.genetics.utah.edu>
	<E8EDFB90D92694478065C37017B3A3A6A8908D02@SKREGIXES2.AGR.GC.CA>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D448BA@mxb2.hg.genetics.utah.edu>,
	<E8EDFB90D92694478065C37017B3A3A6A89090D3@SKREGIXES2.AGR.GC.CA>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D452AD@mxb2.hg.genetics.utah.edu>

Hi Hossein, 

Here is what is going on. The problem is in the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the name of the mRNA became the same as the name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features.

The solution is to make the Parent attributes of the exon and CDS features point to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK with just adding "-mRNA-1" to the end of the Parent value on all the exon and CDS lines. This will work unless there are mRNAs with alternative splice forms, because those mRNA names end with something like "-mRNA-2". 

I've attached a script that should do this for you. 

Run it with this command

"perl fix_gff3_script.pl <your_gff3> > <fixed_gff3>"

And then run MAKER with the fixed gff3 file in place of the old gff3 file. 
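
For readers without the attachment, the same idea can be sketched in Python. This is not the attached Perl script, just an illustration of the rule described above, and fix_parent is a made-up name. It assumes standard nine-column GFF3 and only touches exon and CDS lines whose Parent value does not already end in "-mRNA-<number>", so it does not handle the alternative splice form case.

```python
import re

# Matches a Parent value that already names a transcript.
MRNA_SUFFIX = re.compile(r"-mRNA-\d+$")


def fix_parent(line, suffix="-mRNA-1"):
    """Point the Parent of an exon/CDS line at the mRNA instead of the gene."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
        return line  # comments, directives and other feature types pass through
    attrs = cols[8].split(";")
    for i, attr in enumerate(attrs):
        if attr.startswith("Parent="):
            parents = attr[len("Parent="):].split(",")
            parents = [p if MRNA_SUFFIX.search(p) else p + suffix
                       for p in parents]
            attrs[i] = "Parent=" + ",".join(parents)
    cols[8] = ";".join(attrs)
    return "\t".join(cols) + "\n"
```

Applied line by line to the old file and written to a new one, it can be run twice safely, because lines that already point at an mRNA are left alone.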

Let me know if that works, 

Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Thursday, February 13, 2014 3:27 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel


I downloaded maker 2.31 and ran the same scaffold. Again it gave an error on
the gff file. I then removed the word mRNA-1 from my gff file and ran it
again. It seems to have worked this time. Attached are the std error files
for the first try, std-err (the one that failed), and the second one, named
std-err-wo-mRNA (that apparently worked).  Since the gff file is used as
evidence only, I thought it should not matter if I removed the mRNA-1 naming
from the gff file.


Cheers

HB


On 14-02-12 12:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Hossein,
>
>So, after looking at the gff3 and your control files, I had an idea.
>There's the part of the control file called "Re-annotation Using MAKER
>Derived GFF3", but you can also passthrough features from a gff3 using
>the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines.
>
>Sometimes we encounter problems with the MAKER passthrough. Could you try
>dividing the gff3 file into the different feature sources and passing it
>through the "est_gff" etc options and not with the MAKER passthrough?
>That will tell us if the problem is with the gff3 file or with how MAKER
>is processing it.
>
>Another thing to check is that the contig names in the gff3
>file match the contig names in the fasta file that you're annotating.
>
>Thanks,
>Daniel
>
>
>
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 8:49 AM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>
>I have generated the files that you requested. I choose Sc00009 from my
>genome which is 30 kb and was one of the scaffolds coming up with error.
>In addition to Ctl files and error output file I also attached a part of
>the gff file related to SC00009 that is indicated in the error message.
>
>
>Thanks for helping with this
>
>
>
>Regards
>
>
>HB
>
>
>
>
>
>
>
>
>
>
>
>
>On 14-02-11 4:59 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>
>>Hi Hossein,
>>
>>I think that what would be the most help right now is if you ran MAKER on
>>only one of those contigs that are failing and send me the entire error
>>output along with the maker control files that you are using. It looks
>>like the error is coming from the gff3 files that you are using as input.
>>
>>Thanks,
>>Daniel
>>
>>
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Tuesday, February 11, 2014 3:51 PM
>>To: Daniel Ence
>>Subject: ERROR: Failed while processing the chunk divide!!
>>
>>Dear Daniel
>>
>>I re-started maker and it is still running. But in the error output file
>>that has been generated so far, it seems that smaller contigs are
>>affected. There are contigs of 2-4 kb with this error, but I also noticed
>>a contig of 30 kb length with this error.
>>
>>I was wondering if I need to change the setting in the maker_opt file
>>
>>#-----MAKER Behavior Options
>>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
>>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>>
>>
>>If I understand correctly, max_dna_len divides contigs of over 100 kb into
>>smaller chunks. However, for the min_contig option, if the default
>>cutoff is 10 kb or less, it is not clear to me why I get an error
>>message for 30 kb contigs. Should I change this to 0?
>>
>>Here is an example of the error message for one of the contigs
>>
>>
>>#--------- command -------------#
>>Widget::exonerate::est2genome:
>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>>#-------------------------------#
>>cleaning blastn...
>>cleaning tblastx...
>>cleaning blastx...
>>ERROR: Failed on
>>PbPT3Sc00001_S_0.8_1-mRNA-1
>>Check your input GFF3 file for errors!
>>(from GFFDB)
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00001
>>
>>
>>
>>
>>--Next Contig--
>>
>>
>>
>>
>>
>>
>>Regards
>>
>>
>>HB
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>On 14-02-11 12:37 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>
>>>Hossein,
>>>
>>>Ok. Since this error came up on a local install, I'm going to need
>>>some more information to understand what went wrong. Is it the same
>>>contig that always causes this error? If it is, then is this the only
>>>error or warning that MAKER encounters while running on this contig? Or,
>>>if multiple contigs fail, is it always the same error?
>>>
>>>If you can narrow it down to the smallest possible dataset that
>>>consistently gives the same error, then we can begin to understand
>>>what's wrong.
>>>
>>>Thanks,
>>>Daniel
>>>
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>Sent: Tuesday, February 11, 2014 11:20 AM
>>>To: Daniel Ence
>>>Subject: Re: [maker-devel] Falied to create new account
>>>
>>>Hi Daniel
>>>
>>>I am running it through the local server at my work.
>>>
>>>
>>>
>>>
>>>
>>>
>>>M. Hossein Borhan, Ph.D.
>>>Research Scientist/ Chercheur Scientifique
>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>>Telephone/Téléphone: (306) 385-9441
>>>Facsimile/Télécopieur: (306) 385-9482
>>>Hossein.borhan at agr.gc.ca
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 14-02-11 12:16 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hi Hossein,
>>>>
>>>>Did you encounter this error while you were running MAKER on your local
>>>>machine or through the MAKER web annotation service?
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>Daniel Ence
>>>>Graduate Student
>>>>Eccles Institute of Human Genetics
>>>>University of Utah
>>>>15 North 2030 East, Room 2100
>>>>Salt Lake City, UT 84112-5330
>>>>________________________________________
>>>>From: Carson Holt [carsonhh at gmail.com]
>>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>>To: Daniel Ence
>>>>Cc: Mark Yandell
>>>>Subject: FW: [maker-devel] Falied to create new account
>>>>
>>>>Hey Daniel, could you download his dataset and see if you can replicate
>>>>the error? Also check if this was an MWAS job or a local MAKER run (his
>>>>dataset will already be there for MWAS, you just need the job ID).
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" <Hossein.Borhan at AGR.GC.CA>
>>>>wrote:
>>>>
>>>>>Hi Carson
>>>>>
>>>>>
>>>>>I encountered this error while running maker
>>>>>
>>>>>FATAL ERROR
>>>>>ERROR: Failed while processing the chunk divide!!
>>>>>
>>>>>ERROR: Chunk failed at level 17
>>>>>!!
>>>>>FAILED CONTIG:PbPT3Sc00006
>>>>>
>>>>>HB

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_gff3_script.pl
Type: application/octet-stream
Size: 349 bytes
Desc: fix_gff3_script.pl
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140214/364c961e/attachment-0003.obj>

From claudio.valero at wur.nl  Mon Feb 17 02:23:21 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Mon, 17 Feb 2014 09:23:21 +0000
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that MAKER predicts so few genes (3697). I have trained SNAP and followed all the available tutorials. Ab initio predictors are able to predict between 8000 and 10000 genes. Is something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio


-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.log
Type: application/octet-stream
Size: 4776 bytes
Desc: maker_opts.log
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOBA.pdf
Type: application/pdf
Size: 210262 bytes
Desc: SOBA.pdf
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140217/69ce0cfc/attachment-0003.pdf>

From carson.holt at genetics.utah.edu  Mon Feb 17 12:22:13 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Feb 2014 19:22:13 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
References: <A60E0B903F7C834D8F8ED0D21DE86ECF1CF820@SCOMP0936.wurnet.nl>
Message-ID: <CF27AB29.9F59%carson.holt@genetics.utah.edu>

You also need to look at the contigs in a browser like Apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungal genes are so close together that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts.  The result is a really long UTR on some of your gene models, which forces other models to be excluded.  If this is the case, rerun something like Trinity with the --jaccard_clip option set to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option.  At least two proteomes are recommended, though (these are not proteins from the same species, but complete proteomes from related species).  Comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data.  Also, are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because both the quality and the completeness of EST/mRNA-seq databases vary widely, and they may capture as little as 30% of the genes.

Another thing that comes into play is single-exon evidence.  In anything but fungi, single-exon evidence is mostly caused by spurious alignments.  But fungi have so many single-exon genes that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single-exon evidence to keep to something like 250 bp.
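
The settings discussed above can be sanity-checked with a short script before rerunning. This is only a minimal sketch, not part of MAKER: it assumes the usual "key=value #comment" layout of maker_opts.ctl, and the helper names parse_ctl and check_fungal_settings are made up for illustration. The option names (protein, single_exon, single_length, correct_est_fusion) are the ones used in the control file.

```python
# Settings suggested in this thread for a gene-dense fungal genome.
RECOMMENDED = {
    "single_exon": "1",         # keep single-exon EST evidence
    "correct_est_fusion": "1",  # clip long false UTRs from fused transcripts
}


def parse_ctl(text):
    """Parse control-file text ('key=value #comment' lines) into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def check_fungal_settings(opts):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    for key, wanted in RECOMMENDED.items():
        current = opts.get(key, "")
        if current != wanted:
            warnings.append(f"{key}={current or '(unset)'}; consider {key}={wanted}")
    # A minimum length only matters once single-exon evidence is enabled.
    if opts.get("single_exon") == "1" and not opts.get("single_length", "").isdigit():
        warnings.append("single_length is unset; something like 250 was suggested")
    # protein= takes a comma-separated list of FASTA files.
    proteins = [p for p in opts.get("protein", "").split(",") if p.strip()]
    if len(proteins) < 2:
        warnings.append(
            f"protein= lists {len(proteins)} file(s); "
            "two or more related proteomes are recommended"
        )
    return warnings
```

Calling check_fungal_settings(parse_ctl(open("maker_opts.ctl").read())) and printing the result gives a quick list of things to revisit; it does not replace looking at the evidence in a browser.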

Thanks,
Carson



From carsonhh at gmail.com  Mon Feb 17 12:26:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Feb 2014 12:26:05 -0700
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <CF27AFF8.9F83%carsonhh@gmail.com>

From your control file, it looks like your primary shortcomings are not
setting single_exon=1 and using only UniProt rather than supplying complete
proteomes of related species.  I'd set correct_est_fusion=1 as well.

-Carson



From claudio.valero at wur.nl  Wed Feb 19 01:20:04 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 08:20:04 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <CF27AFF8.9F83%carsonhh@gmail.com>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>

Hi Carson,

Thank you for your suggestions. I ran MAKER again and it predicted many more genes. However, I now have a different problem. When I run gff3_merge I get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

A similar thing happens when I try fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem before with these commands.


Regards,

Claudio


From carsonhh at gmail.com  Wed Feb 19 08:34:33 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Feb 2014 08:34:33 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <CF2A1C44.A02B%carsonhh@gmail.com>

You provided a directory rather than a file to the -d option ('-d' stands for
datastore log).
You must provide the location of the datastore index log file, not the
datastore directory.

Example --> ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log
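
For reference, here is a sketch of how the index log path relates to the output directory and to the two merge commands. It assumes MAKER's default naming, as in the example above (<base>.maker.output/<base>_master_datastore_index.log); a run started with a custom base name would need the path adjusted. The helper names index_log_for and merge_commands are hypothetical, and only the -d option mentioned in this thread is used.

```python
import os


def index_log_for(output_dir):
    """Derive the datastore index log path from a '<base>.maker.output' dir."""
    out = os.path.normpath(output_dir)  # strips './' and any trailing slash
    name = os.path.basename(out)
    suffix = ".maker.output"
    if not name.endswith(suffix):
        raise ValueError(f"{output_dir!r} does not look like a MAKER output directory")
    base = name[: -len(suffix)]
    return os.path.join(out, f"{base}_master_datastore_index.log")


def merge_commands(output_dir):
    """Build argument lists for gff3_merge and fasta_merge for one run."""
    log = index_log_for(output_dir)
    return [["gff3_merge", "-d", log], ["fasta_merge", "-d", log]]
```

Each returned list can be handed to subprocess.run() as is, provided the MAKER scripts are on the PATH.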

Thanks,
Carson



From dence at genetics.utah.edu  Wed Feb 19 09:04:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 19 Feb 2014 16:04:08 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>

Hi Claudio,

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From claudio.valero at wur.nl  Wed Feb 19 09:33:36 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 16:33:36 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <A60E0B903F7C834D8F8ED0D21DE86ECF1D695A@SCOMP0936.wurnet.nl>

Hi,

Thanks, I had made a mistake in the command line!

Regards,

Claudio

From: Daniel Ence [mailto:dence at genetics.utah.edu]
Sent: woensdag 19 februari 2014 17:04
To: Valero Jimenez, Claudio; 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org'
Subject: RE: [maker-devel] Maker not predicting many genes

Hi Claudio,

What was the command line you used for gff3_merge?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Valero Jimenez, Claudio [claudio.valero at wur.nl]
Sent: Wednesday, February 19, 2014 1:20 AM
To: 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes
Hi Carson,

Thank you for your suggestions. I ran again Maker and it was able to predict many more genes. Although I have a different problem now. I try to run gff3_merge and get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

Similar thing happens when I try fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem before with these commands.


Regards,

Claudio

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: maandag 17 februari 2014 20:26
To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
Subject: Re: [maker-devel] Maker not predicting many genes

>From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species are your primary shortcomings.  I'd set correct_est_fusion=1 as well.

-Carson


From: Carson Holt <carson.holt at genetics.utah.edu<mailto:carson.holt at genetics.utah.edu>>
Date: Monday, February 17, 2014 at 12:22 PM
To: "Valero Jimenez, Claudio" <claudio.valero at wur.nl<mailto:claudio.valero at wur.nl>>, "'maker-devel at yandell-lab.org<mailto:'maker-devel at yandell-lab.org>'" <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Re: [maker-devel] Maker not predicting many genes

You also need to look at the contigs in a browser like apollo.  That will allow you to see both the predictions and the evidence in context.  You can then see if genes are being dropped because they are only being supported by single exon evidence, they have no evidence support whatsoever, or if they are being excluded because of UTR overlap.  That last one is a common problem for fungi when using assembled mRNA-seq reads.  Fungi genes are so close that they often overlap in the UTR.  As a result, mRNA-seq assemblers falsely asseble neighboring genes into single transcripts.  The result is really long UTR on some of your gene models that force other models to be excluded.  If this is the case, rerun something like trinity with the jacquard clip option set  to avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control files to get those long false UTR's clipped off.

If it is a lack of evidence overlap, make sure you provided minimum 1 proteome from a related species to the protein= option.  At least 2 proteomes are recommended though (these are not proteins from the same species but rather complete proteomes from related species).  Also comprehensive databases like UniProt/Swiss-prot are not sufficient on their own, but can supplement the other proteome data.  Also are you providing EST data?  Note that EST/mRNA-seq data without a proteome from a related species is also not siufficient (because both quality and how comprehensive EST/mRNA-seq databsases are can vary so widely, and may only capture as little as 30% of the genes).

Another thing that comes into play are single exon evidence.  In anything but fungi, single exon evidence is mostly caused by spurious alignments.  But fungi have so many single exon genes, that this is not the case for them.  Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single exon evidence to keep to something like 250 bp.

Thanks,
Carson


From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl<mailto:claudio.valero at wur.nl>>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org<mailto:'maker-devel at yandell-lab.org>'" <maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>>
Subject: Maker not predicting many genes

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000-10000 genes. It is something that I have in the configuration file that is wrong?? I attach the ops file and the SOBA summary of the annotation.

Regards,

Claudio


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From barry.utah at gmail.com  Wed Feb 19 11:03:47 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Wed, 19 Feb 2014 11:03:47 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>,
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and suggests a solution to the user?

Thanks,

B

On Feb 19, 2014, at 9:04 AM, Daniel Ence wrote:

> Hi Claudio, 
> 
> What was the command line you used for gff3_merge?
> 
> Thanks,
> Daniel
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Valero Jimenez, Claudio [claudio.valero at wur.nl]
> Sent: Wednesday, February 19, 2014 1:20 AM
> To: 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org'
> Subject: Re: [maker-devel] Maker not predicting many genes
> 
> Hi Carson,
>  
> Thank you for your suggestions. I ran Maker again and it was able to predict many more genes. However, I have a different problem now. When I try to run gff3_merge I get the following error:
>  
> Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.
>  
> A similar thing happens when I try fasta_merge:
>  
> Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.
>  
> I never had this problem before with these commands.
>  
>  
> Regards,
>  
> Claudio
>  
> From: Carson Holt [mailto:carsonhh at gmail.com] 
> Sent: maandag 17 februari 2014 20:26
> To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org'
> Subject: Re: [maker-devel] Maker not predicting many genes
>  
> From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species, are your primary shortcomings.  I'd set correct_est_fusion=1 as well.
>  
> --Carson
>  
>  

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543



From carson.holt at genetics.utah.edu  Wed Feb 19 11:06:52 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 19 Feb 2014 18:06:52 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
References: <CF27AFF8.9F83%carsonhh@gmail.com>
	<A60E0B903F7C834D8F8ED0D21DE86ECF1D68DE@SCOMP0936.wurnet.nl>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D6272A@mxb2.hg.genetics.utah.edu>
	<0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
Message-ID: <CF2A4058.A064%carson.holt@genetics.utah.edu>

You only need to swap a single character in the script.  Just change the  -e (exists) test to a -f (is file) test.
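
To illustrate why that one-character change matters: -e is true for anything that exists, including a directory, while -f is true only for a regular file. The shell's test operators follow the same convention as Perl's here, so the sketch below uses shell rather than the actual gff3_merge code, which is not shown in this thread. The function name describe_path is hypothetical.

```shell
# Hypothetical illustration of -e versus -f; this is not the gff3_merge source.
describe_path() {
    if [ -f "$1" ]; then
        echo "regular file"          # passes both -e and -f
    elif [ -e "$1" ]; then
        echo "exists, not a file"    # e.g. a directory: passes -e, fails -f
    else
        echo "missing"
    fi
}

# Usage: describe_path some_output_name
```

A script that only tests -e will happily accept a directory where it expects a file name, which is the situation an explicit error message would catch.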

Thanks,
Carson


From gtaylor at bcgsc.ca  Fri Feb 21 11:48:42 2014
From: gtaylor at bcgsc.ca (Greg Taylor)
Date: Fri, 21 Feb 2014 10:48:42 -0800
Subject: [maker-devel] Maker jobs hanging
Message-ID: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>

Hello,
 I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb genome with the predictors SNAP and Genemark, and using ABySS-assembled RNA-seq data. To do this I am using 480 processors on our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file. The standard error log for the whole job looks normal, as do the run.log and run.log.child.0 files for the daughter processes. The hang seems to be sequence dependent: re-running contigs that hang doesn't help, because the same contigs always hang. I'm still looking into this myself, but it seems most if not all of the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated.

sincerely,
Greg Taylor


From dence at genetics.utah.edu  Fri Feb 21 11:54:17 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 18:54:17 +0000
Subject: [maker-devel] Maker jobs hanging
In-Reply-To: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
References: <C521977B031ADB40857D0FE9C98CC82737CC600AA1@xchange4>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66CD0@mxb2.hg.genetics.utah.edu>

Hi Greg, 

Since this is probably going to be a more complicated situation, would you upload your data and control file at this URL so that we can try to replicate the error on our machines?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166

Also, which version of MPI are you using? And you might want to try updating MAKER. I think version 2.31 was just updated a few weeks ago. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From carsonhh at gmail.com  Fri Feb 21 11:56:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Feb 2014 11:56:50 -0700
Subject: [maker-devel] Maker jobs hanging
Message-ID: <CF2CEDC6.A15D%carsonhh@gmail.com>

Use 2.31.  It has been tested to work without issue on several thousand
cpus.  Also use OpenMPI for any jobs greater than 100 cpus. In addition,
OpenMPI can freeze on some systems without the following flag when using
perl based MPI programs --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with perl, and freezes
whenever perl based MPI jobs extend across nodes (they run fine within a
single node though).

--Carson




From dence at genetics.utah.edu  Fri Feb 21 15:04:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 22:04:34 +0000
Subject: [maker-devel] FW:  Maker jobs hanging
In-Reply-To: <CF2CEDC6.A15D%carsonhh@gmail.com>
References: <CF2CEDC6.A15D%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66D7E@mxb2.hg.genetics.utah.edu>

Hi Greg, 

You should be able to have the new MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Friday, February 21, 2014 11:56 AM
To: Greg Taylor; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Maker jobs hanging

Use 2.31.  It has been tested to work without issue on several thousand
cpus.  Also use OpenMPI for any jobs greater than 100 cpus. In addition,
OpenMPI can freeze on some systems without the following flag when using
perl based MPI programs --> -mca btl ^openib

Example --> mpiexec -mca btl ^openib -n 200 maker


Finally, never use MVAPICH2.  It doesn't play well with perl, and freezes
whenever perl based MPI jobs extend across nodes (they run fine within a
single node though).

--Carson




From dence at genetics.utah.edu  Fri Feb 21 19:38:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 02:38:59 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>

Hi Joe, 

Will you upload your control files and data at this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169

Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene?

I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues. 

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Mark Yandell
Sent: Friday, February 21, 2014 7:32 PM
To: Daniel Ence
Subject: FW: I am a PhD candidate at NMSU and have a question about maker2

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 5:18 PM
To: Mark Yandell
Subject: I am a PhD candidate at NMSU and have a question about maker2

Dear Dr. Yandell,

I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me.

Thank you,
Joe

Sent from my iPad


From dence at genetics.utah.edu  Fri Feb 21 21:27:10 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 04:27:10 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>,
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>,
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>,
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>

Hi Joe, 

MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. 
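
As a rough sketch of how one might confirm that the evidence settings point at real files before running MAKER, the snippet below reads the est and protein lines from maker_opts.ctl and checks each listed path. The function name check_evidence_files is hypothetical, and the sketch assumes the values are plain file paths, optionally comma-separated, with any trailing # comment ignored. It is not part of MAKER.

```shell
# Hypothetical helper (not part of MAKER): list the evidence inputs named in
# maker_opts.ctl and flag any that are not regular files.
check_evidence_files() {
    ctl="$1"
    for opt in est protein; do
        # Take the first "opt=value" line; strip the key, any comment, and trailing blanks.
        val=$(sed -n "/^${opt}=/{s/^${opt}=//;s/[[:space:]]*#.*$//;s/[[:space:]]*$//;p;q;}" "$ctl")
        if [ -z "$val" ]; then
            echo "${opt}: not set"
            continue
        fi
        # Assumption: multiple files are given as a comma-separated list.
        old_ifs=$IFS; IFS=,
        for f in $val; do
            if [ -f "$f" ]; then
                echo "${opt}: ok ${f}"
            else
                echo "${opt}: MISSING ${f}"
            fi
        done
        IFS=$old_ifs
    done
}

# Usage (the path is an assumption): check_evidence_files maker_opts.ctl
```

If both est and protein report "not set", MAKER has no evidence to blast against the genome, which would be one explanation for zero hits.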

Will you attach those files to your reply, so we can make sure that the settings are set up correctly?

Thanks,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 7:44 PM
To: Daniel Ence
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2

Hi Daniel,

Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe Maker2 just calls BLAST from NCBI's page. When I search the cotton genome it returns zero hits. I then used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems that the step that should return the results of NCBI's BLAST is returning 0 to Maker2, which then reports 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene.

Thanks,
Joe


From dence at genetics.utah.edu  Sat Feb 22 15:51:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 22:51:48 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>,
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>

Hi,

Will you send me the long file that you were trying to blast against?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 10:46 AM
To: Daniel Ence
Cc: Joe Song; Joseph Said
Subject: Re: I am a PhD candidate at NMSU and have a question about maker2

Hi all,
Attached are the three configuration files and two input files, which we used to test prediction from a genome sequence and a protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but nothing was returned. We also tested a long genome sequence as input, but still got nothing. I am not sure what is causing this result.
Thanks a lot for your help.

Hua


On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu> wrote:
Hi Daniel,

I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night.

Thanks,
Joe

Sent from my iPad

> On Feb 21, 2014, at 9:27 PM, "Daniel Ence" <dence at genetics.utah.edu<mailto:dence at genetics.utah.edu>> wrote:
>
> Hi Joe,
>
> MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file.
>
> Will you attach those file to your reply, so we can make sure that the settings are set up correctly?
>
> Thanks,
> Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: Joseph Said [joesaid at nmsu.edu<mailto:joesaid at nmsu.edu>]
> Sent: Friday, February 21, 2014 7:44 PM
> To: Daniel Ence
> Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>
> Hi Daniel,
>
> Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test and ran a search and also returned zero hits. I am not sure what the problem is but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2 which is reporting 0 hits. I can a BLAST standalone and came up with hits for both my gene of interest and the control test gene and came up with results.
>
> Thanks,
> Joe
> ________________________________________
> From: Daniel Ence <dence at genetics.utah.edu<mailto:dence at genetics.utah.edu>>
> Sent: Friday, February 21, 2014 7:38 PM
> To: Joseph Said
> Cc: maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>
> Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>
> Hi Joe,
>
> Will you upload your control files and data at this URL?
> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169
>
> Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene?
>
> I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues.
>
> Thanks,
> Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: Mark Yandell
> Sent: Friday, February 21, 2014 7:32 PM
> To: Daniel Ence
> Subject: FW: I am a PhD candidate at NMSU and have a question about maker2
>
> Mark Yandell
> Professor of Human Genetics
> H.A. & Edna Benning Presidential Endowed Chair
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:801-587-7707
>
> ________________________________________
> From: Joseph Said [joesaid at nmsu.edu<mailto:joesaid at nmsu.edu>]
> Sent: Friday, February 21, 2014 5:18 PM
> To: Mark Yandell
> Subject: I am a PhD candidate at NMSU and have a question about maker2
>
> Dear Dr. Yandell,
>
> I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me.
>
> Thank you,
> Joe
>
> Sent from my iPad


From dence at genetics.utah.edu  Sat Feb 22 16:21:51 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 23:21:51 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>,
	<CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66FAB@mxb2.hg.genetics.utah.edu>

Hi Hua, will you upload the genome file to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170
I am more concerned that MAKER didn't find the gene in the whole genome than in the 60bp substring. I think that MAKER needs more sequence than that to annotate a gene model.

Will you also upload the MAKER output and datastore from the MAKER run?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 4:00 PM
To: Daniel Ence
Cc: maker-devel at yandell-lab.org; Joseph Said; Joe Song
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2


The long file we used is a whole genome. It is a huge file, so I am not able to send it. Sorry. But in the simple test I told you about, the nucleotide sequence I sent you is treated as the genome file, and the protein sequence is the other input. These two are what we want to BLAST against each other to see whether Maker2 works.
Thanks.

On Feb 22, 2014 3:51 PM, "Daniel Ence" <dence at genetics.utah.edu<mailto:dence at genetics.utah.edu>> wrote:
Hi,

Will you send me the long file that you were trying to blast against?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: Hua Zhong [zh9118 at gmail.com<mailto:zh9118 at gmail.com>]
Sent: Saturday, February 22, 2014 10:46 AM
To: Daniel Ence
Cc: Joe Song; Joseph Said
Subject: Re: I am a PhD candidate at NMSU and have a question about maker2

Hi all,
Attached are the three configuration files and the two input files (genome and protein) that we used for the prediction. As a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but got nothing returned. We also tested a long genome sequence as one input and still got nothing. I am not sure what causes this result.
Thanks a lot for your help.

Hua


On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu<mailto:joesaid at nmsu.edu>> wrote:
Hi Daniel,

I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night.

Thanks,
Joe

Sent from my iPad

> On Feb 21, 2014, at 9:27 PM, "Daniel Ence" <dence at genetics.utah.edu<mailto:dence at genetics.utah.edu>> wrote:
>
> Hi Joe,
>
> MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file.
>
> Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
>
> Thanks,
> Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: Joseph Said [joesaid at nmsu.edu<mailto:joesaid at nmsu.edu>]
> Sent: Friday, February 21, 2014 7:44 PM
> To: Daniel Ence
> Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>
> Hi Daniel,
>
> Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is GUN1 (ID UGID:8241, UniGene At.20815). I am using Maker2, and I believe Maker2 just calls BLAST from NCBI's page. When I search the cotton genome, it returns zero hits. I then used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems that whatever should be passing NCBI BLAST results back to Maker2 is returning nothing, so Maker2 reports 0 hits. I ran standalone BLAST and got hits for both my gene of interest and the control test gene.
>
> Thanks,
> Joe
> ________________________________________
> From: Daniel Ence <dence at genetics.utah.edu<mailto:dence at genetics.utah.edu>>
> Sent: Friday, February 21, 2014 7:38 PM
> To: Joseph Said
> Cc: maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>
> Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>
> Hi Joe,
>
> Will you upload your control files and data at this URL?
> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169
>
> Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene?
>
> I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues.
>
> Thanks,
> Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: Mark Yandell
> Sent: Friday, February 21, 2014 7:32 PM
> To: Daniel Ence
> Subject: FW: I am a PhD candidate at NMSU and have a question about maker2
>
> Mark Yandell
> Professor of Human Genetics
> H.A. & Edna Benning Presidential Endowed Chair
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:801-587-7707<tel:801-587-7707>
>
> ________________________________________
> From: Joseph Said [joesaid at nmsu.edu<mailto:joesaid at nmsu.edu>]
> Sent: Friday, February 21, 2014 5:18 PM
> To: Mark Yandell
> Subject: I am a PhD candidate at NMSU and have a question about maker2
>
> Dear Dr. Yandell,
>
> I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me.
>
> Thank you,
> Joe
>
> Sent from my iPad


From mikael.durling at slu.se  Sun Feb 23 09:57:09 2014
From: mikael.durling at slu.se (=?iso-8859-1?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Sun, 23 Feb 2014 16:57:09 +0000
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se>

Dear list and maker developers,

I was browsing the results of a recent maker run, focusing on differences between this run, made with a recent maker (svn r1067), and a previous run made with svn revision 1022 (as I recall). One of the differences I found was a gene lost in the new prediction set and replaced by an extended version of its former neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300).  As you can see, there is no support for the join in the evidence. Do you have any clue as to what might cause this?

Best regards,
Mikael Durling


From carsonhh at gmail.com  Sun Feb 23 13:00:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 23 Feb 2014 13:00:50 -0700
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <CF2FA087.A21D%carsonhh@gmail.com>

The image doesn't show all evidence sources, but the short answer is that
one of your evidence sources (est2genome, protein2genome, or blastx)
bridges the two regions, and when provided with the bridged hint, one of the
gene predictors thinks it makes sense to create a single model instead.
My guess is that it's the blastx evidence.
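
One rough way to look for such a bridging hit is to scan the GFF3 for evidence alignments that overlap both gene regions at once. The sketch below is only an illustration: the function name, the region tuples, and the assumption that column 2 of the GFF3 carries the source names listed above are all hypothetical choices, not part of MAKER itself.

```python
# Sketch only: list evidence alignments that overlap two gene regions at once.
# The source names are assumed to match the ones mentioned in the text above.
EVIDENCE_SOURCES = {"est2genome", "protein2genome", "blastx"}


def bridging_hits(gff3_lines, seqid, region_a, region_b):
    """Return (source, type, start, end, attributes) for hits overlapping both regions.

    region_a and region_b are (start, end) tuples in 1-based GFF3 coordinates.
    """
    hits = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features follow
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[1] not in EVIDENCE_SOURCES:
            continue
        if cols[2] == "match_part":
            continue  # look at whole alignments, not their individual segments
        start, end = int(cols[3]), int(cols[4])
        overlaps_a = start <= region_a[1] and end >= region_a[0]
        overlaps_b = start <= region_b[1] and end >= region_b[0]
        if overlaps_a and overlaps_b:
            hits.append((cols[1], cols[2], start, end, cols[8]))
    return hits
```

Calling `bridging_hits(open("region.gff3"), "contig_name", (a_start, a_end), (b_start, b_end))` with the coordinates of the two original genes would list any candidate alignments; an empty result suggests the bridge is not present in the reported evidence.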

--Carson


On 2/23/14, 9:57 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Dear list and maker developers,
>
>I was browsing the results of a recent maker run, focusing on differences
>between this run with the a recent maker (svn r1067) and a previous run
>with svn revision 1022 (I recall). One of the differences I found was a
>gene lost in the new prediction set, but replaced by an extended version
>of a previous neighbor (see
>http://figshare.com/articles/Maker_prediction_comparison/942300).  As you
>can see, there is no support for the join in the evidence. Do you have
>any clue to what might cause this?
>
>Best regards,
>Mikael Durling
>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From mikael.durling at slu.se  Sun Feb 23 14:14:00 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Sun, 23 Feb 2014 21:14:00 +0000
Subject: [maker-devel] Maker predicting fusion genes?
In-Reply-To: <CF2FA087.A21D%carsonhh@gmail.com>
References: <CF2FA087.A21D%carsonhh@gmail.com>
Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se>

OK, do you mean by that that the predictions that end up in the gff3 output from the ab initio predictors (snap_masked, augustus_masked, and genemark) are not the final hinted predictions? Otherwise, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence there to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301, where all evidence is plotted).

Mikael


23 feb 2014 kl. 21:00 skrev Carson Holt <carsonhh at gmail.com>:

> The image doesn't show all evidence sources, but the short answer is that
> one of your evidence sources (est2genome, protein2genome, or blastx)
> bridges the two regions, and when provided with the bridged hint, one of the
> gene predictors thinks it makes sense to create a single model instead.
> My guess is that it's the blastx evidence.
>
> --Carson
>
>
> On 2/23/14, 9:57 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
> wrote:
>
>> Dear list and maker developers,
>>
>> I was browsing the results of a recent maker run, focusing on differences
>> between this run with the a recent maker (svn r1067) and a previous run
>> with svn revision 1022 (I recall). One of the differences I found was a
>> gene lost in the new prediction set, but replaced by an extended version
>> of a previous neighbor (see
>> http://figshare.com/articles/Maker_prediction_comparison/942300).  As you
>> can see, there is no support for the join in the evidence. Do you have
>> any clue to what might cause this?
>>
>> Best regards,
>> Mikael Durling
>>
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: region.gff3
Type: application/octet-stream
Size: 19612 bytes
Desc: region.gff3
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140223/240ecba4/attachment-0003.obj>

From hedgyx at yahoo.com  Mon Feb 24 00:02:41 2014
From: hedgyx at yahoo.com (Megan)
Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

Maker folks,
I am re-annotating a single contig and I am having a few problems.

First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added).  The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first.  So it appears to be a systematic error across the entire file.  The Maker error is "Check your input GFF3 file for errors! (from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff.  Attached is a gff with the first 2 genes.  

Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV].  It suggests "set -fix_nucleotides on the command line to fix this automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does it do?  Does it remove the entire sequence or replace ambiguous bases with a randomly selected one?  Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.  

Also, just curious, has Maker never supported these characters but just never complained?  I used this EST data set with Maker 2.09.  I did note poor EST coverage, but thought it was an issue with the EST data itself.

I appreciate any suggestions.
Thanks,
Megan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part_passthru.gff
Type: application/octet-stream
Size: 4363 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140223/3950a0b4/attachment-0003.obj>

From zh9118 at gmail.com  Sat Feb 22 16:00:28 2014
From: zh9118 at gmail.com (Hua Zhong)
Date: Sat, 22 Feb 2014 16:00:28 -0700
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question
	about maker2
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>
	<7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66E9C@mxb2.hg.genetics.utah.edu>
	<d5533a5c463b498e877651cd01820309@BY2PR01MB506.prod.exchangelabs.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66ECE@mxb2.hg.genetics.utah.edu>
	<6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
	<CA+ebk3=kXzXEH+DVjKFvMNt689-Gwjw-+6GtySaMG_gZLQ5XvA@mail.gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D66F8F@mxb2.hg.genetics.utah.edu>
Message-ID: <CA+ebk3=2mJi_1wxy5gnkOb4syEVZ14Pcj_bGRVcq=uHgySPmqQ@mail.gmail.com>

The long file we used is a whole genome. It is a huge file, so I am not able
to send it. Sorry. But in the simple test I told you about, the nucleotide
sequence I sent you is treated as the genome file, and the protein sequence
is the other input. These two are what we want to BLAST against each other
to see whether Maker2 works.
Thanks.
On Feb 22, 2014 3:51 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>  Hi,
>
>  Will you send me the long file that you were trying to blast against?
>
>  Thanks,
> Daniel
>
>  Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>   ------------------------------
> *From:* Hua Zhong [zh9118 at gmail.com]
> *Sent:* Saturday, February 22, 2014 10:46 AM
> *To:* Daniel Ence
> *Cc:* Joe Song; Joseph Said
> *Subject:* Re: I am a PhD candidate at NMSU and have a question about
> maker2
>
>   Hi all,
> Attached are the three configuration files and the two input files
> (genome and protein) that we used for the prediction. As a simple
> test, we used one short sequence of about 60 bp and its translated protein
> sequence as inputs, but got nothing returned. We also tested a long
> genome sequence as one input and still got nothing. I am not sure
> what causes this result.
> Thanks a lot for your help.
>
>  Hua
>
>
>
>
> On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said <joesaid at nmsu.edu> wrote:
>
>> Hi Daniel,
>>
>> I do not have the exact files with me right now, but my coauthors on the
>> paper I am working on have been copied on this email. Hua can send you
>> those files. Thank you for being very helpful especially on a Friday night.
>>
>> Thanks,
>> Joe
>>
>> Sent from my iPad
>>
>> > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" <dence at genetics.utah.edu>
>> wrote:
>> >
>> > Hi Joe,
>> >
>> > MAKER runs blast from your local system (or your server where MAKER is
>> installed), and it blasts evidence that the user supplies in the "est" and
>> "protein" settings. The est and protein settings are set in the
>> maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file
>> and the specific blast settings are in the "maker_bopts.ctl" file.
>> >
>> > Will you attach those files to your reply, so we can make sure that the
>> settings are set up correctly?
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 7:44 PM
>> > To: Daniel Ence
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Hi Daniel,
>> >
>> > Thank you for getting back to me so quickly. I am using the cotton
>> Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is
>> GUN1 (ID UGID:8241, UniGene At.20815). I am using Maker2, and I
>> believe Maker2 just calls BLAST from NCBI's page. When I search the
>> cotton genome, it returns zero hits. I then used a known cotton gene as
>> a test, and that search also returned zero hits. I am not sure what the
>> problem is, but it seems that whatever should be passing NCBI BLAST
>> results back to Maker2 is returning nothing, so Maker2 reports 0 hits.
>> I ran standalone BLAST and got hits for both my gene of interest
>> and the control test gene.
>> >
>> > Thanks,
>> > Joe
>> > ________________________________________
>> > From: Daniel Ence <dence at genetics.utah.edu>
>> > Sent: Friday, February 21, 2014 7:38 PM
>> > To: Joseph Said
>> > Cc: maker-devel at yandell-lab.org
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Hi Joe,
>> >
>> > Will you upload your control files and data at this URL?
>> > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169
>> >
>> > Also, what version of MAKER and blast are you using? And which file are
>> you using for the known arabidopsis gene?
>> >
>> > I've copied this email to the maker-development list, which is a really
>> good resource for trouble-shooting MAKER issues.
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Mark Yandell
>> > Sent: Friday, February 21, 2014 7:32 PM
>> > To: Daniel Ence
>> > Subject: FW: I am a PhD candidate at NMSU and have a question about
>> maker2
>> >
>> > Mark Yandell
>> > Professor of Human Genetics
>> > H.A. & Edna Benning Presidential Endowed Chair
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ph:801-587-7707
>> >
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 5:18 PM
>> > To: Mark Yandell
>> > Subject: I am a PhD candidate at NMSU and have a question about maker2
>> >
>> > Dear Dr. Yandell,
>> >
>> > I am a molecular biologist at NMSU. I am trying to use maker2 with the
>> cotton genome, and search an Arabidopsis gene against it. I think there is
>> a problem with the blast component because zero results are returned. I
>> tried troubleshooting by searching a known gene and still returned zero
>> results. Is this a common problem maybe with the pipeline? I would
>> appreciate any ideas you might have to help me.
>> >
>> > Thank you,
>> > Joe
>> >
>> > Sent from my iPad
>>
>
>

From carsonhh at gmail.com  Mon Feb 24 11:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:18:18 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID: <CF30D6EC.A2CC%carsonhh@gmail.com>

The -fix_nucleotides flag is added to the command line (i.e. maker
-fix_nucleotides).  The warning is there so you are aware that there is an
issue with your fasta file that will cause things downstream to fail.
MAKER can fix the errors for you, but first it gives a warning designed to
make you look at the file and validate it.  Why would you want to do this?
For example, what if you accidentally provided protein sequence to the EST
option?  You wouldn't want MAKER to just proceed.  You want a warning
so you can check first.  If your file is in fact EST data, then set the
flag and those characters will be changed to N's in the fixed fasta
sequence.  Otherwise those characters will cause errors in downstream tools
like exonerate, and even some downstream GMOD tools, so they can't be
allowed to remain as is.
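
To preview what that replacement would do to an EST file before rerunning, a small stand-alone script can apply the same idea. This is a minimal sketch under stated assumptions, not MAKER's own code: the function name is hypothetical, and the character set is simply the [RYKMSWBDHV] list from the warning, matched in either case.

```python
import re

# IUPAC ambiguity codes named in the warning, upper or lower case.
# Assumption for illustration: only these characters are replaced.
AMBIGUOUS = re.compile(r"[RYKMSWBDHVrykmswbdhv]")


def mask_ambiguous(fasta_lines):
    """Return (masked_lines, n_changed) with ambiguity codes replaced by N.

    Header lines (starting with '>') are passed through unchanged.
    """
    masked, changed = [], 0
    for line in fasta_lines:
        if not line.startswith(">"):
            line, n = AMBIGUOUS.subn("N", line)
            changed += n
        masked.append(line)
    return masked, changed
```

A very high count of changed characters relative to sequence length is a hint that the file may not be nucleotide data at all, which is the situation the warning is meant to catch.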

For the GFF3 file, there is almost definitely a logic issue in the file
(the modENCODE validator won't check for those).  This can come from prior
manipulation of the GFF3 file.  For example, IDs for a gene that are the
same across two contigs (technically valid but a logic error).  The GFF3
error message will normally give the ID of the feature causing the issue.
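
The duplicated-ID case mentioned above can be checked for directly. The following is a hedged sketch (the function name is hypothetical, and it covers only this one kind of logic error): it reports every ID value that appears on more than one contig.

```python
from collections import defaultdict


def ids_on_multiple_contigs(gff3_lines):
    """Map each ID= value to the contigs it appears on, keeping only IDs seen on more than one."""
    seen = defaultdict(set)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features follow
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID" and value:
                seen[value].add(cols[0])
    return {fid: sorted(contigs) for fid, contigs in seen.items() if len(contigs) > 1}
```

An empty dictionary means this particular problem is absent; it says nothing about other logic errors such as a Parent that points at a missing feature.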

I could also take a look for you.  You can upload the GFF3 file here ->
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID so I know
which files to review.

Thanks,
Carson


On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:

>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from
>Maker 2.09, with some modifications to gene names and functional
>information added).  The gff file passes the modencode validator but
>Maker always fails on the first gene in the file, regardless of which
>gene comes first.  So it appears to be a systematic error across the
>entire file.  The Maker error is "Check your input GFF3 file for errors!
>(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
>with model_pass=1 and pred_gff.  Attached is a gff with the first 2
>genes.  
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST
>fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
>suggests "set -fix_nucleotides on the command line to fix this
>automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
>it do?  Does it remove the entire sequence or replace ambiguous bases
>with a randomly selected one?  Half of my 20k ESTs contain these
>characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just
>never complained?  I used this EST data set with Maker 2.09.  I did note
>poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From dence at genetics.utah.edu  Mon Feb 24 11:31:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 24 Feb 2014 18:31:47 +0000
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30D6EC.A2CC%carsonhh@gmail.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>,
	<CF30D6EC.A2CC%carsonhh@gmail.com>
Message-ID: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>

Hi Megan, 

One problem with the GFF3 that you attached is that the IDs for the CDS features are constructed incorrectly. All of the CDS features for a given mRNA or transcript should have the same ID. The CDS features in your GFF3 have IDs that use the exon name.

You can fix it with this command-line perl:
perl -ne 'if (/\tCDS\t/) { my ($parent) = /Parent=([^;,\s]+)/; s/\bID=[^;\s]+/ID=$parent-cds/ if defined $parent; } print;' part_passthru.gff > fixed.gff3

It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating. 

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Monday, February 24, 2014 11:18 AM
To: Megan; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides

The -fix_nucleotides flag is added to the command line (i.e. maker
-fix_nucleotides).  The warning is there so you are aware that there is an
issue with your fasta file that will cause things downstream to fail.
MAKER can fix the errors for you, but first it gives a warning designed to
make you look at the file and validate it.  Why would you want to do this?
For example, what if you accidentally provided protein sequence to the EST
option?  You wouldn't want MAKER to just proceed.  You want a warning
so you can check first.  If your file is in fact EST data, then set the
flag and those characters will be changed to N's in the fixed fasta
sequence.  Otherwise those characters will cause errors in downstream tools
like exonerate, and even some downstream GMOD tools, so they can't be
allowed to remain as is.

For the GFF3 file, there is almost definitely a logic issue in the file
(the modENCODE validator won't check for those).  This can come from prior
manipulation of the GFF3 file.  For example, IDs for a gene that are the
same across two contigs (technically valid but a logic error).  The GFF3
error message will normally give the ID of the feature causing the issue.

I could also take a look for you.  You can upload the GFF3 file here ->
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID so I know
which files to review.

Thanks,
Carson


On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:

>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from
>Maker 2.09, with some modifications to gene names and functional
>information added).  The gff file passes the modencode validator but
>Maker always fails on the first gene in the file, regardless of which
>gene comes first.  So it appears to be a systematic error across the
>entire file.  The Maker error is "Check your input GFF3 file for errors!
>(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
>with model_pass=1 and pred_gff.  Attached is a gff with the first 2
>genes.
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST
>fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
>suggests "set -fix_nucleotides on the command line to fix this
>automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
>it do?  Does it remove the entire sequence or replace ambiguous bases
>with a randomly selected one?  Half of my 20k ESTs contain these
>characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just
>never complained?  I used this EST data set with Maker 2.09.  I did note
>poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Mon Feb 24 11:34:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:34:28 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
	<CF30D6EC.A2CC%carsonhh@gmail.com>
	<F2774D6F47BB9D449EEA8B0BF6679D9C65D671BB@mxb2.hg.genetics.utah.edu>
Message-ID: <CF30DE6B.A2F6%carsonhh@gmail.com>

Actually that is not true.  CDS IDs can be the same or different.  MAKER
doesn't care either way.  Both are valid in GFF3.  Having the same ID just
allows them to be put together by some GMOD viewers without having to go
through a container feature.
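
To illustrate why both layouts carry the same information, the sketch below groups CDS segments through their Parent attribute and ignores the ID entirely. It is a hypothetical helper written for this explanation, not something MAKER or a GMOD viewer exposes.

```python
from collections import defaultdict


def cds_by_parent(gff3_lines):
    """Group CDS segments as (start, end, ID) under each Parent, whatever their IDs are."""
    groups = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features follow
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Turn "ID=x;Parent=y" into {"ID": "x", "Parent": "y"}.
        attrs = dict(a.strip().partition("=")[::2] for a in cols[8].split(";") if a.strip())
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                groups[parent].append((int(cols[3]), int(cols[4]), attrs.get("ID")))
    # Sort by coordinates only, so missing or differing IDs never affect the order.
    return {p: sorted(segs, key=lambda s: s[:2]) for p, segs in groups.items()}
```

Whether the CDS lines share one ID or each have their own, the coordinates collected under a given mRNA come out the same; the shared ID only saves a viewer from doing this grouping step.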

--Carson

On 2/24/14, 11:31 AM, "Daniel Ence" <dence at genetics.utah.edu> wrote:

>Hi Megan, 
>
>One problem with the GFF3 that you attached is that the ID's for the CDS
>features are being made wrong. All of the CDS features for a given mRNA
>or transcript should have the same ID. The CDS features in your GFF3 have
>IDs that use the exon name.
>
>You can fix it with this command-line perl:
>cat part_passthru.gff | perl -ne 'if(/\tCDS\t/ && /Parent=([^;,\s]+)/){
>chomp; my $parent=$1; s/ID=[^;]+/ID=$parent-cds/; print "$_\n"}else{print
>$_}' > fixed.gff3
>
>It just fixes the ID attributes in all of the CDS features. Try it on the
>test gff3 you sent and let me know if it works. I can't test it myself
>without the fasta file that you are annotating.
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>Carson Holt [carsonhh at gmail.com]
>Sent: Monday, February 24, 2014 11:18 AM
>To: Megan; maker-devel at yandell-lab.org
>Subject: Re: [maker-devel] gff pass thru problem and unsupported EST
>nucleotides
>
>The -fix_nucleotides flag is added to the command line (i.e. maker
>-fix_nucleotides).  It is there so you are aware that there is an
>issue with your fasta file that will cause things downstream to fail.
>MAKER can fix the errors for you, but first it gives a warning designed to
>make you look at the file and validate it.  Why would you want to do this?
>For example, what if you provided protein sequence to the EST option
>accidentally?  You wouldn't want MAKER to just proceed.  You want a warning
>so you can check first.  If your file is in fact EST data, then set the
>flag and those characters will be changed to N's in the fixed fasta
>sequence.  Otherwise those characters will cause errors in downstream tools
>like exonerate, and even some downstream GMOD tools, so they can't be
>allowed to remain as is.
>
>For the GFF3 file, there is almost definitely a logic issue in the file
>(the modencode validator won't check for those).  This can be from prior
>manipulation of the GFF3 file.  For example, IDs for a gene that are the
>same across two contigs (technically valid but a logic error).  The GFF3
>error message will normally give the ID of the feature causing the issue.
>
>I could also take a look for you.  You can upload the GFF3 file here ->
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>Click on 'new guest account' then e-mail me back your guest ID, so I know
>which files to review.
>
>Thanks,
>Carson
>
>
>
>On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
>
>>Maker folks,
>>I am re-annotating a single contig and I am having a few problems.
>>
>>First, I am having trouble passing through a Maker derived gff (from
>>Maker 2.09, with some modifications to gene names and functional
>>information added).  The gff file passes the modencode validator but
>>Maker always fails on the first gene in the file, regardless of which
>>gene comes first.  So it appears to be a systematic error across the
>>entire file.  The Maker error is "Check your input GFF3 file for errors!
>>(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
>>with model_pass=1 and pred_gff.  Attached is a gff with the first 2
>>genes.
>>
>>Second, when I updated to Maker 2.31, Maker now complains that my EST
>>fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
>>suggests "set -fix_nucleotides on the command line to fix this
>>automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
>>it do?  Does it remove the entire sequence or replace ambiguous bases
>>with a randomly selected one?  Half of my 20k ESTs contain these
>>characters, so I don't want to throw them out entirely.
>>
>>Also, just curious, has Maker never supported these characters but just
>>never complained?  I used this EST data set with Maker 2.09.  I did note
>>poor EST coverage, but thought it was an issue with the EST data itself.
>>
>>I appreciate any suggestions.
>>Thanks,
>>Megan
>>_______________________________________________
>>maker-devel mailing list
>>maker-devel at box290.bluehost.com
>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
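
As a rough illustration of what the quoted explanation of -fix_nucleotides
describes, the sketch below replaces the unsupported IUPAC codes
[RYKMSWBDHV] with N on sequence lines and leaves fasta header lines alone.
It is a minimal Python sketch and not MAKER's own code: the names
mask_ambiguous and mask_fasta are hypothetical, and keeping lower-case
input as n is an assumption.

```python
import re

# The characters MAKER 2.31 reports as unsupported in the EST fasta file.
AMBIGUOUS = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def mask_ambiguous(seq):
    """Replace each ambiguity code with N (n for lower-case input)."""
    return AMBIGUOUS.sub(lambda m: "n" if m.group().islower() else "N", seq)


def mask_fasta(lines):
    """Yield fasta lines with sequence lines masked and headers untouched."""
    for line in lines:
        yield line if line.startswith(">") else mask_ambiguous(line)


if __name__ == "__main__":
    # Hypothetical two-line record, just to show the call.
    for out in mask_fasta([">est1 example\n", "ACGTRYKMacgtw\n"]):
        print(out, end="")
```

Masking in this way keeps every EST in the file, which matches the concern
in the original question about not discarding half of the 20k sequences.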


From carsonhh at gmail.com  Mon Feb 24 13:59:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 13:59:12 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID: <CF30FEE0.A32D%carsonhh@gmail.com>

I found the issue.  You have stray carriage-return characters (^M) at the
end of almost every line.  Because they fall within the Parent= tag, they
become part of the Parent ID when the file is read.

So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
ID.

"\cM" is Control-M, a carriage return.

I ran the attached script to remove these characters (perl purify
<gff3_file>), and then it works.  Make sure to remove the
.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
to force the GFF3 database to be rebuilt after fixing the file when you
rerun MAKER.
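
The attached purify script is not preserved in the archive, so the
following is only a sketch of the same idea in Python, assuming the only
problem is stray carriage returns. The function names
strip_carriage_returns and clean_gff3 are hypothetical.

```python
def strip_carriage_returns(lines):
    """Yield each line with every carriage return removed, ending in one LF."""
    for line in lines:
        yield line.rstrip("\r\n").replace("\r", "") + "\n"


def clean_gff3(src_path, dst_path):
    """Copy a GFF3 file, dropping ^M characters along the way."""
    # newline="\n" makes Python split only on LF and keep any CR in the
    # line, so the generator above sees (and removes) it.
    with open(src_path, newline="\n") as src, \
         open(dst_path, "w", newline="\n") as dst:
        dst.writelines(strip_carriage_returns(src))


if __name__ == "__main__":
    # A made-up CDS line with a Windows-style line ending.
    dirty = ["chr1\tmaker\tCDS\t1\t9\t.\t+\t0\tID=c1;Parent=HERA000031-RA\r\n"]
    print(repr(next(strip_carriage_returns(dirty))))
```

Writing to a new file rather than editing in place keeps the original
around for comparison.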

Thanks,
Carson


On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions.  I have looked at the gff file, but I do not
>see any obvious errors.  I have uploaded the files to your website.  The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error.  If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff.  The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information.  I checked with cat -te, but don't see any obvious
>formatting errors.
>
>Thanks!
>Megan
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1966 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140224/a1582e7d/attachment-0003.obj>

From carsonhh at gmail.com  Mon Feb 24 14:03:00 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 14:03:00 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST
 nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
References: <CF30D6EC.A2CC%carsonhh@gmail.com>
	<1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
	<CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <CF310121.A33F%carsonhh@gmail.com>

One more thing.  You must give the file to pred_gff or model_gff.  It is
no longer strictly a MAKER file, as many of the source columns read "."
meaning it has been edited by Apollo or another editor.  So it will not be
guaranteed to be recognized by genome_gff, because many of the source tags
have changed.
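
One way to see how far a file has drifted from pure MAKER output is to
tally the source column (column 2). This is a minimal Python sketch under
that assumption; count_sources is a hypothetical helper, not part of MAKER.

```python
from collections import Counter


def count_sources(lines):
    """Count the values in column 2 (source) of GFF3 feature lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) == 9:
            counts[cols[1]] += 1
    return counts


if __name__ == "__main__":
    # Made-up feature lines, just to show the call.
    example = [
        "##gff-version 3\n",
        "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n",
        "chr1\t.\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n",
    ]
    for source, n in count_sources(example).most_common():
        print(source, n)
```

A large share of "." entries would be consistent with the file having been
edited outside MAKER.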

Thanks,
Carson


On 2/24/14, 1:59 PM, "Carson Holt" <carsonhh at gmail.com> wrote:

>I found the issue.  You have stray carriage-return characters (^M) at the
>end of almost every line.  Because they fall within the Parent= tag, they
>become part of the Parent ID when the file is read.
>
>So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
>ID.
>
>"\cM" is Control-M, a carriage return.
>
>I ran the attached script to remove these characters (perl purify
><gff3_file>), and then it works.  Make sure to remove the
>.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
>to force the GFF3 database to be rebuilt after fixing the file when you
>rerun MAKER.
>
>Thanks,
>Carson
>
>
>
>
>On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:
>
>>Hi Carson and Daniel,
>>
>>Thanks for your suggestions.  I have looked at the gff file, but I do not
>>see any obvious errors.  I have uploaded the files to your website.  The
>>reference fasta is there, the full gff, and a single gene gff that also
>>causes an error.  If I remove that gene from the full gff, then the error
>>is on the next gene in the file, so it appears to be a systematic problem
>>throughout the gff.  The gff was generated by Maker, but I may have
>>messed it up when I modified it to rename genes and add functional
>>information.  I checked with cat -te, but don't see any obvious
>>formatting errors.
>>
>>Thanks!
>>Megan
>>
>>
>


From rbharris at uw.edu  Tue Feb 25 14:49:57 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Tue, 25 Feb 2014 13:49:57 -0800
Subject: [maker-devel] error in snap training
Message-ID: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>

Hey -

I'm trying to train SNAP and am running into errors. I don't have any EST
evidence, just protein. My .gff file reports 10865 genes but when I run
maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n,
a ton of overlap_prev_exon errors get written to the screen and then when I
get to the forge step I get an "impossible error5". Any help would be
greatly appreciated.

Thanks!
Rebecca
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140225/cc68f3a6/attachment-0003.html>

From carsonhh at gmail.com  Tue Feb 25 15:12:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 15:12:14 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>

Make sure you are using 2.31,  and then try the maker2zff filters individually.  If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP.  Use cegma2zff for the conversion.

--Carson

Sent from my iPhone

> On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
> 
> Hey - 
> 
> I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes but when I run maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen and then when I get to the forge step I get an "impossible error5". Any help would be greatly appreciated.
> 
> Thanks!
> Rebecca
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From sjackman at gmail.com  Tue Feb 25 17:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names
Message-ID: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>

Hi,

I'm annotating a genome using a closely related genome from Genbank, using
the .frn (RNA) and .faa (protein) files from Genbank as evidence to
annotate my genome. I've run Maker, and the annotation seems to have worked
well. Is it possible to map the names of the genes from the related species
to my annotation? I see the *map_forward* option, which applies to the
*model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140225/7ae5e966/attachment-0003.html>

From hedgyx at yahoo.com  Tue Feb 25 17:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST
	nucleotides
In-Reply-To: <CF30FEE0.A32D%carsonhh@gmail.com>
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms.  Thanks for the help.

Megan
--------------------------------------------
On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:

 Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
 To: "Megan" <hedgyx at yahoo.com>, "Daniel Ence" <dence at genetics.utah.edu>
 Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
 Date: Monday, February 24, 2014, 12:59 PM
 
 I found the issue.  You have stray carriage-return characters (^M) at the
 end of almost every line.  Because they fall within the Parent= tag, they
 become part of the Parent ID when the file is read.
 
 So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
 ID.
 
 "\cM" is Control-M, a carriage return.
 
 I ran the attached script to remove these characters (perl purify
 <gff3_file>), and then it works.  Make sure to remove the
 .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
 to force the GFF3 database to be rebuilt after fixing the file when you
 rerun MAKER.
 
 Thanks,
 Carson
 
 
 
From carsonhh at gmail.com  Tue Feb 25 17:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
Message-ID: <CF32868D.A42A%carsonhh@gmail.com>

There is a way.  It's not a standard option and it's undocumented, but if
you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags
to your fasta headers, those can be used to guide the mapping and naming.
For example, gene_id=<some_gene> will ensure that different isoforms sharing
a common gene_id get clustered into the same gene.  Adding
maker_coor=chr1:1-10000 to the fasta header will force a particular sequence
to be mapped only against chr1 within the range 1-10000 bp, and using just
maker_coor=chr1 will force it to be mapped only against chr1.

This is an undocumented way to remap genes onto new assemblies using blast
alignments of earlier transcript or protein annotations as a guide.
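
The header tags can be added with a small script. The following is a
minimal Python sketch, not a MAKER utility: tag_fasta_headers is a
hypothetical name, the two lookup tables are supplied by the user, and
appending the tags after the existing description (separated by a space)
is an assumption, since the exact position within the header is not
specified here.

```python
def tag_fasta_headers(lines, gene_ids, coords=None):
    """Append gene_id= and maker_coor= tags to matching fasta headers.

    gene_ids maps sequence ID -> gene name; coords maps sequence ID ->
    a location such as "chr1" or "chr1:1-10000".
    """
    coords = coords or {}
    for line in lines:
        if not line.startswith(">"):
            yield line
            continue
        header = line.rstrip("\r\n")
        parts = header[1:].split()
        seq_id = parts[0] if parts else ""
        if seq_id in gene_ids:
            header += f" gene_id={gene_ids[seq_id]}"
        if seq_id in coords:
            header += f" maker_coor={coords[seq_id]}"
        yield header + "\n"


if __name__ == "__main__":
    # Two made-up isoforms of one gene; the second is pinned to a region.
    fasta = [">tx1 isoform A\n", "ACGT\n", ">tx2\n", "GGCC\n"]
    genes = {"tx1": "geneA", "tx2": "geneA"}
    where = {"tx2": "chr1:1-10000"}
    print("".join(tag_fasta_headers(fasta, genes, where)), end="")
```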

--Carson


From:  Shaun Jackman <sjackman at gmail.com>
Reply-To:  Shaun Jackman <sjackman at gmail.com>
Date:  Tuesday, February 25, 2014 at 5:06 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using
the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate
my genome. I've run Maker, and the annotation seems to have worked well. Is
it possible to map the names of the genes from the related species to my
annotation? I see the map_forward option, which applies to the model_gff
parameter. Is there a similar option for est and protein?

maker_opts.ctl
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1
Thanks,
Shaun

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140225/acb85579/attachment-0003.html>

From carsonhh at gmail.com  Tue Feb 25 18:04:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 18:04:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <CF328AAA.A44D%carsonhh@gmail.com>

One more note.  When using this option, the score column of mRNA features
will represent how completely this gene matches the source EST/protein
(fraction coverage multiplied by % identity).  So a value of 100 means there
is a perfect match.  This way, if the same transcript maps to multiple
locations, you can identify which location is the closest match (this also
works for identifying likely orthologs vs. paralogs).
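
The arithmetic can be sketched as follows. This is an illustrative Python
sketch only: it assumes coverage is expressed as a fraction between 0 and 1
and identity as a percentage (which is what makes a perfect match come out
as 100), and the names match_score and best_location, as well as the
example locations, are hypothetical.

```python
def match_score(fraction_coverage, percent_identity):
    """Score as described above: fraction coverage times percent identity."""
    return fraction_coverage * percent_identity


def best_location(candidates):
    """Return the (location, score) pair with the highest score."""
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    # One transcript mapped to two made-up locations.
    hits = [
        ("chr1:100-900", match_score(0.5, 90.0)),
        ("chr2:5-800", match_score(1.0, 100.0)),
    ]
    print(best_location(hits))
```

In practice the scores would be read from column 6 of the mRNA lines in
the output GFF3 rather than recomputed.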

--Carson


From:  Carson Holt <carsonhh at gmail.com>
Date:  Tuesday, February 25, 2014 at 5:58 PM
To:  Shaun Jackman <sjackman at gmail.com>, <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

There is a way.  It's not a standard option and it's undocumented, but if
you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags
to your fasta headers, those can be used to guide the mapping and naming.
For example, gene_id=<some_gene> will ensure that different isoforms sharing
a common gene_id get clustered into the same gene.  Adding
maker_coor=chr1:1-10000 to the fasta header will force a particular sequence
to be mapped only against chr1 within the range 1-10000 bp, and using just
maker_coor=chr1 will force it to be mapped only against chr1.

This is an undocumented way to remap genes onto new assemblies using blast
alignments of earlier transcript or protein annotations as a guide.

--Carson




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140225/bc343f94/attachment-0003.html>

From weckalba at asu.edu  Tue Feb 25 18:36:21 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 17:36:21 -0800
Subject: [maker-devel] invalid gff3 format issues
Message-ID: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>

Hi all,

I am trying to update maker annotations with PASA and encountered errors
stemming from file format issues in the gff3 file.

I put a few lines from the gff3 to highlight the issue below.  Basically,
the problem is that there are non-unique IDs for a number of the
annotations.

Is there anything that can be done to right this problem?

Thanks,

Walter

Lines from GFF3 file, repeated IDs are highlighted:


chr1    maker    gene    9377440    9432028    .    -    .    ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1    maker    mRNA    9377440    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1    maker    exon    9431899    9432028    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1    maker    exon    9431698    9431808    .    -    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1    maker    gene    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1    maker    mRNA    8894975    9021577    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1    maker    exon    8894975    8895153    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1    maker    exon    8942215    8942531    .    +    .    ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
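
To see how widespread the repeated IDs are before re-running anything, a small check along these lines may help. It is only a sketch, not part of MAKER or PASA: the function name find_duplicate_ids is made up for illustration, and it assumes a tab-delimited, nine-column GFF3 file. It only looks at gene and mRNA lines, since GFF3 allows some multi-line features (CDS, for example) to share one ID legitimately.

```python
from collections import defaultdict


def find_duplicate_ids(gff_lines, feature_types=("gene", "mRNA")):
    """Return {ID: [line numbers]} for IDs carried by more than one feature line.

    Only feature types listed in feature_types are checked (an assumption made
    here to avoid flagging legitimate multi-line features such as CDS).
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID" and value:
                seen[value.strip()].append(lineno)
    return {fid: nums for fid, nums in seen.items() if len(nums) > 1}
```

Passing an open file handle (for example find_duplicate_ids(open("maker.gff"))) gives a dictionary of offending IDs and the lines they occur on; in the excerpt above it would report maker-chr1-snap-gene-4.53-mRNA-1, which is attached to two different parent genes.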

From dence at genetics.utah.edu  Tue Feb 25 19:02:04 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 26 Feb 2014 02:02:04 +0000
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
Message-ID: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel



From weckalba at asu.edu  Tue Feb 25 19:11:12 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 18:11:12 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
Message-ID: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>

Hi Daniel, those have been uploaded and I'm using version 2.28.

Walter



From carsonhh at gmail.com  Tue Feb 25 21:10:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 21:10:27 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
Message-ID: <CF32B115.A46C%carsonhh@gmail.com>

Could you try version 2.31 (the current version)?  I believe this is
happening because you are passing in MAKER genes as pred_gff, so the
transcripts ended up with the same Names and IDs as the genes being
generated by the MAKER run via SNAP etc.  This shouldn't happen with
model_gff, and shouldn't happen in 2.31 (IDs and names are generated
slightly differently in 2.30+).

Thanks,
Carson


From marc.hoeppner at imbim.uu.se  Wed Feb 26 01:26:35 2014
From: marc.hoeppner at imbim.uu.se (=?Windows-1252?Q?Marc_H=F6ppner?=)
Date: Wed, 26 Feb 2014 08:26:35 +0000
Subject: [maker-devel] Functional annotation options
Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to move on to functional annotation. I understand that MAKER includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems MAKER includes an MPI wrapper for InterProScan, but it requires 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except as a very old, frozen binary bundle), I was wondering how to address this issue.

3) maker_functional
This just throws an error about a missing Job ID, so I have no clue what this would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with MAKER to achieve a reasonable functional annotation.

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se


From mikael.durling at slu.se  Wed Feb 26 02:43:43 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Wed, 26 Feb 2014 09:43:43 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>


On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:

> Dear List,
> 
> I have finished a gene build now, and I would like to move on to functional annotation. I understand that MAKER includes a few scripts to facilitate such analyses. However, I have a few questions about this:
> 
> 1) iprscan
> It seems MAKER includes an MPI wrapper for InterProScan, but it requires 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with interproscan5. What I usually do is split the MAKER protein file into chunks, run these chunks as separate jobs on our cluster, and finally merge the results. The TSV file from iprscan5 can be input into the MAKER tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.
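
The chunking step could be sketched roughly as below. This is an illustrative helper, not something shipped with MAKER; the function names (read_fasta, chunk_records, write_chunks), the output naming scheme and the default chunk size of 500 are all assumptions. Each resulting file can then be submitted to iprscan5 as its own cluster job, and the per-chunk TSV outputs concatenated afterwards.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Group records into lists holding at most chunk_size entries each."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def write_chunks(fasta_path, prefix, chunk_size=500):
    """Write prefix.0001.fasta, prefix.0002.fasta, ... and return the file names."""
    names = []
    with open(fasta_path) as handle:
        chunks = chunk_records(read_fasta(handle), chunk_size)
        for i, chunk in enumerate(chunks, start=1):
            name = f"{prefix}.{i:04d}.fasta"
            with open(name, "w") as out:
                for header, seq in chunk:
                    out.write(f">{header}\n{seq}\n")
            names.append(name)
    return names
```

Splitting by record count is the simplest choice; splitting by total residues per chunk would balance job run times better if protein lengths vary a lot.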


> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except as a very old, frozen binary bundle), I was wondering how to address this issue.

It works fine with ncbiblast+ and the blastp command with -outfmt 6. 

cheers,
Mikael

P.S. You're welcome to visit me at SLU if you would like to discuss experiences with genome annotation.




From mikael.durling at slu.se  Wed Feb 26 02:55:56 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Wed, 26 Feb 2014 09:55:56 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>


On 26 Feb 2014, at 10:43, Mikael Brandström Durling <mikael.durling at slu.se> wrote:


On 26 Feb 2014, at 09:26, Marc Höppner <marc.hoeppner at imbim.uu.se> wrote:

Dear List,

I have finished a gene build now, and I would like to move on to functional annotation. I understand that MAKER includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems MAKER includes an MPI wrapper for InterProScan, but it requires 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with interproscan5. What I usually do is split the MAKER protein file into chunks, run these chunks as separate jobs on our cluster, and finally merge the results. The TSV file from iprscan5 can be input into the MAKER tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

I should clarify this and say that mpi_iprscan doesn't seem to work with iprscan5. ipr_update_gff does, however.


Mikael


From mikael.durling at slu.se  Wed Feb 26 05:30:44 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Wed, 26 Feb 2014 12:30:44 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF32868D.A42A%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
Message-ID: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>

Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs for which I know the rough placement a priori, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure different isoforms that share a common gene_id get clustered into the same gene.  Adding maker_coor=chr1:1-10000 to the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson
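
As a rough illustration of the header tagging described above, the sketch below appends gene_id= and maker_coor= tags to FASTA headers. It is not MAKER code: the helper names (tag_header, format_coor, tag_fasta) and the hints dictionary are invented for the example, and appending the tags to the end of the header separated by a space is an assumption, since the message only says the tags go in the fasta header. est_forward=1 still has to be added to maker_opts.ctl by hand as described.

```python
def format_coor(seqid, start=None, end=None):
    """Build a maker_coor value: 'chr1' or 'chr1:1-10000'."""
    if start is None or end is None:
        return seqid
    return f"{seqid}:{start}-{end}"


def tag_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and/or maker_coor= tags to a header (given without '>').

    Assumption: tags are space-separated and placed at the end of the header.
    """
    parts = [header]
    if gene_id:
        parts.append(f"gene_id={gene_id}")
    if maker_coor:
        parts.append(f"maker_coor={maker_coor}")
    return " ".join(parts)


def tag_fasta(lines, hints):
    """Yield FASTA lines with tagged headers.

    hints maps the first word of each header (the sequence ID) to a
    (gene_id, maker_coor) tuple; sequences without a hint pass through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            words = header.split()
            seq_id = words[0] if words else ""
            gene_id, coor = hints.get(seq_id, (None, None))
            yield ">" + tag_header(header, gene_id, coor) + "\n"
        else:
            yield line
```

For example, a hint of ("geneA", format_coor("chr1", 1, 10000)) for sequence tx1 turns ">tx1 some description" into ">tx1 some description gene_id=geneA maker_coor=chr1:1-10000".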


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun


From carsonhh at gmail.com  Wed Feb 26 06:22:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 06:22:34 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes.  That should work as well as an accidental feature.

--Carson 


From mikael.durling at slu.se  Wed Feb 26 06:37:29 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Wed, 26 Feb 2014 13:37:29 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But, reading the code, it seems that for this to work I need to supply maker_coor (but not gene_id) as well as the configuration option est_forward. Any occurrence of maker_coor in GI.pm seems to be conditioned on est_forward=1, right?

Mikael


From nextgen.usfs at gmail.com  Wed Feb 26 09:21:33 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:21:33 -0600
Subject: [maker-devel] change program locations in maker_exe
Message-ID: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>

Hello,
I was wondering if there is a way to make permanent changes to the maker_exe.ctl file. It seems that on install MAKER didn't find the GeneMark or probuild locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information.  Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe.ctl file?  Thank you.

- Jon


From carsonhh at gmail.com  Wed Feb 26 08:38:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 08:38:47 -0700
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
	<63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <CF33558F.A4C3%carsonhh@gmail.com>

maker_functional is a script that gets called by another script; it is not
meant to be called directly by the user.  So ignore that.

Just run iprscan directly; it already works pretty well.  The mpi_iprscan
and iprscan_wrap scripts just give some logging functionality by wrapping
the iprscan call.  In most cases there is no advantage over just running
iprscan directly.

--Carson


On 2/26/14, 2:43 AM, "Mikael Brandstr?m Durling" <mikael.durling at slu.se>
wrote:

>
>26 feb 2014 kl. 09:26 skrev Marc H?ppner <marc.hoeppner at imbim.uu.se>:
>
>> Dear List,
>> 
>> I have finished a gene build now, and I would like to go over to
>>functional annotation. I understand that maker includes a few scripts to
>>facilitate such analyses. However, I have a few questions about this:
>> 
>> 1) iprscan
>> It seems maker includes an MPI wrapper for InterProScan, but requests
>>'iprscan' to be in $PATH. The latest versions of InterProScan I have
>>worked with are Java applications, and even though I put their location in
>>$PATH, mpi_iprscan seems to want something else. But what?
>
>I don't believe it works with InterProScan 5. What I usually do is
>split the maker protein file into chunks, run these chunks as
>separate jobs on our cluster, and finally merge the results. The TSV
>file from iprscan5 can be input into the maker tool ipr_update_gff. I
>have not tried iprscan2gff3, as I haven't figured out how to get an
>iprscan4 raw file from iprscan5.
>
>
>> 2) maker_functional_gff
>> This script seems to be very useful, but the description suggests that
>>it requires WU-BLAST tabular output '2', which I think looks quite
>>different from the NCBI BLAST tabular output. Since WU-BLAST is not
>>really available anymore (except as a very old, frozen binary bundle), I
>>was wondering how to address this issue.
>
>It works fine with ncbiblast+ and the blastp command with -outfmt 6.
>
>cheers,
>Mikael
>
>P.S. You're welcome to visit me at SLU if you would like to discuss
>experiences of genome annotation.
>
>
>> 
>> 3) maker_functional
>> This just throws an error about a missing Job ID, so no clue what this
>>would be used for.
>> 
>> I guess what I am after is some suggestion as to how to use the scripts
>>included with Maker to achieve a reasonable functional annotation.
>> 
>> With kind regards,
>> 
>> Marc Hoeppner
>> 
>> Marc P. Hoeppner, PhD
>> Team Leader
>> BILS Genome Annotation Platform
>> Department for Medical Biochemistry and Microbiology
>> Uppsala University, Sweden
>> marc.hoeppner at imbim.uu.se
>> 
>> 
>> 
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>


From carsonhh at gmail.com  Wed Feb 26 09:09:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 09:09:14 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: <CF335A95.A4DE%carsonhh@gmail.com>

It will still work without est_forward.  It just works a little differently.
Keep in mind this was a hidden feature I used to find stubborn or hard to
find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the
maker_coor tags early in the pipeline.  Then it will create a list of
locations to search, and it will search them even if there are no BLAST
results to seed the search (normally MAKER gets a BLAST result first and
then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
look for a match using all of chr1 as the input to exonerate even when BLAST
finds nothing (this is a very very slow search, but can help pick up one or
two stubborn genes that don't remap well).  To allow this, MAKER gives
exonerate looser matching parameters (i.e. allows for single base pair
introns perhaps caused by assembly errors).  The logic here is that, given
the fact that I already told MAKER that with some degree of confidence I
expect sequence A to map to location X, it will try its hardest to make
it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at
line 1563, but only after a BLAST alignment has already seeded it to the
region (that BLAST result has the information in its description parameter).
MAKER will then ignore seeds completely outside of maker_coor. In addition
any BLAST seeds that overlap maker_coor will get the search space for
alignment polishing adjusted to match maker_coor exactly.  Also match
parameters for exonerate will not be relaxed as they were with est_forward.
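
The seed handling described in the paragraph above (the case without
est_forward) can be sketched roughly as follows.  This is only an
illustration of the described behavior, not MAKER's code (which is Perl,
in GI.pm); the function names and the tuple layout for seeds are
hypothetical, and the header parsing is an assumption based on the
maker_coor=chr1:1-10000 and maker_coor=chr1 examples in this thread.

```python
import re

# Matches maker_coor=chr1 or maker_coor=chr1:1-10000 in a FASTA header.
COOR_RE = re.compile(r"maker_coor=([^\s:;]+)(?::(\d+)-(\d+))?")


def parse_maker_coor(header):
    """Return (seqid, start, end) from a FASTA header, or None if untagged.

    start and end are None when only a sequence name was given.
    """
    match = COOR_RE.search(header)
    if match is None:
        return None
    seqid, start, end = match.groups()
    if start is None:
        return (seqid, None, None)
    return (seqid, int(start), int(end))


def polish_region(coor, seed):
    """Decide what region to hand to alignment polishing for a BLAST seed.

    seed is a (seqid, start, end) tuple for the BLAST hit (hypothetical
    layout).  Returns None when the seed should be ignored, the seed
    itself when there is no maker_coor tag, and otherwise the maker_coor
    region, so the search space matches maker_coor exactly.
    """
    if coor is None:
        return seed
    seqid, start, end = coor
    if seed[0] != seqid:
        return None  # seed is on a different sequence
    if start is None:
        return coor  # whole sequence allowed
    if seed[2] < start or seed[1] > end:
        return None  # seed lies completely outside maker_coor
    return coor  # overlapping seed: region snapped to maker_coor
```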

As you can see, the behavior is slightly different (because it's an
accidental feature).

Thanks,
Carson


From:  Mikael Brandström Durling <mikael.durling at slu.se>
Date:  Wednesday, February 26, 2014 at 6:37 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the
code, it seems that I need to supply maker_coor but not gene_id, as well as
the configuration option est_forward for this to work. Any occurrences of
maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

> Yes.  That should work as well as an accidental feature.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se>
> wrote:
> 
>> Can this use of maker_coor be used only to hint about the placement of the
>> ESTs, without affecting the naming of the final genes? I.e., if I have a
>> database of ESTs where I have a priori knowledge of their rough placement, can
>> this placement be given to maker without providing est_forward=1?
>> 
>> Thanks,
>> Mikael
>> 
>> On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:
>> 
>>> There is a way.  It's not a standard option and it's undocumented, but if
>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
>>> The option won't already be there so you'll have to type it in.
>>> 
>>> There is also a feature designed to work with this option.  If you add tags
>>> to your fasta headers, those can be used to guide the mapping and naming.
>>> For example, gene_id=<some_gene>  will ensure different isoforms that share
>>> a common gene_id get clustered into the same gene, and
>>> maker_coor=chr1:1-10000 in the fasta header will force a particular sequence
>>> to only be mapped against chr1 within the range of 1-10000 bp  and just
>>> using maker_coor=chr1 will force it to only be mapped against chr1.
>>> 
>>> This is an undocumented way to remap genes onto new assemblies using blast
>>> alignments of earlier transcript or protein annotations as a guide.
>>> 
>>> --Carson
>>> 
>>> 
>>> 
>>> 
>>> From: Shaun Jackman <sjackman at gmail.com>
>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To: <maker-devel at yandell-lab.org>
>>> Subject: [maker-devel] Mapping gene names
>>> 
>>> Hi,
>>> 
>>> I'm annotating a genome using a closely related genome from Genbank, using
>>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate
>>> my genome. I've run Maker, and the annotation seems to have worked well. Is
>>> it possible to map the names of the genes from the related species to my
>>> annotation? I see the map_forward option, which applies to the model_gff
>>> parameter. Is there a similar option for est and protein?
>>> 
>>> maker_opts.ctl
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>> Thanks,
>>> Shaun
>> 



From carson.holt at genetics.utah.edu  Wed Feb 26 09:38:37 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 26 Feb 2014 16:38:37 +0000
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
Message-ID: <CF33655B.A514%carson.holt@genetics.utah.edu>

MAKER first looks inside of .../maker/exe/ for any executables.  Then it
uses the system's 'which' command to identify executables in your PATH
environment variable.  If MAKER is not finding the one you want, then
you can either put the program in the .../maker/exe/ folder (i.e. create
.../maker/exe/bin/ and then put soft links to the executables you want to
be used first), or you can rearrange the order of directories in your PATH
environment variable so that 'which <program_name>' returns the location
you want.  If MAKER is always leaving the locations of those programs
empty, it is because you need to add them to your PATH environment
variable.
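
The lookup order described above can be approximated with a short sketch
for checking which copy of a program would be picked up.  This is not
MAKER's own code; the function name is hypothetical, and searching the
whole exe folder recursively is an assumption (the message only says
MAKER looks inside .../maker/exe/ first and then falls back to 'which').

```python
import os
import shutil
from pathlib import Path


def locate(program, maker_exe_dir):
    """Return the path to program, preferring maker_exe_dir over PATH.

    maker_exe_dir stands in for the .../maker/exe/ folder; soft links
    placed under it (e.g. in a bin/ subfolder) are found as well.
    """
    exe_dir = Path(maker_exe_dir)
    if exe_dir.is_dir():
        for root, _dirs, files in os.walk(exe_dir, followlinks=True):
            if program in files:
                candidate = Path(root) / program
                if os.access(candidate, os.X_OK):
                    return str(candidate)
    # Fall back to the PATH search, like the 'which' command does.
    return shutil.which(program)
```

If this returns None for a program, it is neither under the exe folder
nor on your PATH, which matches the case where the location is left
empty in maker_exe.ctl.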

Thanks,
Carson

On 2/26/14, 9:21 AM, "USFS Ion PGM" <nextgen.usfs at gmail.com> wrote:

>Hello,
>I was wondering if there is a way to make permanent changes to the
>maker_exe.ctl file, as it seems on the install that maker didn't find the
>GeneMark or probuild locations correctly, which means that I have to
>manually edit the maker_exe.ctl file every time and add that information.
> Where can I modify this permanently so that the maker -CTL command
>creates the appropriate maker_exe file?  Thank you.
>
>- Jon
>
>


From nextgen.usfs at gmail.com  Wed Feb 26 09:58:11 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:58:11 -0600
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To: <CF33655B.A514%carson.holt@genetics.utah.edu>
References: <CDD24D4E-4555-474F-9367-B6F6D05F11B4@gmail.com>
	<CF33655B.A514%carson.holt@genetics.utah.edu>
Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com>

Hi Carson,

Thank you - that did it, I didn't have them in the PATH.  All working now.

Cheers,
Jon

On Feb 26, 2014, at 10:38 AM, Carson Holt <carson.holt at genetics.utah.edu> wrote:

> MAKER first looks inside of .../maker/exe/ for any executables.  Then it
> uses the system's 'which' command to identify executables in your PATH
> environment variable.  If MAKER is not finding the one you want, then
> you can either put the program in the .../maker/exe/ folder (i.e. create
> .../maker/exe/bin/ and then put soft links to the executables you want to
> be used first), or you can rearrange the order of directories in your PATH
> environment variable so that 'which <program_name>' returns the location
> you want.  If MAKER is always leaving the locations of those programs
> empty, it is because you need to add them to your PATH environment
> variable.
> 
> Thanks,
> Carson
> 
> On 2/26/14, 9:21 AM, "USFS Ion PGM" <nextgen.usfs at gmail.com> wrote:
> 
>> Hello,
>> I was wondering if there is a way to make permanent changes to the
>> maker_exe.ctl file, as it seems on the install that maker didn't find the
>> GeneMark or probuild locations correctly, which means that I have to
>> manually edit the maker_exe.ctl file every time and add that information.
>> Where can I modify this permanently so that the maker -CTL command
>> creates the appropriate maker_exe file?  Thank you.
>> 
>> - Jon
>> 
>> 
> 


From weckalba at asu.edu  Wed Feb 26 13:05:05 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Wed, 26 Feb 2014 12:05:05 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CF32B115.A46C%carsonhh@gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
Message-ID: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:

> Could you try version 2.31 (the current version)?  I believe this is
> happening because you are passing in MAKER genes as pred_gff; the
> transcripts thus ended up with the same Names and IDs as the genes being
> generated by the MAKER run via SNAP etc.  This shouldn't happen with
> model_gff, and shouldn't happen in 2.31 (IDs and names are generated
> slightly differently in 2.30+).
>
> Thanks,
> Carson
>
> From: Walter Eckalbar <weckalba at asu.edu>
> Date: Tuesday, February 25, 2014 at 7:11 PM
> To: Daniel Ence <dence at genetics.utah.edu>
> Cc: "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] invalid gff3 format issues
>
> Hi Daniel, those have been uploaded and I'm using version 2.28.
>
> Walter
>
>
> On 25 February 2014 18:02, Daniel Ence <dence at genetics.utah.edu> wrote:
>
>> Hi Walter,
>>
>> Will you upload the full GFF3 and the control files that you used to this
>> URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>>
>> Thanks,
>> Daniel
>>
>>
>>
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar <weckalba at asu.edu>
>>  wrote:
>>
>> Hi all,
>>
>> I am trying to update maker annotations with PASA and encountered errors
>> stemming from file format issues in the gff3 file.
>>
>> I put a few lines from the gff3 to highlight the issue below.  Basically,
>> the problem is that there are non-unique IDs for a number of the
>> annotations.
>>
>> Is there anything that can be done to right this problem?
>>
>> Thanks,
>>
>> Walter
>>
>> Lines from GFF3 file, repeated IDs are highlighted:
>>
>>
>> chr1    maker    gene    9377440    9432028    .    -    .
>> ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>> chr1    maker    mRNA    9377440    9432028    .    -    .
>> ID=maker-chr1-snap-gene-4.53-mRNA-1;
>> Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>> chr1    maker    exon    9431899    9432028    .    -    .
>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>> chr1    maker    exon    9431698    9431808    .    -    .
>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>
>> chr1    maker    gene    8894975    9021577    .    +    .
>> ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>> chr1    maker    mRNA    8894975    9021577    .    +    .   ID=maker-chr1-snap-gene-4.53-mRNA-1;
>> Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>> chr1    maker    exon    8894975    8895153    .    +    .
>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> chr1    maker    exon    8942215    8942531    .    +    .
>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>>
>>
>>
>

From carsonhh at gmail.com  Wed Feb 26 14:12:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 14:12:23 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To: <CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
References: <CANRPJScjqJDph_SMu0+8PaTMDT7aym9a3u_nhVihYa6BNxZ3AQ@mail.gmail.com>
	<BA9485A1-B761-4C33-A695-9FF6EF43B109@genetics.utah.edu>
	<CANRPJSdY6--A0QtTOUBNQM+HN7dWRDv1YZv7bi=+CVef8LLRXw@mail.gmail.com>
	<CF32B115.A46C%carsonhh@gmail.com>
	<CANRPJSfTAZrey0m6usseLZ6Sj-2fOsMWe_q1_6-9yXvOiwm44w@mail.gmail.com>
Message-ID: <CF33A669.A53C%carsonhh@gmail.com>

Could you put the file in this GFF3 validator to see if anything comes up?
--> http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Maybe it's just PASA.  But I'd like to know there's no issue being caused by
something else.
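
Alongside the online validator, a quick local check for the repeated IDs
reported earlier in this thread could look like the sketch below.  It is
not a GFF3 validator and not part of MAKER or PASA; the function names
are hypothetical.  Because GFF3 permits one ID to span several lines for
discontinuous features, the check is limited (as an assumption about
what matters here) to gene and mRNA lines.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a key -> value dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def duplicate_ids(lines, types=("gene", "mRNA")):
    """Return {ID: [line numbers]} for IDs used on more than one line.

    Only features whose type is in types are considered; comments,
    blank lines, and lines without nine tab-separated columns are
    skipped.
    """
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in types:
            continue
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id:
            seen[feature_id].append(number)
    return {fid: nums for fid, nums in seen.items() if len(nums) > 1}
```

Called as duplicate_ids(open("merged.gff")) (the file name is a
placeholder), it reports each repeated gene or mRNA ID with the line
numbers where it occurs.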

Thanks,
Carson


From:  Walter Eckalbar <weckalba at asu.edu>
Date:  Wednesday, February 26, 2014 at 1:05 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Daniel Ence <dence at genetics.utah.edu>, "<maker-devel at yandell-lab.org>"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] invalid gff3 format issues

Hi Carson,

Thanks, that seems to have mostly resolved the issue.  Oddly enough though,
PASA still complains about the GFF3 file directly from gff3_merge, but if I
first transform it with maker2eval_gtf, then use PASA's
gtf_to_gff3_format.pl script, everything seems to run fine.


On 25 February 2014 20:10, Carson Holt <carsonhh at gmail.com> wrote:
> Could you try version 2.31 (the current version)?  I believe this is happening
> because you are passing in MAKER genes as pred_gff; the transcripts thus ended
> up with the same Names and IDs as the genes being generated by the MAKER run
> via SNAP etc.  This shouldn't happen with model_gff, and shouldn't happen in
> 2.31 (IDs and names are generated slightly differently in 2.30+).
> 
> Thanks,
> Carson
> 
> From:  Walter Eckalbar <weckalba at asu.edu>
> Date:  Tuesday, February 25, 2014 at 7:11 PM
> To:  Daniel Ence <dence at genetics.utah.edu>
> Cc:  "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] invalid gff3 format issues
> 
> Hi Daniel, those have been uploaded and I'm using version 2.28.
> 
> Walter
> 
> 
> On 25 February 2014 18:02, Daniel Ence <dence at genetics.utah.edu> wrote:
>> Hi Walter, 
>> 
>> Will you upload the full GFF3 and the control files that you used to this
>> URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>> 
>> Thanks,
>> Daniel
>> 
>> 
>> 
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar <weckalba at asu.edu>
>>  wrote:
>> 
>>> Hi all,
>>> 
>>> I am trying to update maker annotations with PASA and encountered errors
>>> stemming from file format issues in the gff3 file.
>>> 
>>> I put a few lines from the gff3 to highlight the issue below.  Basically,
>>> the problem is that there are non-unique IDs for a number of the
>>> annotations.
>>> 
>>> Is there anything that can be done to right this problem?
>>> 
>>> Thanks,
>>> 
>>> Walter
>>> 
>>> Lines from GFF3 file, repeated IDs are highlighted:
>>> 
>>> 
>>> chr1    maker    gene    9377440    9432028    .    -    .
>>> ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>>> chr1    maker    mRNA    9377440    9432028    .    -    .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>>> chr1    maker    exon    9431899    9432028    .    -    .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>> chr1    maker    exon    9431698    9431808    .    -    .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>> 
>>> chr1    maker    gene    8894975    9021577    .    +    .
>>> ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>>> chr1    maker    mRNA    8894975    9021577    .    +    .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>>> chr1    maker    exon    8894975    8895153    .    +    .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>>> chr1    maker    exon    8942215    8942531    .    +    .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> 
> 



From mikael.durling at slu.se  Wed Feb 26 15:04:37 2014
From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=)
Date: Wed, 26 Feb 2014 22:04:37 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF335A95.A4DE%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
Message-ID: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>

It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions.

In general, I think the maker_coor and est_forward feature set is worthy of being promoted into a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt <carsonhh at gmail.com> wrote:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that, given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

Yes.  That should work as well as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene>  will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp  and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names


Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl

est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1


Thanks,
Shaun




From carsonhh at gmail.com  Wed Feb 26 15:50:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 15:50:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
Message-ID: <CF33B334.A551%carsonhh@gmail.com>

What you can do is run it once with just est_forward=1 and
est2genome/protein2genome set to 1.  Then take those results, pass them in
as model_gff, and use the map_forward option.  MAKER would then filter the
results based on mRNA score and copy the names onto the new genes under the
standard MAKER pipeline.  Eventually it's really supposed to go into a
separate tool that will map genes onto new assemblies (but under the hood
the tool will just be calling MAKER with certain parameters restricted).  I
do this because if people commonly use it mixed with things like SNAP I can
start to get some very weird behaviors.

Thanks,
Carson

From:  Mikael Brandström Durling <mikael.durling at slu.se>
Date:  Wednesday, February 26, 2014 at 3:04 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in those cases where you
have firm a priori knowledge of the placement of ESTs. However, while trying
it I note that est_forward implies that the est2genome predictor is turned
on, implicitly. Is this necessary for this to work? I'm after the behavior
you describe below where exonerate is made to try really hard within a
limited region to align an EST, but I would not like maker to produce
est2genome predictions.

In general, I think the maker_coor and est_forward feature set is
worthy of being promoted into a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt <carsonhh at gmail.com> wrote:

> It will still work without est_forward.  It just works a little differently.
> Keep in mind this was a hidden feature I used to find stubborn or hard to find
> missing genes after reassembly of a genome.
> 
> If est_forward is provided, MAKER will parse the database to look for the
> maker_coor tags early in the pipeline.  Then it will create a list of
> locations to search, and it will search them even if there are no BLAST
> results to seed the search (normally MAKER gets a BLAST result first and then
> polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for
> a match using all of chr1 as the input to exonerate even when BLAST finds
> nothing (this is a very very slow search, but can help pick up one or two
> stubborn genes that don't remap well).  To allow this, MAKER gives exonerate
> looser matching parameters (i.e. allows for single base pair introns perhaps
> caused by assembly errors).  The logic here is that given the fact that I
> already told MAKER that with some degree of confidence I expect sequence A to
> map to location X, it will try its hardest to make it match.
> 
> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line
> 1563, but only after a BLAST alignment has already seeded it to the region
> (that BLAST result has the information in its description parameter).  MAKER
> will then ignore seeds completely outside of maker_coor. In addition any BLAST
> seeds that overlap maker_coor will get the search space for alignment
> polishing adjusted to match maker_coor exactly.  Also match parameters for
> exonerate will not be relaxed as they were with est_forward.
> 
> As you can see, the behavior is slightly different (because it's an accidental
> feature).
> 
> Thanks,
> Carson
> 
> 
> 
> From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 6:37 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> That might be a useful and time saving accidental feature. But, reading the
> code, it seems that I need to supply maker_coor but not gene_id, as well as
> the configuration option est_forward for this to work. Any occurrences of
> maker_coor in GI.pm seem to be conditioned on est_forward=1, right?
> 
> Mikael
> 
> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
> 
>> Yes.  That should work as well as an accidental feature.
>> 
>> --Carson 
>> 
>> Sent from my iPhone
>> 
>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>> <mikael.durling at slu.se> wrote:
>> 
>>> Can this use of maker_coor be used only to hint about the placement of the
>>> ESTs, without affecting the naming of the final genes? I.e., if I have a
>>> database of ESTs where I have a priori knowledge of their rough placement,
>>> can this placement be given to maker without providing est_forward=1?
>>> 
>>> Thanks,
>>> Mikael
>>> 
>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>> 
>>>> There is a way.  It's not a standard option and it's undocumented, but if
>>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>>> that.  The option won't already be there so you'll have to type it in.
>>>> 
>>>> There is also a feature designed to work with this option.  If you add tags
>>>> to your fasta headers, those can be used to guide the mapping and naming.
>>>> For example, gene_id=<some_gene>  will ensure different isoforms that share
>>>> a common gene_id get clustered into the same gene, and
>>>> maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
>>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>> 
>>>> This is an undocumented way to remap genes onto new assemblies using blast
>>>> alignments of earlier transcript or protein annotations as a guide.
>>>> 
>>>> --Carson
>>>> 
>>>> 
>>>> 
>>>> 
>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>> To: <maker-devel at yandell-lab.org>
>>>> Subject: [maker-devel] Mapping gene names
>>>> 
>>>> Hi,
>>>> 
>>>> I'm annotating a genome using a closely related genome from Genbank, using
>>>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>>> well. Is it possible to map the names of the genes from the related species
>>>> to my annotation? I see the map_forward option, which applies to the
>>>> model_gff parameter. Is there a similar option for est and protein?
>>>> 
>>>> maker_opts.ctl
>>>> est=NC_123456.frn
>>>> protein=NC_123456.faa
>>>> est2genome=1
>>>> protein2genome=1
>>>> Thanks,
>>>> Shaun
>>>> _______________________________________________ maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>> 
> 



From carsonhh at gmail.com  Wed Feb 26 16:45:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 16:45:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CF33B334.A551%carsonhh@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
Message-ID: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
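
One way that prefiltering could be scripted, as a minimal sketch rather than anything shipped with MAKER: it assumes the score sits in column 6 of the mRNA rows of the GFF3, that features are linked through ID and Parent attributes, and that only gene models (gene, mRNA, and their children) need to survive. The function name filter_gff3_by_mrna_score and any cutoff passed to it are hypothetical.

```python
def parse_attrs(col9):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_gff3_by_mrna_score(lines, min_score):
    """Keep mRNAs scoring >= min_score, plus their genes and child features.

    Assumption: the score is in column 6 of each mRNA row. mRNA rows with
    a non-numeric score ('.') are dropped, as are all non-model features.
    """
    rows = [line.rstrip("\n") for line in lines]
    kept_mrna, kept_gene = set(), set()

    # Pass 1: decide which mRNAs pass and remember their parent genes.
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        try:
            score = float(cols[5])
        except ValueError:
            continue
        attrs = parse_attrs(cols[8])
        if score >= min_score and attrs.get("ID"):
            kept_mrna.add(attrs["ID"])
            kept_gene.update(p for p in attrs.get("Parent", "").split(",") if p)

    # Pass 2: emit comment lines, kept genes/mRNAs, and children of kept mRNAs.
    out = []
    for row in rows:
        if row.startswith("#"):
            out.append(row)
            continue
        cols = row.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_gene
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrna
        else:
            parents = set(attrs.get("Parent", "").split(","))
            keep = bool(parents & kept_mrna)
        if keep:
            out.append(row)
    return out
```

The returned rows could then be written to a new file and that file given to model_gff, leaving the original MAKER output untouched.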

--Carson 

Sent from my iPhone

> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
> 
> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto the new genes under the standard MAKER pipeline.  Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
> 
> Thanks,
> Carson
> 
> From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 3:04 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
> 
> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.
> 
> In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.
> 
> Thanks,
> Mikael
> 
>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>> 
>> It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.
>> 
>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).  The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.
>> 
>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also match parameters for exonerate will not be relaxed as they were with est_forward.
>> 
>> As you can see, the behavior is slightly different (because it's an accidental feature).
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>> Date: Wednesday, February 26, 2014 at 6:37 AM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Mapping gene names
>> 
>> That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?
>> 
>> Mikael
>> 
>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>> 
>>> Yes.  That should work as well as an accidental feature.
>>> 
>>> --Carson 
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:
>>>> 
>>>> Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
>>>> 
>>>> Thanks,
>>>> Mikael
>>>> 
>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>> 
>>>>> There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there so you'll have to type it in.
>>>>> 
>>>>> There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene>  will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp  and just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>> 
>>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
>>>>> 
>>>>> --Carson
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>> To: <maker-devel at yandell-lab.org>
>>>>> Subject: [maker-devel] Mapping gene names
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?
>>>>> 
>>>>> maker_opts.ctl
>>>>> 
>>>>> est=NC_123456.frn
>>>>> protein=NC_123456.faa
>>>>> est2genome=1
>>>>> protein2genome=1
>>>>> Thanks,
>>>>> Shaun
>>>>> 
>>>>> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org 
> 

From bioinformatics.umd at gmail.com  Thu Feb 27 09:46:44 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 11:46:44 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

Hello,

I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT error when running maker on infiniband nodes, but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to infiniband, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly
warns of an issue with doing 'fork's within openmpi/openfabrics framework.

In particular, the use of the fork system call is only partially supported
in the OpenFabrics software (this is the drivers, etc for the infiniband
connections). See e.g. 
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
for more information; see in particular the paragraphs starting with the
sentence containing the red-highlighted text "it does not mean that your
fork()-calling application is safe". (The kernel, openMPI version, and OFED
version are sufficiently recent that there is _some_ fork support).

The fact that the job runs over gigE but not IB, in conjunction with the
warning from openmpi, strongly suggests that this is the issue that you are 
encountering. I suspect that maker touches registered memory before the fork,
which would result in a segfault (matching what was observed).

You can try adding the arguments
--mca mpi_warn_on_fork 0 
to the mpirun command, just in case the crash was somehow caused by openmpi's
warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.


Basically, it looks like maker uses some system calls like fork in a manner
which is incompatible with the current OpenFabrics software, and thus will
not work with infiniband. This situation is likely to remain until either
maker changes to be compatible with OFED, or OFED's support for the fork
system call is broadened.
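
To check whether the crashes are confined to particular nodes, the attached log can be tallied by host. This is a minimal sketch that assumes the "[host:pid] Signal: Segmentation fault (11)" line format seen in the attached log; the function name segfaults_by_host is hypothetical.

```python
import re
from collections import Counter

# Matches Open MPI signal lines such as:
# [compute-g20-8:09542] Signal: Segmentation fault (11)
SEGV_LINE = re.compile(r"^\[([^:\]]+):(\d+)\] Signal: Segmentation fault")


def segfaults_by_host(log_lines):
    """Count distinct crashing PIDs per host in an Open MPI error log."""
    pids = set()
    for line in log_lines:
        match = SEGV_LINE.match(line)
        if match:
            # Use (host, pid) pairs so a repeated line is counted once.
            pids.add((match.group(1), match.group(2)))
    return Counter(host for host, _pid in pids)
```

Passing the lines of the error log to segfaults_by_host gives a per-node count of crashed processes, which makes it easier to compare runs on IB and gigE nodes.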


-------------- next part --------------
STATUS: Parsing control files...
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.  

The process that invoked fork was:

  Local host:          compute-g20-7.deepthought.umd.edu (PID 28015)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[compute-g20-8:09542] *** Process received signal ***
[compute-g20-8:09542] Signal: Segmentation fault (11)
[compute-g20-8:09542] Signal code: Address not mapped (1)
[compute-g20-8:09542] Failing at address: 0xee00350
[compute-g20-8:09543] *** Process received signal ***
[compute-g20-8:09543] Signal: Segmentation fault (11)
[compute-g20-8:09543] Signal code: Address not mapped (1)
[compute-g20-8:09543] Failing at address: 0xf020c90
[compute-g20-8:09544] *** Process received signal ***
[compute-g20-8:09544] Signal: Segmentation fault (11)
[compute-g20-8:09544] Signal code: Address not mapped (1)
[compute-g20-8:09544] Failing at address: 0x1ad68f10
[compute-g20-8:09545] *** Process received signal ***
[compute-g20-8:09545] Signal: Segmentation fault (11)
[compute-g20-8:09545] Signal code: Address not mapped (1)
[compute-g20-8:09545] Failing at address: 0x84a3188
[compute-g20-8:09545] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0]
[compute-g20-8:09545] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0x530) [0x2b98f9ea4ec0]
[compute-g20-8:09545] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b98f9ea60ca]
[compute-g20-8:09545] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602]
[compute-g20-8:09545] [ 4] perl(Perl_savepvn+0x26) [0x4816b6]
[compute-g20-8:09545] [ 5] perl(Perl_do_exec3+0x31e) [0x4f715e]
[compute-g20-8:09545] [ 6] perl(Perl_my_popen+0x403) [0x484d63]
[compute-g20-8:09545] [ 7] perl(Perl_do_openn+0x1696) [0x4f9536]
[compute-g20-8:09545] [ 8] perl(Perl_pp_open+0x184) [0x4efc44]
[compute-g20-8:09545] [ 9] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09545] [10] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09545] [11] perl(main+0x135) [0x41b485]
[compute-g20-8:09545] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4]
[compute-g20-8:09545] [13] perl [0x41b299]
[compute-g20-8:09545] *** End of error message ***
[compute-g20-8:09546] *** Process received signal ***
[compute-g20-8:09546] Signal: Segmentation fault (11)
[compute-g20-8:09546] Signal code: Address not mapped (1)
[compute-g20-8:09546] Failing at address: 0x8240850
[compute-g20-8:09547] *** Process received signal ***
[compute-g20-8:09547] Signal: Segmentation fault (11)
[compute-g20-8:09547] Signal code: Address not mapped (1)
[compute-g20-8:09547] Failing at address: 0xd5c8850
[compute-g20-8:09548] *** Process received signal ***
[compute-g20-8:09548] Signal: Segmentation fault (11)
[compute-g20-8:09548] Signal code: Address not mapped (1)
[compute-g20-8:09548] Failing at address: 0x8c80850
[compute-g20-8:09549] *** Process received signal ***
[compute-g20-8:09549] Signal: Segmentation fault (11)
[compute-g20-8:09549] Signal code: Address not mapped (1)
[compute-g20-8:09549] Failing at address: 0x18d72850
[compute-g20-10:07087] *** Process received signal ***
[compute-g20-10:07087] Signal: Segmentation fault (11)
[compute-g20-10:07087] Signal code: Address not mapped (1)
[compute-g20-10:07087] Failing at address: 0x6659f10
[compute-g20-10:07088] *** Process received signal ***
[compute-g20-10:07088] Signal: Segmentation fault (11)
[compute-g20-10:07088] Signal code: Address not mapped (1)
[compute-g20-10:07088] Failing at address: 0x1fe3b5d0
[compute-g20-10:07089] *** Process received signal ***
[compute-g20-10:07089] Signal: Segmentation fault (11)
[compute-g20-10:07089] Signal code: Address not mapped (1)
[compute-g20-10:07089] Failing at address: 0x9870350
[compute-g20-10:07090] *** Process received signal ***
[compute-g20-10:07090] Signal: Segmentation fault (11)
[compute-g20-10:07090] Signal code: Address not mapped (1)
[compute-g20-10:07090] Failing at address: 0x17bad350
STATUS: Processing and indexing input FASTA files...
[compute-g20-8:09567] *** Process received signal ***
[compute-g20-8:09567] Signal: Segmentation fault (11)
[compute-g20-8:09567] Signal code: Address not mapped (1)
[compute-g20-8:09567] Failing at address: 0x1ad5aa10
[compute-g20-8:09567] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0]
[compute-g20-8:09567] [ 1] /lib64/libc.so.6(strlen+0x30) [0x2b6de3f67f40]
[compute-g20-8:09567] [ 2] perl(Perl_do_exec3+0x3a) [0x4f6e7a]
[compute-g20-8:09567] [ 3] perl(Perl_my_popen+0x403) [0x484d63]
[compute-g20-8:09567] [ 4] perl(Perl_do_openn+0x1696) [0x4f9536]
[compute-g20-8:09567] [ 5] perl(Perl_pp_open+0x184) [0x4efc44]
[compute-g20-8:09567] [ 6] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09567] [ 7] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09567] [ 8] perl(main+0x135) [0x41b485]
[compute-g20-8:09567] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4]
[compute-g20-8:09567] [10] perl [0x41b299]
[compute-g20-8:09567] *** End of error message ***
[compute-g20-7:28123] *** Process received signal ***
[compute-g20-7:28123] Signal: Segmentation fault (11)
[compute-g20-7:28123] Signal code: Address not mapped (1)
[compute-g20-7:28123] Failing at address: 0x19ad9f10
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore

To access files for individual sequences use the datastore index:
/export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log
[compute-g20-10:07107] *** Process received signal ***
[compute-g20-10:07107] Signal: Segmentation fault (11)
[compute-g20-10:07107] Signal code: Address not mapped (1)
[compute-g20-10:07107] Failing at address: 0x9870362
[compute-g20-10:07107] [ 0] /lib64/libpthread.so.0 [0x2b50c5c8cca0]
[compute-g20-10:07107] [ 1] perl [0x487218]
[compute-g20-10:07107] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-10:07107] [ 3] perl [0x49d9dc]
[compute-g20-10:07107] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-10:07107] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07107] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-10:07107] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-10:07107] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b50c5eb79c4]
[compute-g20-10:07107] [ 9] perl [0x41b299]
[compute-g20-10:07107] *** End of error message ***
examining contents of the fasta file and run log
examining contents of the fasta file and run log
[compute-g20-10:07108] *** Process received signal ***
[compute-g20-10:07108] Signal: Segmentation fault (11)
[compute-g20-10:07108] Signal code: Address not mapped (1)
[compute-g20-10:07108] Failing at address: 0x1fe3b5c8
examining contents of the fasta file and run log
[compute-g20-10:07108] [ 0] /lib64/libpthread.so.0 [0x2b88f6f8dca0]
[compute-g20-10:07108] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b88f61d55b2]
[compute-g20-10:07108] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b88f7210ad1]
[compute-g20-10:07108] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919]
[compute-g20-10:07108] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19]
[compute-g20-10:07108] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07108] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-10:07108] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-10:07108] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b88f71b89c4]
[compute-g20-10:07108] [ 9] perl [0x41b299]
[compute-g20-10:07108] *** End of error message ***
examining contents of the fasta file and run log
[compute-g20-10:07109] *** Process received signal ***
[compute-g20-10:07109] Signal: Segmentation fault (11)
[compute-g20-10:07109] Signal code: Address not mapped (1)
[compute-g20-10:07109] Failing at address: 0x6664ad0
[compute-g20-10:07109] [ 0] /lib64/libpthread.so.0 [0x2b0809664ca0]
[compute-g20-10:07109] [ 1] /lib64/libc.so.6 [0x2b08098edada]
[compute-g20-10:07109] [ 2] /lib64/libc.so.6(memmove+0x75) [0x2b08098ec095]
[compute-g20-10:07109] [ 3] perl(Perl_sv_setpvn+0x7a) [0x4b775a]
[compute-g20-10:07109] [ 4] perl(Perl_pp_concat+0xc9) [0x4a5739]
[compute-g20-10:07109] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07109] [ 6] perl(Perl_call_sv+0x160) [0x4333a0]
[compute-g20-10:07109] [ 7] perl(Perl_magic_methcall+0x182) [0x488c22]
[compute-g20-10:07109] [ 8] perl(Perl_magic_setpack+0x52) [0x489292]
[compute-g20-10:07109] [ 9] perl(Perl_mg_set+0x66) [0x48aca6]
[compute-g20-10:07109] [10] perl(Perl_pp_sassign+0x19c) [0x4a5c8c]
[compute-g20-10:07109] [11] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-10:07109] [12] perl(perl_run+0x243) [0x4340f3]
[compute-g20-10:07109] [13] perl(main+0x135) [0x41b485]
[compute-g20-10:07109] [14] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b080988f9c4]
[compute-g20-10:07109] [15] perl [0x41b299]
[compute-g20-10:07109] *** End of error message ***
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log


--Next Contig--


--Next Contig--


--Next Contig--

examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...
Processing run.log file...
examining contents of the fasta file and run log
Processing run.log file...
Processing run.log file...


--Next Contig--


--Next Contig--


--Next Contig--


--Next Contig--


--Next Contig--

Processing run.log file...
Processing run.log file...


--Next Contig--


--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_2
Length: 2857
#---------------------------------------------------------------------


Processing run.log file...
MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb.out
did not finish on the last run and must be erased
Processing run.log file...
setting up GFF3 output and fasta chunks
Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_7
Length: 972
#---------------------------------------------------------------------


[compute-g20-8:09576] *** Process received signal ***
[compute-g20-8:09576] Signal: Segmentation fault (11)
[compute-g20-8:09576] Signal code: Address not mapped (1)
[compute-g20-8:09576] Failing at address: 0x1ad68f08
examining contents of the fasta file and run log
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_3
Length: 2316
#---------------------------------------------------------------------


[compute-g20-8:09576] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0]
[compute-g20-8:09576] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b6de2f295b2]
[compute-g20-8:09576] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b6de3f64ad1]
[compute-g20-8:09576] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919]
[compute-g20-8:09576] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19]
[compute-g20-8:09576] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09576] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09576] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09576] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4]
[compute-g20-8:09576] [ 9] perl [0x41b299]
[compute-g20-8:09576] *** End of error message ***
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_4
Length: 1230
#---------------------------------------------------------------------


examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
examining contents of the fasta file and run log
[compute-g20-8:09578] *** Process received signal ***
[compute-g20-8:09578] Signal: Segmentation fault (11)
[compute-g20-8:09578] Signal code: Address not mapped (1)
[compute-g20-8:09578] Failing at address: 0xee0af18
[compute-g20-8:09578] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09578] [ 1] perl(Perl_av_fetch+0x5b) [0x49cf8b]
[compute-g20-8:09578] [ 2] perl(Perl_pp_aelem+0x26e) [0x49e48e]
[compute-g20-8:09578] [ 3] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09578] [ 4] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09578] [ 5] perl(main+0x135) [0x41b485]
[compute-g20-8:09578] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b03d08629c4]
[compute-g20-8:09578] [ 7] perl [0x41b299]
[compute-g20-8:09578] *** End of error message ***
setting up GFF3 output and fasta chunks
Processing run.log file...
[compute-g20-8:09583] *** Process received signal ***
[compute-g20-8:09583] Signal: Segmentation fault (11)
[compute-g20-8:09583] Signal code: Address not mapped (1)
[compute-g20-8:09583] Failing at address: 0x822b0e2
[compute-g20-8:09582] *** Process received signal ***
[compute-g20-8:09582] Signal: Segmentation fault (11)
[compute-g20-8:09582] Signal code: Address not mapped (1)
[compute-g20-8:09582] Failing at address: 0x8c6b0e2
[compute-g20-8:09583] [ 0] /lib64/libpthread.so.0 [0x2ab7f114dca0]
[compute-g20-8:09583] [ 1] perl [0x487218]
[compute-g20-8:09583] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-8:09583] [ 3] perl [0x49d9dc]
[compute-g20-8:09583] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-8:09583] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09583] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09583] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09583] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ab7f13789c4]
[compute-g20-8:09583] [ 9] perl [0x41b299]
[compute-g20-8:09583] *** End of error message ***
[compute-g20-8:09582] [ 0] /lib64/libpthread.so.0 [0x2b4eace23ca0]
[compute-g20-8:09582] [ 1] perl [0x487218]
[compute-g20-8:09582] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-8:09582] [ 3] perl [0x49d9dc]
[compute-g20-8:09582] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-8:09582] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09582] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09582] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09582] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b4ead04e9c4]
[compute-g20-8:09582] [ 9] perl [0x41b299]
[compute-g20-8:09582] *** End of error message ***
examining contents of the fasta file and run log
[compute-g20-8:09581] *** Process received signal ***
[compute-g20-8:09581] Signal: Segmentation fault (11)
[compute-g20-8:09581] Signal code: Address not mapped (1)
[compute-g20-8:09581] Failing at address: 0x848da08
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_17
Length: 1413
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_13
Length: 2019
#---------------------------------------------------------------------


[compute-g20-8:09581] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0]
[compute-g20-8:09581] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b98f9ea65b2]
[compute-g20-8:09581] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b98faee1ad1]
[compute-g20-8:09581] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919]
[compute-g20-8:09581] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19]
[compute-g20-8:09581] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09581] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09581] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09581] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4]
[compute-g20-8:09581] [ 9] perl [0x41b299]
[compute-g20-8:09577] *** Process received signal ***
[compute-g20-8:09581] *** End of error message ***
[compute-g20-8:09577] Signal: Segmentation fault (11)
[compute-g20-8:09577] Signal code: Address not mapped (1)
[compute-g20-8:09577] Failing at address: 0xd5b30e2
[compute-g20-8:09577] [ 0] /lib64/libpthread.so.0 [0x2b79d382aca0]
[compute-g20-8:09577] [ 1] perl [0x487218]
[compute-g20-8:09577] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7]
[compute-g20-8:09577] [ 3] perl [0x49d9dc]
[compute-g20-8:09577] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e]
[compute-g20-8:09577] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09577] [ 6] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09577] [ 7] perl(main+0x135) [0x41b485]
[compute-g20-8:09577] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b79d3a559c4]
[compute-g20-8:09577] [ 9] perl [0x41b299]
[compute-g20-8:09577] *** End of error message ***
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_1
Length: 1446
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
[compute-g20-8:09579] *** Process received signal ***
[compute-g20-8:09579] Signal: Segmentation fault (11)
[compute-g20-8:09579] Signal code: Address not mapped (1)
[compute-g20-8:09579] Failing at address: 0x18d64350
examining contents of the fasta file and run log
[compute-g20-8:09579] [ 0] /lib64/libpthread.so.0 [0x2b31b670fca0]
[compute-g20-8:09579] [ 1] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__ham_get_meta+0x4c) [0x2b31bbd1bccc]
[compute-g20-8:09579] [ 2] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbd103fb]
[compute-g20-8:09579] [ 3] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get+0x1fa) [0x2b31bbd81f3a]
[compute-g20-8:09579] [ 4] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get_pp+0xb4) [0x2b31bbd8db04]
[compute-g20-8:09579] [ 5] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbce4b85]
[compute-g20-8:09579] [ 6] /usr/local/perl/5.16.3-threaded/lib/site_perl/5.16.3/x86_64-linux-thread-multi/auto/DB_File/DB_File.so [0x2b31bbabafc9]
[compute-g20-8:09579] [ 7] perl(Perl_pp_entersub+0x58f) [0x49ee4f]
[compute-g20-8:09579] [ 8] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-8:09579] [ 9] perl(perl_run+0x243) [0x4340f3]
[compute-g20-8:09579] [10] perl(main+0x135) [0x41b485]
[compute-g20-8:09579] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b31b693a9c4]
[compute-g20-8:09579] [12] perl [0x41b299]
[compute-g20-8:09579] *** End of error message ***


--Next Contig--

setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks


--Next Contig--

setting up GFF3 output and fasta chunks


--Next Contig--


--Next Contig--

Processing run.log file...
MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/3B/F3/Gc_UCSC1_contig_26//theVoid.Gc_UCSC1_contig_26/0/Gc_UCSC1_contig_26.0.all.rb.out
did not finish on the last run and must be erased


--Next Contig--


--Next Contig--


--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_18
Length: 937
#---------------------------------------------------------------------


Processing run.log file...
Processing run.log file...
Processing run.log file...


--Next Contig--

FATAL: Thread terminated, causing all processes to fail
--> rank=17, hostname=compute-g20-10.deepthought.umd.edu
setting up GFF3 output and fasta chunks
Processing run.log file...
Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_14
Length: 6745
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_9
Length: 554
#---------------------------------------------------------------------


MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/FB/E4/Gc_UCSC1_contig_22//theVoid.Gc_UCSC1_contig_22/0/Gc_UCSC1_contig_22.0.all.rb.out
did not finish on the last run and must be erased
setting up GFF3 output and fasta chunks
Processing run.log file...
setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_16
Length: 995
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_26
Length: 1895
#---------------------------------------------------------------------


FATAL: Thread terminated, causing all processes to fail
--> rank=16, hostname=compute-g20-10.deepthought.umd.edu
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_23
Length: 618
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_31
Length: 506
#---------------------------------------------------------------------


setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_28
Length: 5246
#---------------------------------------------------------------------


MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/E5/53/Gc_UCSC1_contig_29//theVoid.Gc_UCSC1_contig_29/0/Gc_UCSC1_contig_29.0.all.rb.out
did not finish on the last run and must be erased
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
setting up GFF3 output and fasta chunks
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_19
Length: 880
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_22
Length: 831
#---------------------------------------------------------------------


#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_21
Length: 12421
#---------------------------------------------------------------------


doing repeat masking
FATAL: Thread terminated, causing all processes to fail
--> rank=18, hostname=compute-g20-10.deepthought.umd.edu
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: Gc_UCSC1_contig_29
Length: 1161
#---------------------------------------------------------------------


doing repeat masking
DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 105.
DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 106.
DBD::SQLite::db selectcol_arrayref failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 108.
DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 110.
[compute-g20-7.deepthought.umd.edu:28014] 19 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[compute-g20-7.deepthought.umd.edu:28014] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0/Gc_UCSC1_contig_9.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0 -pa 1
#-------------------------------#
SIGTERM received
SIGTERM received
[compute-g20-7:28161] *** Process received signal ***
[compute-g20-7:28161] Signal: Segmentation fault (11)
[compute-g20-7:28161] Signal code: Address not mapped (1)
[compute-g20-7:28161] Failing at address: 0x19a33ad0
[compute-g20-7:28161] [ 0] /lib64/libpthread.so.0 [0x2b9e1cd6bca0]
[compute-g20-7:28161] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0xb0) [0x2b9e1bfb1a40]
[compute-g20-7:28161] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b9e1bfb30ca]
[compute-g20-7:28161] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602]
[compute-g20-7:28161] [ 4] perl(Perl_do_exec3+0x46) [0x4f6e86]
[compute-g20-7:28161] [ 5] perl(Perl_my_popen+0x403) [0x484d63]
[compute-g20-7:28161] [ 6] perl(Perl_pp_backtick+0xc2) [0x4f0752]
[compute-g20-7:28161] [ 7] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-7:28161] [ 8] perl(Perl_call_sv+0x4d1) [0x433711]
[compute-g20-7:28161] [ 9] perl(Perl_sighandler+0x208) [0x4876c8]
[compute-g20-7:28161] [10] /lib64/libpthread.so.0 [0x2b9e1cd6bca0]
[compute-g20-7:28161] [11] /usr/local/ofed/1.5.4/lib64/libmthca-rdmav2.so [0x2b9e29187bbc]
[compute-g20-7:28161] [12] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b9e2686a8dd]
[compute-g20-7:28161] [13] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_progress+0x5b) [0x2b9e1bfc93cb]
[compute-g20-7:28161] [14] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x205) [0x2b9e25e22005]
[compute-g20-7:28161] [15] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(PMPI_Recv+0x14f) [0x2b9e1bf2927f]
[compute-g20-7:28161] [16] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so(_MPI_Recv+0x59) [0x2b9e23ba8d69]
[compute-g20-7:28161] [17] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so [0x2b9e23ba8f58]
[compute-g20-7:28161] [18] perl(Perl_pp_entersub+0x58f) [0x49ee4f]
[compute-g20-7:28161] [19] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-7:28161] [20] perl(perl_run+0x243) [0x4340f3]
[compute-g20-7:28161] [21] perl(main+0x135) [0x41b485]
[compute-g20-7:28161] [22] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b9e1cf969c4]
[compute-g20-7:28161] [23] perl [0x41b299]
[compute-g20-7:28161] *** End of error message ***
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0/Gc_UCSC1_contig_18.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0/Gc_UCSC1_contig_16.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0/Gc_UCSC1_contig_14.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0/Gc_UCSC1_contig_13.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0 -pa 1
#-------------------------------#
SIGTERM received
Perl exited with active threads:
	1 running and unjoined
	0 finished and unjoined
	0 running and detached
doing repeat masking
running  repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0/Gc_UCSC1_contig_1.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0 -pa 1
#-------------------------------#
SIGTERM received
--------------------------------------------------------------------------
mpirun has exited due to process rank 17 with PID 7052 on
node compute-g20-10.deepthought.umd.edu exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
SIGTERM received
SIGTERM received
Perl exited with active threads:
	0 running and unjoined
	1 finished and unjoined
	0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=14, hostname=compute-g20-8.deepthought.umd.edu
Perl exited with active threads:
	0 running and unjoined
	1 finished and unjoined
	0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=12, hostname=compute-g20-8.deepthought.umd.edu
[compute-g20-8:09470] *** Process received signal ***
[compute-g20-8:09470] Signal: Segmentation fault (11)
[compute-g20-8:09470] Signal code: Address not mapped (1)
[compute-g20-8:09470] Failing at address: 0x4b0
[compute-g20-8:09470] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09470] [ 1] perl(Perl_csighandler+0x23) [0x488103]
[compute-g20-8:09470] [ 2] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09470] [ 3] /lib64/libc.so.6(__select+0x62) [0x2b03d0913402]
[compute-g20-8:09470] [ 4] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b03da142ff3]
[compute-g20-8:09470] [ 5] /lib64/libpthread.so.0 [0x2b03d062f83d]
[compute-g20-8:09470] [ 6] /lib64/libc.so.6(clone+0x6d) [0x2b03d091a26d]
[compute-g20-8:09470] *** End of error message ***
Perl exited with active threads:
	0 running and unjoined
	1 finished and unjoined
	0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=11, hostname=compute-g20-8.deepthought.umd.edu
setting up GFF3 output and fasta chunks
FATAL: Thread terminated, causing all processes to fail
--> rank=10, hostname=compute-g20-8.deepthought.umd.edu
setting up GFF3 output and fasta chunks
FATAL: Thread terminated, causing all processes to fail
--> rank=13, hostname=compute-g20-8.deepthought.umd.edu
FATAL: Thread terminated, causing all processes to fail
--> rank=15, hostname=compute-g20-8.deepthought.umd.edu

From carson.holt at genetics.utah.edu  Thu Feb 27 11:09:21 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 27 Feb 2014 18:09:21 +0000
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: <CF34C944.A5B0%carson.holt@genetics.utah.edu>

It's a little more complicated than that.  MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that).  All I get is Perl's standard implementation of forks.  So it's not really a matter of MAKER changing, it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon).

For now you just have to add this flag to OpenMPI when running MAKER with mpiexec -->  -mca btl ^openib

Example :
mpiexec -mca btl ^openib -n 20 maker
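
When the job is launched from a script rather than typed at a prompt, the same flag can be passed as its own argument so that no shell ever interprets the `^` character. The following is only a minimal sketch in Python, assuming `mpiexec` and `maker` are on the PATH; the helper name `maker_mpi_command` is made up for illustration and is not part of MAKER or Open MPI.

```python
import shlex


def maker_mpi_command(n_procs, exclude_btl="openib"):
    """Build the mpiexec argument list from the example above.

    The leading '^' asks Open MPI to exclude the named BTL component
    (here the InfiniBand 'openib' transport) instead of selecting it.
    The function name and defaults are illustrative assumptions.
    """
    return ["mpiexec", "-mca", "btl", "^" + exclude_btl,
            "-n", str(n_procs), "maker"]


cmd = maker_mpi_command(20)
# Print a shell-safe rendering of the command for a job script or log.
print(shlex.join(cmd))
# To launch it directly without a shell: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids having to quote `^openib`, which may matter in shells that treat `^` specially.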


Thanks,
Carson


From: UMD Bioinformatics <bioinformatics.umd at gmail.com>
Date: Thursday, February 27, 2014 at 9:46 AM
To: <maker-devel at yandell-lab.org>
Subject: Problem with OpenFabrics and infiniband

Hello,

I've had my IT folks install MAKER on our cluster at UMD. I'm getting a SEGFAULT error when running MAKER on InfiniBand nodes but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly
warns of an issue with doing 'fork's within openmpi/openfabrics framework.

In particular, the use of the fork system call is only partially supported
in the OpenFabrics software (this is the drivers, etc for the infiniband
connections). See e.g.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
for more information. In particular the paragraphs starting with the
sentence with the red highlighted "it does not mean that your fork()-calling
application is safe". (The kernel, openMPI version, and OFED version are
sufficiently recent to mean that there is _some_ fork support).

The fact that the job runs over gigE but not IB, in conjunction with the
warning from openmpi, strongly suggests that this is the issue that you are
encountering. I suspect that maker touches registered memory before the fork,
which would result in a segfault (matching what was observed).

You can try adding the arguments
--mca mpi_warn_on_fork 0
to the mpirun command, just in case the crash was somehow caused by openmpi's
warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.


Basically, it looks like maker uses some system calls like fork in a manner
which is incompatible with the current OpenFabrics software, and thus will
not work with infiniband. This situation is likely to remain until either
maker changes to be compatible with OFED, or OFED's support for the fork
system call is broadened.



From bioinformatics.umd at gmail.com  Thu Feb 27 11:55:34 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 13:55:34 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <CF34C944.A5B0%carson.holt@genetics.utah.edu>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
	<CF34C944.A5B0%carson.holt@genetics.utah.edu>
Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com>

Hi Carson,

Thanks, that fixed the issue.

Cheers
Ian

On Feb 27, 2014, at 1:09 PM, Carson Holt <carson.holt at genetics.utah.edu> wrote:

> It's a little more complicated than that.  MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that).  All I get is Perl's standard implementation of forks.  So it's not really a matter of MAKER changing, it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon).
> 
> For now you just have to add this flag to OpenMPI when running MAKER with mpiexec -->  -mca btl ^openib
> 
> Example :
>> mpiexec -mca btl ^openib -n 20 maker
> 
> 
> Thanks,
> Carson
> 
> 
> From: UMD Bioinformatics <bioinformatics.umd at gmail.com>
> Date: Thursday, February 27, 2014 at 9:46 AM
> To: <maker-devel at yandell-lab.org>
> Subject: Problem with OpenFabrics and infiniband
> 
> Hello,
> 
> I've had my IT folks install MAKER on our cluster at UMD. I'm getting a SEGFAULT error when running MAKER on InfiniBand nodes but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.
> 
> IT suggestions
> 
> If you look at the top of the error log for the problematic job, it clearly
> warns of an issue with doing 'fork's within openmpi/openfabrics framework.
> 
> In particular, the use of the fork system call is only partially supported
> in the OpenFabrics software (this is the drivers, etc for the infiniband
> connections). See e.g. 
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
> for more information. In particular the paragraphs starting with the
> sentence with the red highlighted "it does not mean that your fork()-calling 
> application is safe". (The kernel, openMPI version, and OFED version are 
> sufficiently recent to mean that there is _some_ fork support).
> 
> The fact that the job runs over gigE but not IB, in conjunction with the
> warning from openmpi, strongly suggests that this is the issue that you are 
> encountering. I suspect that maker touches registered memory before the fork,
> which would result in a segfault (matching what was observed).
> 
> You can try adding the arguments
> --mca mpi_warn_on_fork 0 
> to the mpirun command, just in case the crash was somehow caused by openmpi's
> warning, but I would not hold out much hope for that.
> 
> ###UPDATE### This does not fix the problem.
> 
> 
> Basically, it looks like maker uses some system calls like fork in a manner
> which is incompatible with the current OpenFabrics software, and thus will
> not work with infiniband. This situation is likely to remain until either
> maker changes to be compatible with OFED, or OFED's support for the fork
> system call is broadened.
> 
> 


From sjackman at gmail.com  Thu Feb 27 16:17:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 15:17:22 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
Message-ID: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:

Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.  Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score; that would copy names onto the new genes under the standard MAKER pipeline.  Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted).  I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.

Thanks,
Carson

From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for it to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like MAKER to produce est2genome predictions.

In general, I think maker_coor and est_forward form a feature set that deserves to be promoted to a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt <carsonhh at gmail.com> wrote:

It will still work without est_forward.  It just works a little differently.  Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.  Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well).  To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows for single-base-pair introns, perhaps caused by assembly errors).  The logic here is that, given that I have already told MAKER with some degree of confidence that I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter).  MAKER will then ignore seeds completely outside of maker_coor.  In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.  Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).
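
The two modes described above can be sketched as a small interval check. This is only an illustration of the prose, written in Python; it is not MAKER's code, and the function name `search_regions` and the tuple layout are assumptions made for the example.

```python
def search_regions(seeds, coor, est_forward):
    """Return the regions that would be handed to exonerate for one sequence.

    seeds: list of (start, end) BLAST hits on the contig named in maker_coor.
    coor:  (start, end) range taken from the maker_coor tag.
    Coordinates are treated as 1-based and inclusive (an assumption).
    """
    if est_forward:
        # With est_forward, the maker_coor region is searched even when
        # BLAST produced no seed at all.
        return [coor]

    coor_start, coor_end = coor
    for seed_start, seed_end in seeds:
        # Seeds completely outside maker_coor are ignored.
        if seed_end < coor_start or seed_start > coor_end:
            continue
        # An overlapping seed has its search space set to maker_coor exactly.
        return [coor]
    return []


coor = (1, 10000)  # e.g. from maker_coor=chr1:1-10000
print(search_regions([(9500, 10800)], coor, est_forward=False))   # [(1, 10000)]
print(search_regions([(20000, 21000)], coor, est_forward=False))  # []
print(search_regions([], coor, est_forward=True))                 # [(1, 10000)]
```

The sketch leaves out the relaxed exonerate parameters, which apply only in the est_forward case.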

Thanks,
Carson


From: Mikael Brandström Durling <mikael.durling at slu.se>
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt <carsonhh at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt <carsonhh at gmail.com> wrote:

Yes.  That should work as well, as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:

Can this use of maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt <carsonhh at gmail.com> wrote:

There is a way.  It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.  The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.  If you add tags to your fasta headers, those can be used to guide the mapping and naming.  For example, gene_id=<some_gene> will ensure that different isoforms sharing a common gene_id get clustered into the same gene.  Adding maker_coor=chr1:1-10000 to the fasta header will force a particular sequence to be mapped only against chr1 within the range 1-10000 bp, and just using maker_coor=chr1 will force it to be mapped only against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.

--Carson
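
As a rough illustration of the header tags described above, here is a minimal Python sketch that appends gene_id= and maker_coor= to a FASTA header line. The helper name `tag_header`, the sequence and gene names, and the choice to separate tags with spaces after the sequence ID are all assumptions for the example; the thread does not specify the exact placement MAKER expects.

```python
def tag_header(header, gene_id=None, contig=None, start=None, end=None):
    """Append gene_id= and maker_coor= tags to a FASTA header line.

    Tags are added as space-separated fields after the existing header
    text (an assumed layout, not confirmed by the thread).
    """
    tags = []
    if gene_id is not None:
        tags.append(f"gene_id={gene_id}")
    if contig is not None:
        if start is not None and end is not None:
            # Restrict mapping to a range on the contig.
            tags.append(f"maker_coor={contig}:{start}-{end}")
        else:
            # Restrict mapping to the whole contig.
            tags.append(f"maker_coor={contig}")
    return " ".join([header.rstrip()] + tags)


# Hypothetical transcript and gene names, for illustration only.
print(tag_header(">transcript_1", gene_id="geneA",
                 contig="chr1", start=1, end=10000))
# >transcript_1 gene_id=geneA maker_coor=chr1:1-10000
print(tag_header(">transcript_2", contig="chr1"))
# >transcript_2 maker_coor=chr1
```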


From: Shaun Jackman <sjackman at gmail.com>
Reply-To: Shaun Jackman <sjackman at gmail.com>
Date: Tuesday, February 25, 2014 at 5:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl


est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun


From sjackman at gmail.com  Thu Feb 27 17:27:30 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 16:27:30 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To: <etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
Message-ID: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>

Sorry, ignore my previous question. est_forward also carries forward the
names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
rrn4.5, rrn5 and tRNA genes didn't make it into the all.gff file. They
are in the blastn output and in evidence_0.gff. rrn5 has perfect
identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value
(2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun


On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:

> Is there a corresponding protein_forward=1 option to map forward protein
> names from protein2genome?
>
> Cheers,
> Shaun
>
> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
> wrote:
>
> Sorry I meant to say prefilter on the score in the mRNA column before
> passing the gff3 to model_gff.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>  What you can do is run it once with just est_forward=1 and
> est2genome/protein2genome set to 1.  Then take those results, pass them in
> as model_gff and use the map_forward option to then filter the results
> based on mRNA score and that would copy names onto new gene under the
> standard MAKER pipeline.  Eventually it's really supposed to go into a
> separate tool that will map genes onto new assemblies (but under the hood
> the tool will just be calling MAKER with certain parameters restricted).  I
> do this because if people commonly use it mixed with things like SNAP I can
> start to get some very weird behaviors.
>
> Thanks,
> Carson
>
>  From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 3:04 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
>
>  It seems that this could be a very useful option in those cases where
> you have firm a priori knowledge of the placement of ESTs. However, while
> trying it I note that est_forward implies that the est2genome predictor is
> turned on, implicitly. Is this necessary for this to work? I'm after the
> behavior you describe below where exonerate is made to try really hard
> within a limited region to align an EST, but I would not like MAKER to
> produce est2genome predictions.
>
> In general, I think this maker_coor and est_forward combination is a
> feature set worth promoting to a documented feature.
>
> Thanks,
> Mikael
>
>  26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>
>  It will still work without est_forward.  It just works a little
> differently.  Keep in mind this was a hidden feature I used to find
> stubborn or hard to find missing genes after reassembly of a genome.
>
> If est_forward is provided, MAKER will parse the database to look for the
> maker_coor tags early in the pipeline.  Then it will create a list of
> locations to search, and it will search them even if there are no BLAST
> results to seed the search (normally MAKER gets a BLAST result first and
> then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
> look for a match using all of chr1 as the input to exonerate even when
> BLAST finds nothing (this is a very very slow search, but can help pick up
> one or two stubborn genes that don't remap well).  To allow this, MAKER
> gives exonerate looser matching parameters (i.e. allows for single base
> pair introns perhaps caused by assembly errors).  The logic here is that
> given the fact that I already told MAKER that with some degree of
> confidence I expect sequence A to map to location X, it will try its
> hardest to make it match.
>
> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
> line 1563, but only after a BLAST alignment has already seeded it to the
> region (that BLAST result has the information in its description
> parameter).  MAKER will then ignore seeds completely outside of maker_coor.
> In addition any BLAST seeds that overlap maker_coor will get the search
> space for alignment polishing adjusted to match maker_coor exactly.  Also
> match parameters for exonerate will not be relaxed as they were with
> est_forward.
>
> As you can see, the behavior is slightly different (because it's an
> accidental feature).
>
> Thanks,
> Carson
>
>
>
>  From: Mikael Brandström Durling <mikael.durling at slu.se>
> Date: Wednesday, February 26, 2014 at 6:37 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
>
>  That might be a useful and time-saving accidental feature. But, reading
> the code, it seems that I need to supply maker_coor but not gene_id, as
> well as the configuration option est_forward for this to work. Any
> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
> right?
>
> Mikael
>
>  26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>
>  Yes.  That should work as well as an accidental feature.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
> mikael.durling at slu.se> wrote:
>
> Can this use of maker_coor be used only to hint about the placement of the
> ESTs, without affecting the naming of the final genes? I.e., if I have a
> database of ESTs where I have a priori knowledge of their rough placement,
> can this placement be given to MAKER without providing est_forward=1?
>
> Thanks,
> Mikael
>
>  26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>
>  There is a way.  It's not a standard option and it's undocumented, but
> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
> that.  The option won't already be there so you'll have to type it in.
>
> There is also a feature designed to work with this option.  If you add
> tags to your fasta headers, those can be used to guide the mapping and
> naming.  For example, gene_id=<some_gene>  will ensure different isoforms
> that share a common gene_id get clustered into the same gene,
> and maker_coor=chr1:1-10000 in the fasta header will force a particular
> sequence to only be mapped against chr1 within the range of 1-10000 bp  and
> just using maker_coor=chr1 will force it to only be mapped against chr1.
>
> This is an undocumented way to remap genes onto new assemblies using blast
> alignments of earlier transcript or protein annotations as a guide.
>
> --Carson
>
>
>
>
>  From: Shaun Jackman <sjackman at gmail.com>
> Reply-To: Shaun Jackman <sjackman at gmail.com>
> Date: Tuesday, February 25, 2014 at 5:06 PM
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Mapping gene names
>
>  Hi,
>
> I'm annotating a genome using a closely related genome from GenBank, using
> the .frn (RNA) and .faa (protein) files from GenBank as evidence to
> annotate my genome. I've run MAKER, and the annotation seems to have worked
> well. Is it possible to map the names of the genes from the related species
> to my annotation? I see the *map_forward* option, which applies to the
> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>
> *maker_opts.ctl*
>
> est=NC_123456.frn
> protein=NC_123456.faa
> est2genome=1
> protein2genome=1
>
> Thanks,
> Shaun

From carsonhh at gmail.com  Thu Feb 27 18:13:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Feb 2014 18:13:06 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
References: <CADX6M3qqxo33y08QpQ_Ed3tj6W7nRpE67pZO=xktR+LeyE0tNA@mail.gmail.com>
	<CF32868D.A42A%carsonhh@gmail.com>
	<BE25178B-0B16-42A6-928D-EDE27EDDA5B2@slu.se>
	<7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
	<104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
	<CF335A95.A4DE%carsonhh@gmail.com>
	<ADBDAEEB-BF49-48D7-ABDC-3732065B03EB@slu.se>
	<CF33B334.A551%carsonhh@gmail.com>
	<B1DE7396-14FC-400B-97A7-013EDACEA48C@gmail.com>
	<etPan.530fc791.3bda9527.3ca@pshen01-imac.phage.bcgsc.ca>
	<CADX6M3qnuc0SRfCd9aNfXwXVTRw-w5NRbN+jZzAdbxPWGZsofw@mail.gmail.com>
Message-ID: <CFF1954A-C7DE-4038-BC71-8F5CB5000737@gmail.com>

Set single_exon=1, and lower the minimum size for single-exon ESTs.  I think it's set to 250 right now.  Also, est2genome looks for an ORF, so sequences without one (as with tRNAs) probably won't get picked up.
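To see how these settings interact with the BLAST thresholds from the question, here is a rough sketch of the checks as described in this thread. It does not reproduce MAKER's internal logic: the function is hypothetical, the parameter names mirror the control-file options (single_length is, to my knowledge, the name of the minimum-size option, but treat that as an assumption), and the hit length in the example is made up.

```python
def failed_filters(hit, bit_blastn=40.0, eval_blastn=1e-10,
                   single_exon=0, single_length=250):
    """Return the names of the thresholds that a hit fails to meet."""
    failed = []
    if hit["bits"] < bit_blastn:
        failed.append("bit_blastn")
    if hit["evalue"] > eval_blastn:
        failed.append("eval_blastn")
    if hit["exons"] == 1:
        # Single-exon evidence is only considered when single_exon=1,
        # and then only if it reaches the minimum length.
        if not single_exon:
            failed.append("single_exon")
        elif hit["length"] < single_length:
            failed.append("single_length")
    return failed


if __name__ == "__main__":
    # bits and E-value are those reported for rrn5 in the question;
    # the length of 120 is a hypothetical value for illustration only.
    rrn5 = {"bits": 242, "evalue": 2e-66, "exons": 1, "length": 120}
    print(failed_filters(rrn5))
    print(failed_filters(rrn5, single_exon=1))
    print(failed_filters(rrn5, single_exon=1, single_length=100))
```

The point of the sketch is that a hit can pass both BLAST cutoffs and still be dropped by the single-exon settings, which is the situation described for the short rRNA genes. The missing-ORF case for tRNAs is a separate issue that this check does not model.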

--Carson 

Sent from my iPhone


From mikael.durling at slu.se  Fri Feb 28 03:40:30 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Fri, 28 Feb 2014 10:40:30 +0000
Subject: [maker-devel] maker_coor behaviour
Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se>

Hi,

In a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out, without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good use of this a priori knowledge.

The main problem I am trying to solve is that some ESTs, for which I know where they should align, are not recruited to that position by MAKER's blastn->exonerate method (I find them on other scaffolds). So I thought maker_coor with the est_forward behavior (as described) would be a good option to force my evidence onto the correct position, instead of ending up supporting or breaking other models.

However, as soon as I run with maker_coor-tagged EST sequences, no est2genome evidence appears in the final gff3 file. The blastn evidence is there when est_forward is disabled but, as expected, there is no blastn evidence when est_forward is turned on. It seems, though, that the evidence is used, as the QI lines indicate EST support for both splice sites and exon alignments, but I have no way to visualize and/or evaluate the congruence of evidence and models. Would it be possible to tweak MAKER into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out where in the code this is handled.

I could of course do my own exonerate alignments of these ESTs and feed them into MAKER as est_gff, but if MAKER already has the machinery to do this, I thought it would be a good idea to use it.

Thanks,
Mikael


From carsonhh at gmail.com  Fri Feb 28 07:09:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 07:09:09 -0700
Subject: [maker-devel] maker_coor behaviour
Message-ID: <CF35E345.A60A%carsonhh@gmail.com>

I wouldn't use those options for standard de novo annotation.  There are
other, more appropriate options that should be used instead.  Both
maker_coor and est_forward are destined to be part of a separate tool that
will secretly just be calling MAKER, but will allow me to control what
other parameters MAKER sees to avoid certain logic incompatibilities that
make sense when mapping entire genes onto a new assembly, but not really
for de novo annotation using ESTs.

You should instead try modifying these options in the maker_bopts.ctl file
-->

pcov_blastn= #Blastn Percent Coverage Threshold EST-Genome Alignments
pid_blastn= #Blastn Percent Identity Threshold EST-Genome Alignments
eval_blastn= #Blastn eval cutoff
bit_blastn= #Blastn bit cutoff
depth_blastn= #Blastn depth cutoff (0 to disable cutoff). For trimming high evidence overlap regions

en_score_limit= #Exonerate nucleotide percent of maximal score threshold


If either blastn or est2genome results disappear, it is because they don't
meet one of these thresholds (blastn results that don't meet the
thresholds but are borderline are kept if exonerate does meet the
thresholds, but if exonerate misses a threshold they will be thrown out).
That is why the EST in question gets thrown out and it's why the blastn
result disappears when you try to anchor it with maker_coor.
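Before changing any of these, it can help to print what a run is currently using. The following is a small sketch that reads key=value lines with trailing # comments, which is the layout shown in the listing above; the function names are hypothetical, and the two filled-in values are the ones quoted earlier in the thread (bit_blastn=40, eval_blastn=1e-10).

```python
EST_KEYS = ["pcov_blastn", "pid_blastn", "eval_blastn",
            "bit_blastn", "depth_blastn", "en_score_limit"]


def read_ctl(lines):
    """Parse control-file lines of the form key=value #comment into a dict."""
    opts = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def est_thresholds(opts):
    """Pick out the EST-related thresholds; unset ones come back empty."""
    return {key: opts.get(key, "") for key in EST_KEYS}


if __name__ == "__main__":
    sample = [
        "pcov_blastn= #Blastn Percent Coverage Threshold\n",
        "eval_blastn=1e-10 #Blastn eval cutoff\n",
        "bit_blastn=40 #Blastn bit cutoff\n",
    ]
    for key, value in est_thresholds(read_ctl(sample)).items():
        print("%-15s %s" % (key, value or "(not set)"))
```

To use it on a real run, pass the open maker_bopts.ctl file handle to read_ctl instead of the sample list.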

You can visualize everything with a browser when you're done.  I still
recommend the old version of Apollo for this (it's just easier).  You can
try to install it using the "./Build apollo" option from the
.../maker/src/ directory, and it will be installed in
.../maker/exe/apollo.  It requires that you have Apache Ant installed to
do this.  Otherwise just download it from the GMOD SourceForge page and
install it manually.

Thanks,
Carson




From rbharris at uw.edu  Fri Feb 28 13:14:55 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Fri, 28 Feb 2014 12:14:55 -0800
Subject: [maker-devel] error in snap training
In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
Message-ID: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>

Hi -

I tried this and ran cegma --genome on my original fasta file. I then tried
to use cegma2zff for the conversion, followed by fathom and forge. However,
when I try to generate new parameters with forge, I get the same error that
I got when trying to train SNAP without CEGMA: "ZOE ERROR (from forge):
impossible error5 KOG1342.20". Any suggestions would be great,
thanks!
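For reference, the sequence of steps described above can be written down as follows. This is a sketch under assumptions: the command forms follow the SNAP README and the MAKER tutorial as I understand them, the file names (output.cegma.gff, genome.fa, my_genome) are placeholders, and the function only builds the command lines without running anything. The -validate step is not mentioned in the message; it is included because it reports problematic gene models, which may help locate the one named in the error.

```python
def snap_training_commands(cegma_gff, genome_fasta, name):
    """Return the commands for one round of SNAP training, in order.

    Each command is a list of arguments. The output of the last command
    is expected to be redirected to <name>.hmm by the caller.
    """
    return [
        # Convert the CEGMA GFF into genome.ann / genome.dna (ZFF format).
        ["cegma2zff", cegma_gff, genome_fasta],
        # Report gene models that SNAP considers problematic.
        ["fathom", "genome.ann", "genome.dna", "-validate"],
        # Split models into categories, keeping 1000 bp of flanking sequence.
        ["fathom", "genome.ann", "genome.dna", "-categorize", "1000"],
        # Export the unique models on the plus strand.
        ["fathom", "uni.ann", "uni.dna", "-export", "1000", "-plus"],
        # Estimate parameters from the exported models.
        ["forge", "export.ann", "export.dna"],
        # Assemble the HMM from the parameter files in the current directory.
        ["hmm-assembler.pl", name, "."],
    ]


if __name__ == "__main__":
    for cmd in snap_training_commands("output.cegma.gff", "genome.fa", "my_genome"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to run them one at a time and see which step produces the forge error.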

Cheers,
Rebecca


On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Make sure you are using 2.31,  and then try the maker2zff filters
> individually.  If the protein models are not working well, use CEGMA to
> generate models. It's from the same group as SNAP.  Use cegma2zff for the
> conversion.
>
> --Carson
>
> Sent from my iPhone
>
> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris <rbharris at uw.edu> wrote:
> >
> > Hey -
> >
> > I'm trying to train SNAP and am running into errors. I don't have any
> > EST evidence, just protein. My .gff file reports 10865 genes but when I run
> > maker2zff  -c0 -e0 I get back empty genome files. When I run maker2zff -n,
> > a ton of overlap_prev_exon errors get written to the screen and then when I
> > get to the forge step I get an "impossible error5". Any help would be
> > greatly appreciated.
> >
> > Thanks!
> > Rebecca

From carsonhh at gmail.com  Fri Feb 28 13:22:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 13:22:12 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: <CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
References: <CAESS276MjRUmto+9fkr68jRXBE9or4geWB-q4Oc5_qKsQOdnpA@mail.gmail.com>
	<16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
	<CAESS277JnyDD48DQvpKtw_kDw1xqOnGR-Fiqu-PoOPaesO3Oug@mail.gmail.com>
Message-ID: <CF363CE6.A6B6%carsonhh@gmail.com>

If it's failing both ways, I'm thinking this may be SNAP itself. Try these
two different versions of SNAP.

--> http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz
and
--> http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz

If they both fail, then contact the SNAP development group --> korflab AT
ucdavis DOT edu

Thanks,
Carson

