From morgan_starr_s at live.com  Sun Feb  3 13:13:47 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Sun, 3 Feb 2019 19:13:47 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
Message-ID: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>

Hello,

I previously used MAKER to annotate two different fungal genomes that were assembled from Illumina reads only. For these genomes, I had over 11,000 genes predicted.
I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar to the Illumina-only assemblies in length and in number of complete orthologs, but had far fewer, longer contigs.

I re-ran MAKER using the settings below. For one of my genomes, I again got around 11,000 predicted genes, as expected. However, for the other genome, I consistently get ~4,400 predicted genes.

How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran both the same way?

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files


From xvazquezc at gmail.com  Sun Feb  3 16:43:42 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Mon, 4 Feb 2019 09:43:42 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a
BUSCO-derived gene model:
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins predicted by each individual predictor.
If the numbers from one of them are off, you may have found a possible
source of the problem. We didn't have a very good experience with GeneMark
(GM), as it used to overpredict an absurd number of proteins.
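
One way to get the per-predictor numbers is to tally features by source in
the MAKER GFF3 output. The following is only a minimal sketch in Python. It
assumes that the ab initio predictions appear as "match" features whose
source column names the predictor (e.g. augustus_masked, genemark) and that
final models appear as "gene" features with source "maker"; check your own
file, since the source labels depend on the run. The function name
count_by_source is hypothetical.

```python
from collections import Counter


def count_by_source(gff_lines, feature_types=("gene", "match")):
    """Count GFF3 features of the given types, keyed by (source, type).

    Assumption: top-level predictions are 'match' rows and final
    annotations are 'gene' rows; adjust feature_types if your file differs.
    """
    counts = Counter()
    for line in gff_lines:
        # An embedded genome sequence, if present, follows ##FASTA.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column GFF3 row
        source, ftype = cols[1], cols[2]
        if ftype in feature_types:
            counts[(source, ftype)] += 1
    return counts
```

Opening the .all.gff file from each run and passing the file handle to
count_by_source gives two tallies that can be compared side by side; a
source whose count collapses between runs is the first place to look.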

Xabi



-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From keith.decker at bayer.com  Mon Feb  4 12:09:35 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:09:35 +0000
Subject: [maker-devel] MAKER on AWS
Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use.

So my questions are

1.  Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
2.  Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end?  If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?

Thanks and sorry for the long question
Keith
This system contains confidential and copyrighted information.  Access to the system is limited to users only and only for approved business purposes.
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.
Anyone using or viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company's sole discretion there is a business reason to do so.
If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.

From carsonhh at gmail.com  Mon Feb  4 12:31:29 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 11:31:29 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>

You can try to stand up a cluster inside AWS or, as you said, just start independent instances, each with its own piece of the total dataset. MAKER includes a tool called fasta_tool that makes it easy to split the dataset into equal-sized chunks.
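
The idea of splitting a genome into roughly equal-sized pieces, one per instance, can be illustrated with a short Python sketch. This is not how fasta_tool works internally and it does not use fasta_tool's options; it is a stand-alone illustration with hypothetical function names (read_fasta, split_balanced) that assigns the longest sequences first to whichever chunk currently holds the fewest bases.

```python
import heapq


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_balanced(records, n_chunks):
    """Greedy split: longest sequence first, into the smallest chunk so far.

    Returns a list of n_chunks lists of (header, sequence) pairs.
    """
    # Heap entries are (total bases assigned so far, chunk index).
    heap = [(0, i) for i in range(n_chunks)]
    heapq.heapify(heap)
    chunks = [[] for _ in range(n_chunks)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)
        chunks[i].append((header, seq))
        heapq.heappush(heap, (total + len(seq), i))
    return chunks
```

Each chunk can then be written to its own FASTA file and annotated on a separate instance. Balancing by total bases rather than by sequence count matters when a few chromosomes or long contigs dominate the assembly.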

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF) ->
http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup -> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

--Carson



From liorglck at gmail.com  Mon Feb  4 03:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software is initiated. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as value for the maker_exe.ctl parameter
'RepeatMasker'.
Trying to generalize my working environment, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> which is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it points to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions on how to
overcome this issue?
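
A quick way to confirm whether the two environments really disagree is to
compare the RepeatMasker entry in maker_exe.ctl with what the active shell
resolves on $PATH. The following Python sketch does only that; it does not
show which of the two MAKER itself ends up calling. It assumes the control
file uses the usual "key=value #comment" layout, and the function names
(parse_ctl, compare_with_path) are hypothetical.

```python
import os
import shutil


def parse_ctl(lines):
    """Parse 'key=value #comment' lines from a MAKER control file."""
    settings = {}
    for line in lines:
        # Drop the trailing comment, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def compare_with_path(settings, tool="RepeatMasker"):
    """Return (configured path, path found on $PATH, whether they match)."""
    configured = settings.get(tool, "")
    on_path = shutil.which(tool)  # None if the tool is not on $PATH
    same = (
        bool(configured)
        and on_path is not None
        and os.path.realpath(configured) == os.path.realpath(on_path)
    )
    return configured, on_path, same
```

Running this once outside and once inside the conda env, on the same
maker_exe.ctl, shows whether the configured path and the $PATH lookup
diverge after activation. Symlinks are resolved before comparing, so two
different-looking paths to the same install are reported as a match.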

Thanks and best regards,
Lior

From keith.decker at bayer.com  Mon Feb  4 12:39:48 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:39:48 +0000
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>

Thanks,
Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine?  For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores?

-Keith


From carsonhh at gmail.com  Mon Feb  4 13:00:00 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 12:00:00 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
	<1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
Message-ID: <EF78A658-7C9E-4F10-AA30-73E97DB30297@gmail.com>

I don't have cloud performance stats, but I do have cluster performance stats that you may be able to extrapolate from (attached). On a cluster we see nearly linear performance gains up to ~100 CPU cores, and the curve doesn't fully level out until well after 600 cores (at that point we are hitting I/O and networking limits for inter-node communication). A single instance is essentially equivalent to a single physical machine, which falls well under 100 CPU cores, so performance growth on that instance would be expected to be linear.

--Carson


-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 41424 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20190204/43c5cc9f/attachment.pdf>

From xvazquezc at gmail.com  Tue Feb  5 16:42:40 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Wed, 6 Feb 2019 09:42:40 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>

Aren't you using SNAP? It usually produces quite decent results, and it is
easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
GeneMark (GM) doesn't look bad in the first case, if your fungus has a
fairly typical genome. For the second, it looks bad.
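
For anyone wanting to reproduce this kind of per-predictor comparison, here is a minimal sketch that tallies features by source (column 2) and type (column 3) in a GFF3 file. It assumes, without having checked your files, that the MAKER output labels ab initio predictions with source names such as augustus_masked or genemark; the example lines are made up for illustration, and count_features is just a hypothetical helper name.

```python
from collections import Counter


def count_features(gff_lines):
    """Tally (source, type) pairs from an iterable of GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GFF3 feature line
        counts[(cols[1], cols[2])] += 1
    return counts


# Made-up example lines; the source names are assumptions, not real output.
example = [
    "##gff-version 3",
    "contig1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1",
    "contig1\taugustus_masked\tmatch\t100\t900\t.\t+\t.\tID=a1",
    "contig1\tgenemark\tmatch\t120\t880\t.\t+\t.\tID=gm1",
    "contig1\tgenemark\tmatch\t2000\t2600\t.\t-\t.\tID=gm2",
    "##FASTA",
    ">contig1",
    "ACGT",
]

counts = count_features(example)
for (source, ftype), n in sorted(counts.items()):
    print(source, ftype, n)
```

On a real run you would pass an open file handle instead of the example list, e.g. count_features(open("my_genome.all.gff")), where the file name is a placeholder.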

I'm not too familiar with re-annotation, but I'd create the gene models
from scratch rather than reuse the ones from the Illumina-only genomes.
Note that long-read assemblies contain a higher proportion of repetitive
elements that need masking, and RepeatMasker alone may not be enough. In
theory, this shouldn't affect the Augustus model if it was trained through
BUSCO, as BUSCO uses defined conserved markers to create the gene model, but
I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the
gene model and train it again the "traditional way", i.e. as it used to be
done when we only had CEGMA. I had good results just by changing the
training method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:

> Thank you, Xabi, for the response.
> The number of proteins from each source is much lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker, respectively.
> The more recent numbers are 25, 857, and 4,418, respectively.
>
> So do you think this hints that something is wrong with genemark?
>
> Morgan
>
>
> ------------------------------
> *From:* Xabier Vázquez-Campos <xvazquezc at gmail.com>
> *Sent:* Sunday, February 3, 2019 4:43 PM
> *To:* morgan sobol
> *Cc:* maker-devel at yandell-lab.org
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Hi Morgan,
>
> We had a similar issue with AUGUSTUS underpredicting when using a
> BUSCO-derived gene model
> https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ
>
> Also, check the number of proteins by each individual predictor. If the
> numbers from one of them are off, you may find a possible source of issues.
> We didn't have a very good experience with GM, as it used to overpredict
> an absurd number of proteins.
>
> Xabi
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com  Wed Feb  6 16:33:47 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Thu, 7 Feb 2019 09:33:47 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>

SNAP is easy to train, works well on fungal genomes, and its training is
explained in Maker's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that
before BUSCO, when we only had CEGMA, we trained Augustus in a different
way, as CEGMA wouldn't produce Augustus gene models automatically. I didn't
mean that you should use CEGMA.

This is what I have in my own documentation about how to train Augustus
"the old way":

> AUGUSTUS: the old way
>
> Alternatively, you can train AUGUSTUS in a more "manual" way, like when we
> were using CEGMA. The training starts with the output from the second
> instance of fathom in the SNAP training section.
>
> cd ${MYGENOME_DIR}/maker/snap1
> perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb
>
> zff2augustus_gbk.pl generates a GenBank file from export.dna.
>
> The actual training of AUGUSTUS will be through the *webAUGUSTUS server*.
>
> Before proceeding, it is recommended to rename the FASTA headers,
> especially if they contain special characters and/or are very long. This
> is the main cause of failure for jobs submitted to webAUGUSTUS. You can use
> the simplifyFastaHeaders.pl
> <http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl>
> script for that:
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map
>
> nameStem is the base name used for each of the sequences in the
> multi-FASTA files. Use something appropriate, such as *contig* and *rna*
> for the assembly and RNA-seq files, respectively, or something based on
> that. For example, "pgcontig" and "pgrna" for contigs and RNA from
> *Puccinia graminis*.
> *DO NOT* give the same nameStem to both FASTA files, and don't use any
> special characters.
>
> We need the following files (minimum):
>
>    - ${MYGENOME}_assembly.fasta as *Genome file*
>    - ${MYGENOME}.train1.gb as *Training gene structure file*
>
> If we also have RNA-seq data:
>
>    - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file*
>
> Use ${MYGENOME}_v1 as *Species name*. We will need a different species
> name in the retraining step; otherwise, when Maker2 is rerun, it will see
> the same name and will not rerun AUGUSTUS, even though the species profile
> is different. So ${MYGENOME}_v1 does the job and tracks the version.
>
> Once the job is finished, the *Species parameter archive* (
> parameters.tar.gz) will contain a folder with the model files for your
> species. Copy it to the species folder of your AUGUSTUS installation.
>
Hope this helps

PS: Please hit "reply all" so this is logged on Maker's mailing list in case
anybody else experiences similar issues.

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:

> I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued
> in 2015.
> Do you think that will be a problem, or is it still worth using the old
> version?
>
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From liorglic at mail.tau.ac.il  Mon Feb 11 08:04:16 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:04:16 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAOzMDPxUf8a9orgsmbJ8QDdq4=OoKL_AkjVbsbPcGGm8z6ufXg@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software is initiated. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as value for the maker_exe.ctl parameter
'RepeatMasker'.
To make my working environment more general, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> which is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it points to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions of how to
overcome this issue?
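
To make the comparison concrete, here is a minimal diagnostic sketch that reads the RepeatMasker entry from maker_exe.ctl-style text and compares it with whatever the current $PATH resolves to. It only reports the two paths and says nothing about what MAKER does internally; the /opt/RepeatMasker path and the parse_ctl helper are hypothetical, and the key=value #comment layout is assumed from my own control files.

```python
import shutil


def parse_ctl(text):
    """Parse 'key=value #comment' lines into a dict (assumed ctl layout)."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # section headers and blank lines
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


# Hypothetical control-file fragment; replace with open("maker_exe.ctl").read()
ctl_text = """#-----Location of Executables Used by MAKER/EVALUATOR
RepeatMasker=/opt/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate= #location of exonerate executable
"""

opts = parse_ctl(ctl_text)
configured = opts.get("RepeatMasker", "")
on_path = shutil.which("RepeatMasker")  # None if not found on $PATH

print("maker_exe.ctl :", configured or "(empty)")
print("first on PATH :", on_path or "(not found)")
if configured and on_path and configured != on_path:
    print("The control file and $PATH point to different executables.")
```

Running this inside and outside the conda env should show whether the two settings disagree in either environment.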

Thanks and best regards,
Lior

From liorglic at mail.tau.ac.il  Mon Feb 11 08:12:25 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:12:25 +0200
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
Message-ID: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>

Dear MAKER users,

After completing a MAKER run, I looked at the protein FASTA files that
MAKER outputs and noticed that a small fraction of the sequences include X
characters, indicating unknown amino acids. I was wondering how such
sequences arise: how can a prediction contain unknown amino acids? Is this
an indication of low-quality predictions?
Is there any documentation regarding the procedure that generates the
protein sequences?
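
For reference, this is roughly how the affected sequences can be counted; a minimal sketch in plain Python, with made-up example records and hypothetical helper names (read_fasta, x_report). It only quantifies the X residues and does not explain where they come from.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # keep only the first word of the header as the record name
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def x_report(lines):
    """Map record name -> (number of X residues, sequence length)."""
    report = {}
    for name, seq in read_fasta(lines):
        n_x = seq.upper().count("X")
        if n_x:
            report[name] = (n_x, len(seq))
    return report


# Made-up records for illustration only.
example = [
    ">prot1",
    "MKTAYIAKQR",
    ">prot2 some description",
    "MSXXLLV",
    "AAXG",
]

report = x_report(example)
for name, (n_x, length) in report.items():
    print(f"{name}: {n_x} X out of {length} residues")
```

With a real file, pass an open handle to the MAKER protein FASTA instead of the example list.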

Thanks a lot,
Lior

From kapeelc at gmail.com  Thu Feb  7 13:43:47 2019
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Thu, 7 Feb 2019 14:43:47 -0500
Subject: [maker-devel] MAKER v3 Fgenesh ERROR
Message-ID: <CA+DOtefuUEc5_fFh7j2ykb4yBKmtEp1vgt0Pea-RF+7GCqr9ig@mail.gmail.com>

Hi, Carson

I have been getting this error with the fgenesh tool within MAKER. It runs OK
on most of the assembly contigs but seems to fail on one contig, or part of
a contig, with the error below:

Widget::fgenesh:
/mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap
/mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh
/sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta
-exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh
>
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh
#-------------------------------#
 ...processing 9 of 24
 ...processing 8 of 28
 ...processing 10 of 24
 ...processing 9 of 28
 ...processing 11 of 24
 ...processing 10 of 28
 ...processing 12 of 24
 ...processing 11 of 28
deleted:0 genes
ERROR: FgenesH failed
--> rank=14, hostname=bnbcompute50
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:Super-Scaffold_14.2_contig2

I updated the Perl module fgenesh.pm as suggested in previous threads.
Attached are the maker_opts.ctl and STDERR log files.

Thanks

Kapeel


-- 


*Kapeel Chougule*
*Computational Scientist Developer II*

*One Bungtown Road, Cold Spring Harbor, NY 11724*
<http://www.warelab.org/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5420 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stderr.log
Type: application/octet-stream
Size: 10012917 bytes
Desc: not available
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment-0001.obj>

From fatih.sarigoel at durham.ac.uk  Wed Feb 13 06:20:40 2019
From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH)
Date: Wed, 13 Feb 2019 12:20:40 +0000
Subject: [maker-devel] Does Conda Maker actually work?
Message-ID: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>

Greetings,
I notice that you never mention conda installation on your website, so I am curious whether the conda version is actually supposed to work, as for me it didn't.
I created a new conda environment and installed Maker (I tried this with both installation options).
When I run the example files, I get this error:

"make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2"

My conda environment is here
/fast_new/work/users/fsarigo_m/miniconda3
I don't understand why the program is trying to look here:
/home/conda
which does not exist

The run also begins with a "Possible precedence issue" warning.

Thanks for your help in advance!
Fatih

+++++

Here is the full log until the end of the contig:

(MakerX) [fsarigo_m at med0223 MAKER]$ maker
Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:contig-dpp-500-500

examining contents of the fasta file and run log



From carsonhh at gmail.com  Wed Feb 13 08:51:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 07:51:44 -0700
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>

The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list from people attempting to install MAKER via conda. I know there is at least an issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists -> https://github.com/bioperl/bioperl-live/issues/215

Or it may be that there is a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) -> https://github.com/bioperl/bioperl-live/issues/305

Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts).
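
On the question of why the build looks under /home/conda: the log shows make calling a gcc under /home/conda/feedstock_root/..., which looks like the compiler path recorded when that perl was built rather than anything on your machine. A minimal sketch of how one might confirm that, assuming `perl -V:cc` prints a line of the form cc='...'; (the helper names below are hypothetical and not part of MAKER):

```python
import re
import shutil
import subprocess


def parse_perl_config(line):
    """Extract the value from a line such as "cc='gcc';" as printed by `perl -V:cc`."""
    match = re.match(r"^\s*\w+='(.*)';\s*$", line)
    return match.group(1) if match else None


def compiler_available(cc):
    """Return True if the program named in the compiler setting can be found and executed."""
    if not cc:
        return False
    program = cc.split()[0]  # drop any flags that follow the program name
    return shutil.which(program) is not None


def check_perl_compiler(perl="perl"):
    """Ask the given perl which C compiler it was configured with, then look for it."""
    result = subprocess.run([perl, "-V:cc"], capture_output=True, text=True, check=True)
    cc = parse_perl_config(result.stdout)
    return cc, compiler_available(cc)
```

Calling check_perl_compiler() inside the activated environment returns the recorded compiler and whether it exists locally. If it reports a path that is not present, Inline::C has nothing to compile with, which would be consistent with the "No such file or directory" line in the log.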

--Carson


> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk> wrote:
> 
> Greetings,
> I notice that you never mention conda installation on your website, so I am curious if the conda version is actually supposed to be working fine or not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with both installation options)
> When I run the example files, I get this error:
> 
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
> 
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
> 
> Also begins with a "possible precedence issue"
> 
> Thanks for your help in advance!
> Fatih
> 
> +++++
> 
> Here is the full log until the end of the contig:
> 
> (MakerX) [fsarigo_m@med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
> 
> To access files for individual sequences use the datastore index:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
> 
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
> 
> 
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
> /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> 
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
> 
> The build directory was:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
> 
> To debug the problem, cd to the build directory, and inspect the output files.
> 
> Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
> --> rank=NA, hostname=med0223
> ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
> --> rank=NA, hostname=med0223
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
> Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
> Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
> Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
> FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
> Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
> Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
> Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
> Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
> /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> 
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
> 
> The build directory was:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
> 
> To debug the problem, cd to the build directory, and inspect the output files.
> 
> Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
> --> rank=NA, hostname=med0223
> ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500
> 
> examining contents of the fasta file and run log
> 
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Wed Feb 13 11:14:13 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:14:13 -0700
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
In-Reply-To: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
References: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com>

The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and is missing files inside the repeat library as a result.

--Carson


> On Feb 4, 2019, at 2:00 AM, Lior Glick <liorglck at gmail.com> wrote:
> 
> Dear MAKER users,
> 
> I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software is initiated. Typing 'which RepeatMasker' shows the correct local path.
> I also use this path as value for the maker_exe.ctl parameter 'RepeatMasker'.
> Trying to generalize my working environment, I am trying to use a conda env <https://anaconda.org/bioconda/maker> which is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
> ERROR: Could not determine if RepBase is installed
> Running the same configuration files locally (i.e. outside the conda env) results in a successful run.
> This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions of how to overcome this issue?
> 
> Thanks and best regards,
> Lior


From carsonhh at gmail.com  Wed Feb 13 11:18:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:18:44 -0700
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
In-Reply-To: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
References: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com>

If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have 'N' characters from the assembly as part of your CDS ('N' is the ambiguity code for DNA, and it results in an 'X', the ambiguity code for amino acids, when translated). Augustus will do internal gymnastics and completely splice out exons containing N's to try to never have this issue, but may not always be able to. It's an indication of genome assembly issues.
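
To illustrate the N -> X effect, here is a simplified sketch of codon translation using the standard genetic code. It is not MAKER's own translation routine; in this sketch any codon containing a base other than A/C/G/T becomes 'X', whereas real translators may still resolve some ambiguous codons (for example when every possible base gives the same amino acid).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS; codons that are not plain A/C/G/T triplets become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        protein.append(CODON_TABLE.get(cds[i:i + 3], "X"))
    return "".join(protein)
```

With this sketch, translate("ATGNNNTAA") gives "MX*": the start codon and stop codon translate normally, and the run of N's from the assembly shows up as a single X in the protein.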

--Carson


> On Feb 11, 2019, at 7:12 AM, Lior Glick <liorglic at mail.tau.ac.il> wrote:
> 
> Dear MAKER users,
> 
> After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences are obtained, I mean how come there are unknown amino acids in the prediction? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
> 
> Thanks a lot,
> Lior


From carsonhh at gmail.com  Wed Feb 13 11:24:01 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:24:01 -0700
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <D33A2A92-BFCA-4493-A66E-99C567954AD2@gmail.com>

One thing you can also do is use the old models as protein= input and run the protein2genome option just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and split contigs were joined, resulting in multiple gene fragments becoming a single unified model. Make sure to always review contigs in a browser to see how models and evidence correlate.
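
As a rough first look before opening a browser, one could tally gene features per contig in each run's GFF3 and see where the counts diverge. The sketch below assumes a tab-separated GFF3 in which gene models have type "gene" in column 3 and source "maker" in column 2, and that any embedded sequence follows a ##FASTA line; the function name is hypothetical.

```python
from collections import Counter


def count_genes_per_contig(gff_lines, source="maker"):
    """Count 'gene' features per sequence ID (column 1) in GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3 or cols[2] != "gene":
            continue
        if source is None or cols[1] == source:
            counts[cols[0]] += 1
    return counts
```

Passing an open file handle for each run's merged GFF3 returns a Counter keyed by contig name. Contigs with long stretches and few or no genes are the ones worth reviewing first against the aligned evidence.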

--Carson


> On Feb 3, 2019, at 12:13 PM, morgan sobol <morgan_starr_s at live.com> wrote:
> 
> Hello, 
> 
> I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. 
> I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and number of complete orthologs to the Illumina-only assembly, but had far fewer, longer contigs.
> 
> I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes. 
> 
> How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran both the same way?
> 
> Thanks,
> Morgan S. 
> 
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
> 
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
> 
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closely related species in GFF3 format
> 
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from multiple organisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
> 
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
> 
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> 
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
> 
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
> 
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
> 
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
> 
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
> 
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
> 


From liorglck at gmail.com  Sun Feb 17 12:50:10 2019
From: liorglck at gmail.com (Lior Glick)
Date: Sun, 17 Feb 2019 20:50:10 +0200
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
Message-ID: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>

That's good to know. Any plans on creating a stable conda package in the
future? It'd be a very nice feature, especially since MAKER is not always
straightforward to install.

On Wed, Feb 13, 2019 at 5:22 PM Carson Holt <carsonhh at gmail.com> wrote:

> The conda recipe was produced by another group. I do not currently
> recommend using it because I have seen a number of issues pop up on the
> list based on people attempting to install MAKER via conda.  I know there
> is at least an issue with the conda RepeatMasker install, and there may be
> others. The specific failure you show is from Bio::DB::IndexedBase trying
> to compile an Inline::C function. It may be that conda is installing an
> older BioPerl where this issue still exists ->
> https://github.com/bioperl/bioperl-live/issues/215
>
> Or it may be that there is a new related issue (I've seen a handful of
> other examples that seem to relate back to Bio::DB::IndexedBase) ->
> https://github.com/bioperl/bioperl-live/issues/305
>
> Try installing MAKER without conda (make sure to remove any components
> that are in conda first to avoid conflicts).
>
> --Carson
>
>
> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk>
> wrote:
>
> Greetings,
> I notice that you never mention conda installation on your website, so I
> am curious if the conda version is actually supposed to be working fine or
> not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with
> both installation options)
> When I run the example files, I get this error:
>
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
>
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
>
> Also begins with a "possible precedence issue"
>
> Thanks for your help in advance!
> Fatih
>
> +++++
>
> Here is the full log until the end of the contig:
>
> (MakerX) [fsarigo_m@med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
>
> To access files for individual sequences use the datastore index:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
>
>
>
> --Next Contig--
>
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
>
>
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs
> blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"
> -typemap
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"
>  IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc
> -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin"
> -D_REENTRANT -D_GNU_SOURCE
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2
>  -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"
>  IndexedBase_14e0.c
> /bin/sh:
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc:
> No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
>
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
>
> The build directory was:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
>
> To debug the problem, cd to the build directory, and inspect the output
> files.
>
> Environment PATH =
> '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 275.
> --> rank=NA, hostname=med0223
> ...propagated at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm
> line 869.
> --> rank=NA, hostname=med0223
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 38.
> Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 306
> Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for
> IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef,
> ARRAY(0x564b40673ad0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 426
> Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm
> line 95
> FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 478
> Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run",
> HASH(0x564b40673c80), 0, 0) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 341
> Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called
> at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 357
> Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0)
> called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm
> line 287
> Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0)
> called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs
> blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"
> -typemap
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"
>  IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc
> -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin"
> -D_REENTRANT -D_GNU_SOURCE
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2
>  -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"
>  IndexedBase_14e0.c
> /bin/sh:
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc:
> No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
>
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
>
> The build directory was:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
>
> To debug the problem, cd to the build directory, and inspect the output
> files.
>
> Environment PATH =
> '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 275.
> --> rank=NA, hostname=med0223
> ...propagated at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm
> line 869.
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500
>
> examining contents of the fasta file and run log
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
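
The log above stops where Inline::C invokes the compiler that this perl was built with: the shell reports "No such file or directory" for the conda build-host gcc path, and make exits with Error 127 (command not found). A minimal Python sketch for checking whether a given compiler path or name is usable on the current machine follows. The helper name compiler_available is hypothetical; the string to test would be whatever `perl -V:cc` reports inside the MakerX environment.

```python
import os
import shutil


def compiler_available(cc):
    """Return True if `cc` (a path or a bare command name) can be executed."""
    if os.sep in cc:
        # An explicit path must exist and be executable.
        return os.path.isfile(cc) and os.access(cc, os.X_OK)
    # A bare name is looked up on PATH, as the shell would do.
    return shutil.which(cc) is not None


# Path copied from the failing log above; it refers to the conda build host.
build_cc = ("/home/conda/feedstock_root/build_artifacts/perl_1548813468557/"
            "_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc")
print(compiler_available(build_cc))
```

If this prints False for the compiler perl expects, Inline::C cannot build Bio::DB::IndexedBase on that node until a working compiler is reachable under that name.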

From morgan_starr_s at live.com  Mon Feb 18 03:08:56 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Mon, 18 Feb 2019 09:08:56 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>,
	<CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
Message-ID: <DM5PR14MB1292E82A4864CCC40B80122EAE630@DM5PR14MB1292.namprd14.prod.outlook.com>

Thank you, Xabi and Carson.
With your help, I was able to improve the annotation with a more appropriate number of predictions.

Best,
Morgan

________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Wednesday, February 6, 2019 11:33 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

SNAP is easy to train, works well in fungal genomes and it's explained in Maker's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we had to train Augustus differently, because CEGMA did not produce Augustus gene models automatically. I don't mean that you should use CEGMA.

This is what I have in my own documentation about how to train Augustus "the old way":
AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more "manual" way, as we did when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.

cd ${MYGENOME_DIR}/maker/snap1
perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna.

The actual training of AUGUSTUS is done through the webAUGUSTUS server.

Before proceeding, it is recommended to rename the FASTA headers, especially if they are very long or contain special characters. This is the main cause of failure for jobs submitted to webAUGUSTUS. You can use the simplifyFastaHeaders.pl script (http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl) for that:

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used for each of the sequences in the multi-FASTA files. Choose something appropriate: use contig and rna for the assembly and RNA-seq files, respectively, or something based on them. For example, "pgcontig" and "pgrna" for contigs and RNA from Puccinia graminis.
DO NOT give the same nameStem to both FASTA files, and don't use any special characters.
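
As a rough illustration of what the renaming step does, here is a minimal Python sketch that rewrites each FASTA header as nameStem plus a running number and records the new and old names in a map file. It is not the simplifyFastaHeaders.pl script itself; the function name simplify_fasta_headers and the tab-separated map layout are assumptions made for this example.

```python
def simplify_fasta_headers(in_path, name_stem, out_path, map_path):
    """Rename FASTA records to <name_stem><n> and write a new -> old header map."""
    count = 0
    with open(in_path) as src, \
            open(out_path, "w") as out, \
            open(map_path, "w") as mapping:
        for line in src:
            if line.startswith(">"):
                count += 1
                new_id = f"{name_stem}{count}"
                old_header = line[1:].rstrip("\n")
                out.write(f">{new_id}\n")
                mapping.write(f"{new_id}\t{old_header}\n")
            else:
                out.write(line)  # sequence lines are copied unchanged
    return count  # number of records renamed
```

For example, simplify_fasta_headers("assembly.fasta", "pgcontig", "contigs_rename.fasta", "contigs.map") would name the records pgcontig1, pgcontig2, and so on (the file names here are placeholders).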

We need the following files (minimum):

  *   ${MYGENOME}_assembly.fasta as Genome file
  *   ${MYGENOME}.train1.gb as Training gene structure file

If we also have RNA-seq data:

  *   ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step; otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and tracks the version.

Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.
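
The copy step can be sketched in Python as below. This is only an illustration under assumptions: that the archive contains a folder named exactly like the species, and that the AUGUSTUS installation keeps its models under <AUGUSTUS_CONFIG_PATH>/species/. The function name install_species is hypothetical. It refuses to overwrite an existing species folder, which fits the advice to use a new name (_v1, _v2, ...) for each training round.

```python
import os
import shutil
import tarfile
import tempfile


def install_species(archive, species, config_path):
    """Copy the `species` folder found inside `archive` into <config_path>/species/."""
    dest = os.path.join(config_path, "species", species)
    if os.path.exists(dest):
        raise FileExistsError(f"{dest} already exists; pick a new species name")
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive) as tar:
            tar.extractall(tmp)
        # Locate the species folder wherever it sits inside the archive.
        for root, dirs, _files in os.walk(tmp):
            if species in dirs:
                shutil.copytree(os.path.join(root, species), dest)
                return dest
    raise FileNotFoundError(f"no folder named {species!r} in {archive}")
```

A call would look like install_species("parameters.tar.gz", "mygenome_v1", os.environ["AUGUSTUS_CONFIG_PATH"]), where mygenome_v1 stands in for your own species name.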

Hope this helps

PS: hit reply all so this is logged in Maker's mail list in case anybody else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:
I have not used SNAP or CEGMA; however, I see that CEGMA was discontinued in 2015.
Do you think that will be a problem, or is it still worth using the old version?


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Tuesday, February 5, 2019 4:42 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Don't you use SNAP? It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
GM doesn't look bad in the first case, if your fungus has a fairly typical genome. For the second, it looks bad.

I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes.
Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:
Thank you, Xabi, for the response.
The number of proteins from each source is much lower than before.
Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker, respectively.
The more recent numbers are 25, 857, and 4,418, respectively.

So do you think this hints that something is wrong with GeneMark?

Morgan


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Sunday, February 3, 2019 4:43 PM
To: morgan sobol
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor. If the numbers from one of them are off, you may find a possible source of issues.
We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.

Xabi

On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:
Hello,

I previously used Maker to annotate two different fungal genomes that were assembled from Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar to the Illumina-only assemblies in length and in number of complete orthologs, but had far fewer, longer contigs.

I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I consistently get ~4,400 predicted genes.

How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran both the same way?

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA



From anthony.bretaudeau at inria.fr  Mon Feb 18 03:53:39 2019
From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau)
Date: Mon, 18 Feb 2019 10:53:39 +0100
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
	<CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr>

An HTML attachment was scrubbed...
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20190218/d42974d5/attachment.html>

From liorglic at mail.tau.ac.il  Sun Feb 24 06:50:49 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 24 Feb 2019 14:50:49 +0200
Subject: [maker-devel] Profiling MAKER runs
Message-ID: <CAOzMDPyHL9tM-DWTBJb=SSMT1KH6FwhArdgqgN-8aVoBthY69g@mail.gmail.com>

Dear MAKER users,
I was wondering if any of you has an idea of a way by which I can profile
my runs. What I mean is I'd like to know how much time was spent on each
step of the analysis - am I spending most of the time masking repeats,
blasting transcripts/proteins, running ab-initio predictors etc. Based on
this information, I might want to adjust my configuration, e.g. maybe I'm
spending a lot of time blasting transcripts, and reducing the number of
input transcripts would reduce run time significantly without having a
major effect on results quality.
As far as I can see, the main run log does not provide such information,
and I'm not sure where else to look. Any ideas or directions could be of
help.

Thanks!
Lior

From morgan_starr_s at live.com  Sun Feb  3 12:13:47 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Sun, 3 Feb 2019 19:13:47 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
Message-ID: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>

Hello,

I previously used Maker to annotate two different fungal genomes that were assembled from Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar to the Illumina-only assemblies in length and in number of complete orthologs, but had far fewer, longer contigs.

I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I consistently get ~4,400 predicted genes.

How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran both the same way?

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files
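
Since the two runs are meant to use the same settings, one quick sanity check is to compare the two option files key by key. The following is a minimal Python sketch, assuming the key=value #comment layout shown above; the function names read_opts and diff_opts are hypothetical, and values that themselves contain a '#' would be cut short by this simple parser.

```python
def read_opts(path):
    """Parse a maker_opts-style file into {key: value}, dropping '#' comments."""
    opts = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                key, value = line.split("=", 1)
                opts[key.strip()] = value.strip()
    return opts


def diff_opts(path_a, path_b):
    """Return {key: (value_a, value_b)} for every option that differs."""
    a, b = read_opts(path_a), read_opts(path_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

Paths such as genome= and gmhmm= are expected to differ between genomes; anything else that shows up in the result is worth a second look.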


From xvazquezc at gmail.com  Sun Feb  3 15:43:42 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Mon, 4 Feb 2019 09:43:42 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a
BUSCO-derived gene model
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor. If the
numbers from one of them are off, you may find a possible source of issues.
We didn't have a very good experience with GM, as it used to overpredict an
absurd number of proteins.
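
One way to get those per-predictor numbers is to tally the source column of the MAKER GFF3 output. The sketch below is a minimal Python example under some assumptions: that the rows of interest have type (column 3) gene or match, and that the source (column 2) identifies the program that produced them. The exact source and type labels in your file may differ, so check a few rows first. The function name count_by_source is hypothetical.

```python
from collections import Counter


def count_by_source(gff_path, feature_types=("gene", "match")):
    """Tally GFF3 rows per source (column 2) for the given feature types (column 3)."""
    counts = Counter()
    with open(gff_path) as gff:
        for line in gff:
            if line.startswith("##FASTA"):
                break  # embedded sequence follows; no more feature rows
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9 and cols[2] in feature_types:
                counts[cols[1]] += 1
    return counts
```

Running it on the merged GFF3 of each run and comparing the tallies side by side should show which source dropped the most.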

Xabi

On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:

> Hello,
>
> I previously used Maker to annotate two different fungal genomes that were
> assembled from Illumina sequences only. For these genomes, I had over 11,000
> genes predicted.
> I recently obtained PacBio sequences for the same genomes, so I created
> two hybrid assemblies. Both assemblies were very similar to the
> Illumina-only assemblies in length and in number of complete orthologs,
> but had far fewer, longer contigs.
>
> I re-ran Maker using the settings below. For one of my genomes, I got
> around 11,000 genes predicted again, as expected. However, for the other
> genome, I consistently get ~4,400 predicted genes.
>
> How can I determine why I keep getting fewer predicted genes for only one
> of my genomes, even though I ran both the same way?
>
> Thanks,
> Morgan S.
>
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked
> #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff
> #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closely related species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at
> least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta
> #protein sequence file in fasta format (i.e. from multiple organisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for
> RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for
> RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change
> this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg
> and dust filtering)
>
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation
> pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 =
> yes, 0 = no
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3
> file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in
> BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI,
> leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks
> (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often
> useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene
> predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as
> models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and
> 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 =
> yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0
> = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 =
> yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction
> (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron
> size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating
> annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if
> 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion
> genes
>
> tries=2 #number of times to try a contig if there is a failure for some
> reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0
> = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 =
> yes, 0 = no
> TMP= #specify a directory other than the system default temporary
> directory for temporary files
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From keith.decker at bayer.com  Mon Feb  4 11:09:35 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:09:35 +0000
Subject: [maker-devel] MAKER on AWS
Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use.

So my questions are

1.  Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
2.  Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end?  If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?

Thanks and sorry for the long question
Keith
This system contains confidential and copyrighted information.  Access to the system is limited to users only and only for approved business purposes.
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.
Anyone using or viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company's sole discretion there is a business reason to do so.
If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.

From carsonhh at gmail.com  Mon Feb  4 11:31:29 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 11:31:29 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>

You can try to stand up a cluster inside AWS, or, like you said, just start independent instances, each with its own piece of the total dataset. There is a tool called fasta_tool included with MAKER that makes it easy to split up the dataset into equal-sized chunks.
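
For readers who want to see what "equal-sized chunks" means in practice, here is a minimal Python sketch of the idea. It is not fasta_tool and does not reproduce its options; the function names, the output file naming (chunk_000.fasta, ...) and the greedy balancing strategy are all assumptions made for illustration.

```python
import heapq
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif header is not None:
                seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, n_chunks, out_dir):
    """Write n_chunks FASTA files with roughly equal total sequence length.

    Hypothetical helper: longest sequences are placed first, each one into
    the chunk that currently holds the fewest bases.
    """
    records = sorted(read_fasta(path), key=lambda r: len(r[1]), reverse=True)
    # Min-heap of (total_length, chunk_index): always fill the emptiest chunk.
    heap = [(0, i) for i in range(n_chunks)]
    heapq.heapify(heap)
    chunks = [[] for _ in range(n_chunks)]
    for header, seq in records:
        total, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (total + len(seq), idx))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        out = out_dir / f"chunk_{i:03d}.fasta"
        with open(out, "w") as handle:
            for header, seq in chunk:
                handle.write(f">{header}\n{seq}\n")
        paths.append(out)
    return paths
```

Each chunk file could then be handed to its own instance, with the per-chunk outputs collected at the end. Balancing by total bases rather than by number of sequences matters when a few contigs are much longer than the rest.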

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF):
http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup:
http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

--Carson



From liorglck at gmail.com  Mon Feb  4 02:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software is initiated. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as value for the maker_exe.ctl parameter
'RepeatMasker'.
Trying to generalize my working environment, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> which is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it points to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions of how to
overcome this issue?
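
One way to narrow this down is to print the path recorded in maker_exe.ctl next to the one the active environment resolves on $PATH. The Python sketch below does only that; it assumes the control file uses the usual "key=value #comment" layout, and the helper names are made up for illustration. It says nothing about which of the two MAKER itself ends up calling.

```python
import shutil


def read_ctl(path):
    """Parse 'key=value #comment' lines from a MAKER control file."""
    settings = {}
    with open(path) as handle:
        for line in handle:
            # Drop trailing comments and section headers such as '#-----...'.
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings


def compare_exe(ctl_path, name="RepeatMasker"):
    """Return (path set in the control file, path found on PATH)."""
    configured = read_ctl(ctl_path).get(name, "")
    on_path = shutil.which(name)  # None if the name is not on PATH
    return configured, on_path


if __name__ == "__main__":
    import sys
    configured, on_path = compare_exe(sys.argv[1])
    print("maker_exe.ctl:", configured or "(empty)")
    print("on $PATH:     ", on_path or "(not found)")
```

Running it once inside the conda env and once outside (passing the path to maker_exe.ctl as the only argument) shows whether the two environments disagree about the executable.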

Thanks and best regards,
Lior

From keith.decker at bayer.com  Mon Feb  4 11:39:48 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:39:48 +0000
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>

Thanks,
Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine?  For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores?

-Keith


From carsonhh at gmail.com  Mon Feb  4 12:00:00 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 12:00:00 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
	<1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
Message-ID: <EF78A658-7C9E-4F10-AA30-73E97DB30297@gmail.com>

I don't have cloud performance stats, but I do have cluster performance stats you may be able to somewhat correlate (attached). On a cluster we see nearly linear performance gains until ~100 CPU cores, and the plateau doesn't fully level out until well after 600 cores (we are hitting IO and networking limits for inter-node communication). So if you are only using a single instance, you can essentially consider it the equivalent of a single real machine which would fall well under 100 CPU cores, and performance growth would be expected to be linear on that instance.

--Carson


-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 41424 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190204/43c5cc9f/attachment-0001.pdf>

From xvazquezc at gmail.com  Tue Feb  5 15:42:40 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Wed, 6 Feb 2019 09:42:40 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>

Don't you use SNAP? It usually produces quite decent results, and it is
easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
GM doesn't seem bad in the first case, provided your fungus has a fairly
typical genome. In the second case, it looks bad.
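
Per-predictor counts like these can be tallied directly from the MAKER GFF3. The Python sketch below groups top-level features by the source column. It assumes that final MAKER models appear as `gene` features with source `maker`, and that ab initio predictions appear as `match` features with sources such as `augustus_masked` or `genemark`; check these assumptions against your own file, since other evidence types may also be written as `match`.

```python
from collections import Counter


def count_by_source(gff_path, feature_types=("gene", "match")):
    """Count top-level features per source (column 2) in a GFF3 file."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence follows; no more features
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            # GFF3 rows have 9 tab-separated columns; column 3 is the type.
            if len(cols) >= 9 and cols[2] in feature_types:
                counts[cols[1]] += 1
    return counts


if __name__ == "__main__":
    import sys
    for source, n in sorted(count_by_source(sys.argv[1]).items()):
        print(f"{source}\t{n}")
```

Comparing this table between the Illumina-only run and the hybrid-assembly run shows which predictor lost the most predictions.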

I'm not too familiar with re-annotation, but I'd create the gene
models from scratch rather than reuse the ones from the Illumina-only
genomes.
Note that long-read assemblies contain a higher proportion of
repetitive elements that need masking, and RepeatMasker alone may not be
enough. In theory, this shouldn't affect the Augustus model if it was trained
through BUSCO, since BUSCO uses defined conserved markers to create the gene
model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the
gene model and train it again the "traditional way", i.e. the way it was
done when we only had CEGMA. I had good results just by changing the training
method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:

> Thank you, Xabi, for the response.
> The number of proteins from each source is much lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker respectively.
> The more recent numbers are 25, 857, and 4,418 respectively.
>
> So do you think this hints that something is wrong with genemark?
>
> Morgan


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com  Wed Feb  6 15:33:47 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Thu, 7 Feb 2019 09:33:47 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>

SNAP is easy to train, works well on fungal genomes, and is explained in
MAKER's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that
before BUSCO, when we only had CEGMA, we trained Augustus in a different way
because CEGMA didn't produce Augustus gene models automatically. I don't
mean that you should use CEGMA.

This is what I have on my own documentation about how to train Augustus
"the old way"

> AUGUSTUS: the old way
>
> Alternatively, you can train AUGUSTUS in a more "manual" way, as we did
> when we were using CEGMA. The training starts with the output from the
> second instance of fathom in the SNAP training section.
>
> cd ${MYGENOME_DIR}/maker/snap1
> perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb
>
> zff2augustus_gbk.pl generates a GenBank file from export.dna.
>
> The actual training of AUGUSTUS will be through the *webAUGUSTUS server*.
>
> Before proceeding, it is recommended to rename the fasta headers,
> especially if they are very long or contain special characters. This is
> the main reason why jobs submitted to webAUGUSTUS fail. You can use
> the simplifyFastaHeaders.pl
> <http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl>
> script for that:
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map
>
> nameStem is the base name used for each of the sequences in the
> multifasta files. Choose an appropriate value: use *contig* and *rna*
> for the assembly and RNA-seq files, respectively, or something based on
> those. For example, "pgcontig" and "pgrna" for contigs and RNA from
> *Puccinia graminis*.
> *DO NOT* give the same nameStem to both fasta files, and don't use any
> special characters.
>
> We need the following files (minimum):
>
>    - ${MYGENOME}_assembly.fasta as *Genome file*
>    - ${MYGENOME}.train1.gb as *Training gene structure file*
>
> If we also have RNA-seq data:
>
>    - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file*
>
> Use ${MYGENOME}_v1 as *Species name*. We will need a different species
> name in the retraining step. Otherwise, when Maker2 is rerun, it will see
> the same name and will not rerun AUGUSTUS, even though the species
> profile is different. So ${MYGENOME}_v1 does the job and tracks the
> version.
>
> Once the job is finished, the *Species parameter archive*
> (parameters.tar.gz) will contain a folder with the model files for your
> species. Copy it to the species folder of your AUGUSTUS installation.
>
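
The final copy step can be sketched in Python as follows. This is only an
illustration under stated assumptions: the function name
install_species_params is made up for this example, the archive is assumed
to unpack to a folder that holds a *_parameters.cfg file, and the species
folder is assumed to be $AUGUSTUS_CONFIG_PATH/species.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_species_params(archive, species_root):
    """Unpack a parameter archive and copy its species folder into species_root.

    Returns the path of the installed species folder.
    """
    archive = Path(archive)
    species_root = Path(species_root)
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive) as tar:
            tar.extractall(tmp)
        # Assumption: the species folder is the one holding *_parameters.cfg.
        cfg_files = sorted(Path(tmp).rglob("*_parameters.cfg"))
        if not cfg_files:
            raise FileNotFoundError("no *_parameters.cfg found in " + str(archive))
        src = cfg_files[0].parent
        dest = species_root / src.name
        # Refuse to overwrite, so an older species profile is never replaced.
        if dest.exists():
            raise FileExistsError(str(dest) + " already exists")
        shutil.copytree(src, dest)
    return dest
```

Called with the downloaded parameters.tar.gz and the species folder of the
local AUGUSTUS installation, it copies the model folder into place. It
refuses to overwrite an existing folder, which fits the advice to use a new
species name for every training round.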
Hope this helps.

PS: Please hit reply-all so that this is logged on Maker's mailing list, in
case anybody else experiences similar issues.

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:

> I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued
> in 2015.
> Do you think that will be a problem, or is it still worth using the old
> version?
>
>
> ------------------------------
> *From:* Xabier Vázquez-Campos <xvazquezc at gmail.com>
> *Sent:* Tuesday, February 5, 2019 4:42 PM
> *To:* morgan sobol; Maker Mailing List
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Don't you use SNAP? It usually produces quite decent results, and it is
> easier to train than any of the other predictors.
>
> In any case, the Augustus gene model is way off in both cases.
> GM doesn't seem bad in the first case, if your fungus has a rather
> typical genome. For the second, it looks bad.
>
> I'm not too familiar with re-annotation, but I'd rather create the gene
> models from scratch than reuse the ones from the Illumina-only
> genomes.
> Note that long-read assemblies have a higher proportion of repetitive
> elements that need masking, and RepeatMasker alone may not be
> enough. In theory, this shouldn't affect the Augustus model if trained
> through BUSCO, as it uses defined conserved markers to create the gene
> model, but I'm not so sure about GM.
>
> If you trained Augustus with BUSCO, and this is the result, I'd discard
> the gene model and train it again the "traditional" way, i.e. as it used
> to be done when we only had CEGMA. I had good results just by changing the
> training method.
>
> Hope it helps,
> Xabi
>
>
>
>
> On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:
>
> Thank you, Xabi for the response.
> The number of proteins from each source is much lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker, respectively.
> The more recent numbers are 25, 857, and 4,418, respectively.
>
> So do you think this hints that something is wrong with genemark?
>
> Morgan
>
>
> ------------------------------
> *From:* Xabier Vázquez-Campos <xvazquezc at gmail.com>
> *Sent:* Sunday, February 3, 2019 4:43 PM
> *To:* morgan sobol
> *Cc:* maker-devel at yandell-lab.org
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Hi Morgan,
>
> We had a similar issue with AUGUSTUS underpredicting when using a
> BUSCO-derived gene model:
> https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ
>
> Also, check the number of proteins predicted by each individual predictor.
> If the numbers from one of them are off, you may have found a possible
> source of the issue.
> We didn't have a very good experience with GM, as it used to overpredict
> an absurd number of proteins.
>
> Xabi
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From liorglic at mail.tau.ac.il  Mon Feb 11 07:04:16 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:04:16 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAOzMDPxUf8a9orgsmbJ8QDdq4=OoKL_AkjVbsbPcGGm8z6ufXg@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software starts. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as the value of the 'RepeatMasker' parameter in
maker_exe.ctl.
To make my working environment more portable, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> that is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it resolves to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions for how to
overcome this issue?
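
One way to narrow this down is to compare the path written in maker_exe.ctl
with the executable that the active environment resolves on $PATH. The
Python sketch below does only that comparison. It does not show what MAKER
itself does internally, the function names are made up for this example, and
it assumes the control file uses the usual "key=value #comment" layout.

```python
import shutil
from pathlib import Path


def read_ctl_value(ctl_path, key):
    """Return the value assigned to `key` in a MAKER control file, or None."""
    for line in Path(ctl_path).read_text().splitlines():
        setting = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in setting:
            continue
        name, value = setting.split("=", 1)
        if name.strip() == key:
            return value.strip() or None
    return None


def compare_with_path(ctl_path, key="RepeatMasker"):
    """Report the configured executable and the one found on PATH."""
    configured = read_ctl_value(ctl_path, key)
    on_path = shutil.which(key)
    same = (
        configured is not None
        and on_path is not None
        and Path(configured).resolve() == Path(on_path).resolve()
    )
    return {"configured": configured, "on_path": on_path, "same": same}
```

Running compare_with_path("maker_exe.ctl") once outside and once inside the
conda env shows whether the two locations differ in each case.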

Thanks and best regards,
Lior

From liorglic at mail.tau.ac.il  Mon Feb 11 07:12:25 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:12:25 +0200
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
Message-ID: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>

Dear MAKER users,

After completing a MAKER run, I looked at the protein fasta files that
MAKER outputs and noticed that a small fraction of the sequences include X
characters, indicating unknown amino acids. How do such sequences arise,
i.e. why would a prediction contain unknown amino acids? Is this an
indication of low-quality predictions?
Is there any documentation on the procedure that generates the
protein sequences?
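
To put a number on "a small fraction", a short Python sketch like the one
below can count the affected records. The function name count_x_proteins is
made up for this example, and the input is assumed to be a standard protein
FASTA file read line by line.

```python
def count_x_proteins(fasta_lines):
    """Return (total_records, ids_with_x) for protein FASTA lines."""
    total = 0
    ids_with_x = []
    current_id = None
    has_x = False
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None and has_x:
                ids_with_x.append(current_id)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            total += 1
            has_x = False
        elif current_id is not None and "X" in line.upper():
            has_x = True
    # Close the last record.
    if current_id is not None and has_x:
        ids_with_x.append(current_id)
    return total, ids_with_x
```

Passing an open file handle, for example open("proteins.fasta") with a
hypothetical file name, returns the number of records and the IDs of those
containing X, which can then be looked up in the GFF3 output.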

Thanks a lot,
Lior

From kapeelc at gmail.com  Thu Feb  7 12:43:47 2019
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Thu, 7 Feb 2019 14:43:47 -0500
Subject: [maker-devel] MAKER v3 Fgenesh ERROR
Message-ID: <CA+DOtefuUEc5_fFh7j2ykb4yBKmtEp1vgt0Pea-RF+7GCqr9ig@mail.gmail.com>

Hi, Carson

I have been getting this error with the fgenesh tool within MAKER. It runs OK
on most of the assembly contigs but fails on one contig, or part
of a contig, with the error below:

Widget::fgenesh:
/mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap
/mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh
/sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta
-exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh
>
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh
#-------------------------------#
 ...processing 9 of 24
 ...processing 8 of 28
 ...processing 10 of 24
 ...processing 9 of 28
 ...processing 11 of 24
 ...processing 10 of 28
 ...processing 12 of 24
 ...processing 11 of 28
deleted:0 genes
ERROR: FgenesH failed
--> rank=14, hostname=bnbcompute50
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:Super-Scaffold_14.2_contig2

I updated the perl module fgenesh.pm as suggested in the previous threads.
Attached are the maker_opts.ctl and STDERR log files.
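
Because the STDERR log is large, a short Python sketch like the one below can
list the failed contigs together with the ERROR lines printed just before
each one. It relies only on the "ERROR:" and "FAILED CONTIG:" line formats
visible in the excerpt above; the function name failed_contigs is made up
for this example.

```python
import re

FAILED_RE = re.compile(r"^FAILED CONTIG:(\S+)")


def failed_contigs(log_lines):
    """Map each failed contig name to the ERROR lines printed just before it."""
    failures = {}
    pending_errors = []
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("ERROR:"):
            pending_errors.append(line)
            continue
        match = FAILED_RE.match(line)
        if match:
            # Attach the collected errors to this contig and start over.
            failures.setdefault(match.group(1), []).extend(pending_errors)
            pending_errors = []
    return failures
```

The function accepts any iterable of lines, so an open file handle can be
passed directly and the log does not have to be loaded into memory at once.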

Thanks

Kapeel


-- 


*Kapeel Chougule*
*Computational Scientist Developer II*


*One Bungtown Road, Cold Spring Harbor, NY 11724*
http://www.warelab.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5420 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stderr.log
Type: application/octet-stream
Size: 10012917 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment-0003.obj>

From fatih.sarigoel at durham.ac.uk  Wed Feb 13 05:20:40 2019
From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH)
Date: Wed, 13 Feb 2019 12:20:40 +0000
Subject: [maker-devel] Does Conda Maker actually work?
Message-ID: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>

Greetings,
I notice that you never mention conda installation on your website, so I am curious whether the conda version is actually supposed to work; for me it didn't.
I created a new conda environment and installed Maker (I tried this with both installation options).
When I run the example files, I get this error:

"make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2"

My conda environment is here:
/fast_new/work/users/fsarigo_m/miniconda3
I don't understand why the program is trying to look here:
/home/conda
which does not exist.

The run also begins with a "possible precedence issue" warning.
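
To see exactly which executables the build could not find, the
"/bin/sh: ...: No such file or directory" lines in the log below can be
pulled out with a short Python sketch. The function names are made up for
this example, and the only assumption is the shell error format shown in the
log.

```python
import re
from pathlib import PurePosixPath

MISSING_RE = re.compile(r"^/bin/sh: (\S+): No such file or directory")


def missing_executables(log_lines):
    """Return the distinct executables that /bin/sh reported as missing."""
    missing = []
    for line in log_lines:
        match = MISSING_RE.match(line.strip())
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
    return missing


def outside_prefix(paths, env_prefix):
    """Keep only the paths that do not live under the conda env prefix."""
    prefix_parts = PurePosixPath(env_prefix).parts
    return [
        p for p in paths
        if PurePosixPath(p).parts[: len(prefix_parts)] != prefix_parts
    ]
```

Applied to this log with the MakerX env directory as the prefix, the only
entry is the gcc path under /home/conda, i.e. a compiler location outside
the environment.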

Thanks for your help in advance!
Fatih

+++++

Here is the full log until the end of the contig:

(MakerX) [fsarigo_m at med0223 MAKER]$ maker
Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:contig-dpp-500-500

examining contents of the fasta file and run log



From carsonhh at gmail.com  Wed Feb 13 07:51:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 07:51:44 -0700
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>

The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list from people attempting to install MAKER via conda. I know there is at least an issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists: https://github.com/bioperl/bioperl-live/issues/215

Or it may be that there is a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase): https://github.com/bioperl/bioperl-live/issues/305

Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts).

--Carson


> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk> wrote:
> 
> Greetings,
> I notice that you never mention conda installation on your website, so I am curious if the conda version is actually supposed to be working fine or not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with both installation options)
> When I run the example files, I get this error:
> 
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
> 
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
> 
> Also begins with a "possible precedence issue"
> 
> Thanks for your help in advance!
> Fatih
> 
> +++++
> 
> Here is the full log until the end of the contig:
> 
> (MakerX) [fsarigo_m at med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
> 
> To access files for individual sequences use the datastore index:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
> 
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
> 
> 
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
> /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> 
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
> 
> The build directory was:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
> 
> To debug the problem, cd to the build directory, and inspect the output files.
> 
> Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
> --> rank=NA, hostname=med0223
> ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
> --> rank=NA, hostname=med0223
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
> Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
> Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
> Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
> FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
> Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
> Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
> Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
> Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
> /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> 
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
> 
> The build directory was:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
> 
> To debug the problem, cd to the build directory, and inspect the output files.
> 
> Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
> --> rank=NA, hostname=med0223
> ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500
> 
> examining contents of the fasta file and run log
> 
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Wed Feb 13 10:14:13 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:14:13 -0700
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
In-Reply-To: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
References: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com>

The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and is missing files inside the repeat library as a result.
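
A quick way to see which RepeatMasker a control file names, as opposed to the one an activated conda env puts on PATH, is to read the entry out of maker_exe.ctl and compare the two. This is only a minimal sketch, assuming the usual `key=value #comment` control-file layout; the helper name `ctl_value` and the sample path are hypothetical:

```python
import shutil

def ctl_value(ctl_text, key):
    """Return the value set for `key` in MAKER control-file text, or None."""
    for line in ctl_text.splitlines():
        setting = line.split("#", 1)[0].strip()  # drop the trailing comment
        name, sep, value = setting.partition("=")
        if sep and name.strip() == key:
            return value.strip() or None  # empty value counts as unset
    return None

# Hypothetical control-file content, for illustration only.
sample = "RepeatMasker=/opt/RepeatMasker/RepeatMasker #location of RepeatMasker executable\n"

configured = ctl_value(sample, "RepeatMasker")
on_path = shutil.which("RepeatMasker")  # None if nothing is found on PATH

print("maker_exe.ctl:", configured)
print("PATH:", on_path)
```

Running this inside and outside the env shows whether the env has shadowed the local install.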

--Carson


> On Feb 4, 2019, at 2:00 AM, Lior Glick <liorglck at gmail.com> wrote:
> 
> Dear MAKER users,
> 
> I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software is initiated. Typing 'which RepeatMasker' shows the correct local path.
> I also use this path as value for the maker_exe.ctl parameter 'RepeatMasker'.
> Trying to generalize my working environment, I am trying to use a conda env <https://anaconda.org/bioconda/maker> which is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
> ERROR: Could not determine if RepBase is installed
> Running the same configuration files locally (i.e. outside the conda env) results in a successful run.
> This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions of how to overcome this issue?
> 
> Thanks and best regards,
> Lior


From carsonhh at gmail.com  Wed Feb 13 10:18:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:18:44 -0700
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
In-Reply-To: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
References: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com>

If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have 'N' characters from the assembly as part of your CDS ('N' is the ambiguity code for DNA, which will result in an 'X' when translated, which is the ambiguity code for amino acids). Augustus will do internal gymnastics and completely splice out exons containing N's to try and never have this issue, but may not always be able to. It's an indication of genome assembly issues.
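
To illustrate the N-to-X effect described above, here is a minimal Python sketch of a translator using the standard genetic code. It is an assumption-laden toy, not MAKER's own translation routine: any codon containing a non-ACGT base is simply emitted as 'X', whereas real tools may resolve some ambiguous codons.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a CDS; codons with ambiguous bases (e.g. N) become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

# A made-up CDS with a run of N's where the assembly had a gap.
print(translate("ATGNNNGCATAA"))  # prints MXA*
```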

--Carson


> On Feb 11, 2019, at 7:12 AM, Lior Glick <liorglic at mail.tau.ac.il> wrote:
> 
> Dear MAKER users,
> 
> After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences are obtained, I mean how come there are unknown amino acids in the prediction? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
> 
> Thanks a lot,
> Lior


From carsonhh at gmail.com  Wed Feb 13 10:24:01 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:24:01 -0700
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <D33A2A92-BFCA-4493-A66E-99C567954AD2@gmail.com>

One thing you can also do is use the old models as protein= input and run the protein2genome option just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and split contigs were joined, so that multiple gene fragments became a single unified model. Make sure to always review contigs in a browser to see how models and evidence correlate.

--Carson


> On Feb 3, 2019, at 12:13 PM, morgan sobol <morgan_starr_s at live.com> wrote:
> 
> Hello, 
> 
> I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. 
> I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very familiar in length and completed number of orthologs to the Illumina only assembly, but had much fewer, but longer contigs. 
> 
> I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes. 
> 
> I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same?
> 
> Thanks,
> Morgan S. 
> 
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
> 
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
> 
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
> 
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
> 
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
> 
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> 
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
> 
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
> 
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
> 
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
> 
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
> 
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
> 


From liorglck at gmail.com  Sun Feb 17 11:50:10 2019
From: liorglck at gmail.com (Lior Glick)
Date: Sun, 17 Feb 2019 20:50:10 +0200
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
Message-ID: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>

That's good to know. Any plans on creating a stable conda package in the
future? It'd be a very nice feature, especially since MAKER is not always
straightforward to install.

On Wed, Feb 13, 2019 at 5:22 PM Carson Holt <carsonhh at gmail.com> wrote:

> The conda recipe was produced by another group. I do not currently
> recommend using it because I have seen a number of issues pop up on the
> list based on people attempting to install MAKER via conda.  I know there
> is at least an issue with the conda RepeatMasker install, and there may be
> others. The specific failure you show is from Bio::DB::IndexedBase trying
> to compile an Inline::C function. It may be that conda is installing an
> older BioPerl where this issue still exists ->
> https://github.com/bioperl/bioperl-live/issues/215
>
> Or it may be that there is a new related issue (I've seen a handful of
> other examples that seem to relate back to Bio::DB::IndexedBase) ->
> https://github.com/bioperl/bioperl-live/issues/305
>
> Try installing MAKER without conda (make sure to remove any components
> that are in conda first to avoid conflicts).
>
> --Carson
>
>
> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk>
> wrote:
>
> Greetings,
> I notice that you never mention conda installation on your website, so I
> am curious if the conda version is actually supposed to be working fine or
> not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with
> both installation options)
> When I run the example files, I get this error:
>
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
>
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
>
> Also begins with a "possible precedence issue"
>
> Thanks for your help in advance!
> Fatih
>
> +++++
>
> Here is the full log until the end of the contig:
>
> (MakerX) [fsarigo_m at med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
>
> To access files for individual sequences use the datastore index:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
>
>
>
> --Next Contig--
>
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
>
>
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs
> blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"
> -typemap
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"
>  IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc
> -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin"
> -D_REENTRANT -D_GNU_SOURCE
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2
>  -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"
>  IndexedBase_14e0.c
> /bin/sh:
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc:
> No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
>
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
>
> The build directory was:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
>
> To debug the problem, cd to the build directory, and inspect the output
> files.
>
> Environment PATH =
> '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 275.
> --> rank=NA, hostname=med0223
> ...propagated at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm
> line 869.
> --> rank=NA, hostname=med0223
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 38.
> Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 306
> Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for
> IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef,
> ARRAY(0x564b40673ad0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 426
> Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm
> line 95
> FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 478
> Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run",
> HASH(0x564b40673c80), 0, 0) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 341
> Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called
> at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 357
> Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0)
> called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm
> line 287
> Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0)
> called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs
> blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"
> -typemap
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"
>  IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc
> -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin"
> -D_REENTRANT -D_GNU_SOURCE
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2
>  -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"
>  IndexedBase_14e0.c
> /bin/sh:
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc:
> No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
>
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
>
> The build directory was:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
>
> To debug the problem, cd to the build directory, and inspect the output
> files.
>
> Environment PATH =
> '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 275.
> --> rank=NA, hostname=med0223
> ...propagated at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm
> line 869.
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500
>
> examining contents of the fasta file and run log
>
>
>
>

From morgan_starr_s at live.com  Mon Feb 18 02:08:56 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Mon, 18 Feb 2019 09:08:56 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>,
	<CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
Message-ID: <DM5PR14MB1292E82A4864CCC40B80122EAE630@DM5PR14MB1292.namprd14.prod.outlook.com>

Thank you, Xabi and Carson.
With your help, I was able to improve the annotation with a more appropriate number of predictions.

Best,
Morgan

________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Wednesday, February 6, 2019 11:33 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

SNAP is easy to train, works well in fungal genomes and it's explained in Maker's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we would proceed in a different way to train Augustus as CEGMA wouldn't produce Augustus gene models automatically. I don't mean you to use CEGMA.

This is what I have on my own documentation about how to train Augustus "the old way"
AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more "manual" way, like when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.

cd ${MYGENOME_DIR}/maker/snap1
perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna.

The actual training of AUGUSTUS will be through the webAUGUSTUS server.

Before proceeding, it is recommended to rename the fasta headers, especially if they are very long or contain special characters. This is the main cause of failure for jobs submitted to webAUGUSTUS. You can use the simplifyFastaHeaders.pl script (http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl) for that:

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used for each of the sequences in the multifasta files. Choose something appropriate: use contig and rna for the assembly and RNA-seq files, respectively, or something based on those. For example, "pgcontig" and "pgrna" for contigs and RNA from Puccinia graminis.
DO NOT give the same nameStem to both fasta files, and don't use any special characters.
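
To make the renaming step concrete, here is a minimal Python sketch of what such a header simplification does: each header is replaced by nameStem plus a running number, and the old names are kept in a map. This is not the code of simplifyFastaHeaders.pl, and the exact naming that script produces may differ; `rename_headers` and the sample records are hypothetical.

```python
def rename_headers(fasta_lines, name_stem):
    """Replace each FASTA header with name_stem + counter; return lines and a map."""
    renamed, mapping = [], []
    count = 0
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{name_stem}{count}"
            mapping.append((new_id, line[1:].strip()))  # (new name, old header)
            renamed.append(">" + new_id)
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, mapping

# Made-up records with long headers and special characters.
records = [
    ">NODE_1_length_5000_cov_12.3|strain=X;pilon\n",
    "ACGTACGT\n",
    ">NODE_2_length_4200_cov_9.8|strain=X;pilon\n",
    "TTGACC\n",
]
new_lines, name_map = rename_headers(records, "pgcontig")
print("\n".join(new_lines))
for new_id, old_header in name_map:
    print(new_id, old_header, sep="\t")
```

Keeping the map makes it possible to restore the original names after the webAUGUSTUS job has finished.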

We need the following files (minimum):

  *   ${MYGENOME}_assembly.fasta as Genome file
  *   ${MYGENOME}.train1.gb as Training gene structure file

If we also have RNA-seq data:

  *   ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step; otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and tracks the version.

Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.

Hope this helps

PS: hit reply all so this is logged in Maker's mail list in case anybody else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:
I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued in 2015.
Do you think that will be a problem, or is it still worth using the old version?


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Tuesday, February 5, 2019 4:42 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Don't you use SNAP? It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
GeneMark (GM) doesn't look bad in the first case, assuming your fungus has a fairly typical genome. In the second case, it looks bad.

I'm not too familiar with re-annotation, but I'd create the gene models from scratch rather than reuse the ones from the Illumina-only genomes.
Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:
Thank you, Xabi for the response.
The number of proteins from each source is much lower than before.
Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker respectively.
The more recent numbers are 25, 857, 4418 respectively.

So do you think this hints that something is wrong with GeneMark?

Morgan
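
To put the drop in perspective, the counts quoted above can be compared directly. The short Python sketch below only does arithmetic on the six numbers given in this message; the variable and function names are chosen for illustration.

```python
# Protein counts quoted in the message (Illumina-only run vs hybrid-assembly run).
previous = {"augustus": 325, "genemark": 10899, "maker": 11243}
recent = {"augustus": 25, "genemark": 857, "maker": 4418}


def retained_fraction(before, after):
    """Fraction of the earlier count that is still present in the later run."""
    return {name: after[name] / before[name] for name in before}


for name, frac in retained_fraction(previous, recent).items():
    print(f"{name:9s} {previous[name]:>6d} -> {recent[name]:>5d}  ({frac:.1%} retained)")
```

Both ab initio predictors fall to roughly 8% of their earlier counts (7.7% for augustus, 7.9% for genemark), while the final maker set retains about 39%.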


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Sunday, February 3, 2019 4:43 PM
To: morgan sobol
Cc: maker-devel at yandell-lab.org<mailto:maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor. If the numbers from one of them are off, you may find a possible source of issues.
We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.

Xabi

On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:
Hello,

I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and in the number of complete orthologs to the Illumina-only assemblies, but had far fewer, though longer, contigs.

I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes.

I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same?

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA



From anthony.bretaudeau at inria.fr  Mon Feb 18 02:53:39 2019
From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau)
Date: Mon, 18 Feb 2019 10:53:39 +0100
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
	<CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr>

An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190218/d42974d5/attachment-0001.html>

From liorglic at mail.tau.ac.il  Sun Feb 24 05:50:49 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 24 Feb 2019 14:50:49 +0200
Subject: [maker-devel] Profiling MAKER runs
Message-ID: <CAOzMDPyHL9tM-DWTBJb=SSMT1KH6FwhArdgqgN-8aVoBthY69g@mail.gmail.com>

Dear MAKER users,
I was wondering if any of you has an idea of a way by which I can profile
my runs. What I mean is I'd like to know how much time was spent on each
step of the analysis - am I spending most of the time masking repeats,
blasting transcripts/proteins, running ab-initio predictors etc. Based on
this information, I might want to adjust my configuration, e.g. maybe I'm
spending a lot of time blasting transcripts, and reducing the number of
input transcripts would reduce run time significantly without having a
major effect on results quality.
As far as I can see, the main run log does not provide such information,
and I'm not sure where else to look. Any ideas or directions could be of
help.

Thanks!
Lior
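
One generic way to approach this, independent of MAKER itself, is to wrap the run and timestamp every line it prints, so that long gaps between lines point to slow steps. The Python sketch below assumes nothing about MAKER's log format; the function names stamp_lines and run_with_timestamps are made up for this example.

```python
import subprocess
import time


def stamp_lines(lines, clock=time.monotonic):
    """Yield (seconds_since_start, seconds_since_previous_line, text) per line."""
    start = prev = clock()
    for line in lines:
        now = clock()
        yield now - start, now - prev, line.rstrip("\n")
        prev = now


def run_with_timestamps(cmd):
    """Run cmd, echoing its combined stdout/stderr with timing prefixes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for elapsed, gap, line in stamp_lines(proc.stdout):
        # A large gap means the step reported on the previous line took long.
        print(f"[{elapsed:9.1f}s +{gap:7.1f}s] {line}")
    return proc.wait()
```

For instance, run_with_timestamps(["maker"]) in a directory holding the control files would prefix every status line with the elapsed time. How well this maps to individual analysis steps depends on how verbosely the run reports its progress.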

From xvazquezc at gmail.com  Sun Feb  3 15:43:42 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Mon, 4 Feb 2019 09:43:42 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a
BUSCO-derived gene model
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor. If the
numbers from one of them are off, you may find a possible source of issues.
We didn't have a very good experience with GM, as it used to overpredict an
absurd number of proteins.

Xabi
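
A quick way to get per-predictor numbers is to tally features in the MAKER GFF3 by the source column. The Python sketch below relies only on the standard GFF3 layout (column 2 is the source, column 3 the feature type, and an optional ##FASTA section ends the annotation part). Which source labels and feature types correspond to each predictor in a given file is an assumption to verify by looking at the file; the function name count_by_source is made up for illustration.

```python
import io
from collections import Counter


def count_by_source(gff_lines, feature_types=("match", "gene")):
    """Count GFF3 features per (source, type) for the selected feature types."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section: no more annotation lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GFF3 feature line
        source, ftype = cols[1], cols[2]
        if ftype in feature_types:
            counts[(source, ftype)] += 1
    return counts


# Tiny made-up example; "srcA" stands in for a predictor's source label.
sample = (
    "##gff-version 3\n"
    "ctg1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    "ctg1\tsrcA\tmatch\t1\t100\t.\t+\t.\tID=m1\n"
    "ctg1\tsrcA\tmatch_part\t1\t50\t.\t+\t.\tID=p1;Parent=m1\n"
    "##FASTA\n>ctg1\nACGT\n"
)
for (source, ftype), n in sorted(count_by_source(io.StringIO(sample)).items()):
    print(source, ftype, n, sep="\t")
```

Running the same function on the GFF3 from each assembly and comparing the tallies shows which source lost the most features.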

On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:

> Hello,
>
> I previously used Maker to annotate two different fungal genomes that were
> created using Illumina sequences only. For these genomes, I had over 11,000
> genes predicted.
> I recently obtained PacBio sequences for the same genomes, so I created
> two hybrid assemblies. Both assemblies were very similar in length and
> in the number of complete orthologs to the Illumina-only assemblies, but
> had far fewer, though longer, contigs.
>
> I re-ran Maker using the settings below. For one of my genomes, I got
> around 11,000 genes predicted again, as expected. However, for the other
> genome, I am continuously getting ~4,400 predicted genes.
>
> I am asking for help as to how I can determine why I keep getting fewer
> predicted genes for only one of my genomes, even though I ran them the same?
>
> Thanks,
> Morgan S.
>
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked
> #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff
> #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at
> least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta
> #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for
> RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for
> RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change
> this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg
> and dust filtering)
>
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation
> pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 =
> yes, 0 = no
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3
> file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in
> BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI,
> leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks
> (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often
> useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene
> predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as
> models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and
> 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 =
> yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0
> = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 =
> yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction
> (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron
> size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating
> annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if
> 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion
> genes
>
> tries=2 #number of times to try a contig if there is a failure for some
> reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0
> = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 =
> yes, 0 = no
> TMP= #specify a directory other than the system default temporary
> directory for temporary files
>
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From keith.decker at bayer.com  Mon Feb  4 11:09:35 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:09:35 +0000
Subject: [maker-devel] MAKER on AWS
Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use.

So my questions are

1.  Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
2.  Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end?  If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?

Thanks and sorry for the long question
Keith

From carsonhh at gmail.com  Mon Feb  4 11:31:29 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 11:31:29 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>

You can try to stand up a cluster inside AWS or, as you said, just start independent instances, each with its own piece of the total dataset. There is a tool called fasta_tool inside of maker that makes it easy to split up the dataset into equal-sized chunks.
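
For readers who want to see what such a split involves, here is a small Python sketch that distributes sequences over a fixed number of chunks so that the total bases per chunk are roughly balanced. It is not how fasta_tool works internally; the greedy largest-first strategy and the names read_fasta and split_balanced are assumptions for illustration.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA file handle."""
    name, seq = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        elif line:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def split_balanced(records, n_chunks):
    """Greedy split: longest sequences first, each into the emptiest chunk."""
    chunks = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = totals.index(min(totals))  # chunk with the fewest bases so far
        chunks[i].append((name, seq))
        totals[i] += len(seq)
    return chunks


fasta = ">a\n" + "A" * 10 + "\n>b\n" + "C" * 6 + "\n>c\n" + "G" * 4 + "\n>d\nTT\n"
for i, chunk in enumerate(split_balanced(read_fasta(io.StringIO(fasta)), 2)):
    print(i, [name for name, _ in chunk], sum(len(s) for _, s in chunk))
```

Each chunk can then be written to its own FASTA file and annotated on a separate instance.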

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF) -->
http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup --> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

--Carson


> On Feb 4, 2019, at 11:09 AM, DECKER, KEITH F [AG/1005] <keith.decker at bayer.com> wrote:
> 
> I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
> I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
> but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use. 
>  
> So my questions are
>  
> 1.  Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
> 2.  Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end?  If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?
>  
> Thanks and sorry for the long question
> Keith
> 
> 
> 
> 

From liorglck at gmail.com  Mon Feb  4 02:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software is initiated. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as value for the maker_exe.ctl parameter
'RepeatMasker'.
Trying to generalize my working environment, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> which is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it points to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions of how to
overcome this issue?

Thanks and best regards,
Lior
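
A quick diagnostic for this kind of mismatch is to compare the path written in maker_exe.ctl with whatever the active environment resolves on $PATH. The Python sketch below only reads key=value lines with trailing # comments, as in the control files, and says nothing about how MAKER itself locates the executable or RepBase; the helper names parse_ctl and compare_with_path are made up for illustration.

```python
import shutil


def parse_ctl(text):
    """Parse 'key=value #comment' lines from a MAKER control file into a dict."""
    opts = {}
    for line in text.splitlines():
        # Drop comments; a path containing '#' would be cut short here.
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def compare_with_path(opts, tool):
    """Return (configured_path, path_found_on_PATH, whether_they_match)."""
    configured = opts.get(tool, "")
    on_path = shutil.which(tool)
    return configured, on_path, bool(configured) and configured == on_path


# Made-up control file content for demonstration.
ctl_text = (
    "#-----Location of Executables Used by MAKER/EVALUATOR\n"
    "RepeatMasker=/opt/rm/RepeatMasker #location of RepeatMasker executable\n"
    "snap= #location of snap executable\n"
)
configured, on_path, same = compare_with_path(parse_ctl(ctl_text), "RepeatMasker")
print("configured:", configured)
print("on PATH:   ", on_path)
print("match:     ", same)
```

Running this once inside the conda env and once outside it shows whether the two environments point at different RepeatMasker installations.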

From keith.decker at bayer.com  Mon Feb  4 11:39:48 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:39:48 +0000
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>

Thanks,
Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine?  For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores?

-Keith

From: Carson Holt <carsonhh at gmail.com>
Date: Monday, February 4, 2019 at 12:33 PM
To: "DECKER, KEITH F [AG/1005]" <keith.decker at bayer.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] MAKER on AWS

You can try to stand up a cluster inside AWS or, as you said, just start independent instances, each with its own piece of the total dataset. MAKER ships with a tool called fasta_tool that makes it easy to split the dataset into equal-sized chunks.
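
To illustrate the chunking idea, here is a minimal Python sketch that splits a multi-FASTA into N chunks of roughly equal total length, one per instance. It is a stand-in written for this thread, not fasta_tool itself; the function names and the greedy longest-first strategy are assumptions.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_balanced(records, n_chunks):
    """Greedy split: place the longest sequences first, each into the lightest chunk."""
    chunks = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        lightest = totals.index(min(totals))
        chunks[lightest].append((header, seq))
        totals[lightest] += len(seq)
    return chunks, totals


if __name__ == "__main__":
    # Toy input; real contigs would be read from the genome FASTA file.
    demo = ">chr1\nACGTACGTAC\n>chr2\nACGTAC\n>chr3\nACGTA\n>chr4\nA\n"
    chunks, totals = split_balanced(parse_fasta(demo), 2)
    for chunk, total in zip(chunks, totals):
        print(total, [header for header, _ in chunk])
```

Balancing by total bases rather than by sequence count keeps one instance from ending up with all the long chromosomes.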

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF) ->
http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup -> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

--Carson


On Feb 4, 2019, at 11:09 AM, DECKER, KEITH F [AG/1005] <keith.decker at bayer.com> wrote:

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use.

So my questions are

1.  Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
2.  Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end?  If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?

Thanks and sorry for the long question
Keith


This system contains confidential and copyrighted information.  Access to the system is limited to users only and only for approved business purposes.

Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.

Anyone using or viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company's sole discretion there is a business reason to do so.

If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Mon Feb  4 12:00:00 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 12:00:00 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
	<1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
Message-ID: <EF78A658-7C9E-4F10-AA30-73E97DB30297@gmail.com>

I don't have cloud performance stats, but I do have cluster performance stats that you may be able to roughly correlate (attached). On a cluster we see nearly linear performance gains until ~100 CPU cores, and the curve doesn't fully plateau until well after 600 cores (at that point we are hitting IO and networking limits for inter-node communication). A single instance is essentially equivalent to a single real machine, which falls well under 100 CPU cores, so performance growth would be expected to be linear on that instance.
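
If you want to check the scaling on your own instance, speed-up and parallel efficiency can be computed from two wall-clock timings of the same input. This is a minimal Python sketch; the function names are made up, and the timings in the usage section are placeholders, not measurements.

```python
def speedup(t_one_core, t_n_cores):
    """Speed-up = single-core wall time divided by n-core wall time."""
    return t_one_core / t_n_cores


def efficiency(t_one_core, t_n_cores, n_cores):
    """Parallel efficiency = speed-up per core; 1.0 means perfectly linear scaling."""
    return speedup(t_one_core, t_n_cores) / n_cores


if __name__ == "__main__":
    # Placeholder timings in hours (NOT real measurements): replace with your own runs.
    t1, t16 = 16.0, 1.1
    print(f"speed-up: {speedup(t1, t16):.1f}x")
    print(f"efficiency on 16 cores: {efficiency(t1, t16, 16):.2f}")
```

An efficiency that stays close to 1.0 as cores are added corresponds to the linear regime described above.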

--Carson



-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 41425 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190204/43c5cc9f/attachment-0002.pdf>

From xvazquezc at gmail.com  Tue Feb  5 15:42:40 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Wed, 6 Feb 2019 09:42:40 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>

Don't you use SNAP? It usually produces quite decent results. And easier to
train than any of the other predictors

In any case, the Augustus gene model is way off in both cases (325 and 25
proteins). GeneMark (GM) doesn't look bad in the first case (10,899),
assuming your fungus has a fairly typical genome. In the second (857), it
looks bad.
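
To get per-predictor counts like the ones quoted below, the features in the MAKER GFF3 can be tallied by source (column 2). This is a minimal Python sketch with a made-up function name. It assumes that final annotations appear as `gene` features with source `maker` and that top-level ab initio predictions appear as `match` features with sources such as `augustus_masked` or `genemark`; check those names against your own file. Other tools, for example the repeat masker, may also emit `match` features and will show up under their own source.

```python
from collections import Counter


def count_by_source(gff_text, feature_types=("gene", "match")):
    """Count GFF3 features per source (column 2), restricted to the given types (column 3)."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 9 and cols[2] in feature_types:
            counts[cols[1]] += 1
    return counts


if __name__ == "__main__":
    # Tiny hand-written example; in practice read the *.all.gff file from the MAKER output.
    demo = "\n".join([
        "##gff-version 3",
        "contig1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
        "contig1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1",
        "contig1\taugustus_masked\tmatch\t1\t900\t.\t+\t.\tID=m1",
        "contig1\tgenemark\tmatch\t5\t880\t.\t+\t.\tID=m2",
        "contig1\tgenemark\tmatch_part\t5\t880\t.\t+\t.\tID=m2p;Parent=m2",
    ])
    for source, n in sorted(count_by_source(demo).items()):
        print(source, n)
```

Comparing these tallies between the Illumina-only run and the hybrid run shows which predictor lost the most predictions.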

I'm not too familiar with the reannotation but I'd rather create the gene
models from scratch rather than reuse the ones from the Illumina-only
genomes.
Note that long-read assemblies contain a higher proportion of repetitive
elements that need masking, and RepeatMasker alone may not be enough. In
theory, this shouldn't affect the Augustus model if it was trained through
BUSCO, since BUSCO uses defined conserved markers to build the gene model,
but I'm not so sure about GM.

If you trained Augustus with BUSCO, and this is the result, I'd discard the
gene model and train it again by the "traditional way", i.e. as it used to
be when we only had CEGMA. I had good results just by changing the training
method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:

> Thank you, Xabi for the response.
> The number of proteins from each source is much lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker respectively.
> The more recent numbers are 25, 857, 4418 respectively.
>
> So do you think maybe this hints that something is wrong from genemark?
>
> Morgan
>
>
> ------------------------------
> *From:* Xabier Vázquez-Campos <xvazquezc at gmail.com>
> *Sent:* Sunday, February 3, 2019 4:43 PM
> *To:* morgan sobol
> *Cc:* maker-devel at yandell-lab.org
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Hi Morgan,
>
> We had a similar issue with AUGUSTUS underpredicting when using a
> BUSCO-derived gene model
> https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ
>
> Also, check the number of proteins by each individual predictor. If the
> numbers from one of them are off, you may find a possible source of issues.
> We didn't have a very good experience with GM, as it used to overpredict
> an absurd number of proteins.
>
> Xabi
>
> On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:
>
> Hello,
>
> I previously used Maker to annotate two different fungal genomes that were
> created using Illumina sequences only. For these genomes, I had over 11,000
> genes predicted.
> I recently obtained PacBio sequences for the same genomes, so I created
> two hybrid assemblies. Both assemblies were very similar in length and in
> number of complete orthologs to the Illumina-only assembly, but had far
> fewer, longer contigs.
>
> I re-ran Maker using the settings below. For one of my genomes, I got
> around 11,000 genes predicted again, as expected. However, for the other
> genome, I am continuously getting ~4,400 predicted genes.
>
> I am asking for help as to how I can determine why I keep getting fewer
> predicted genes for only one of my genomes, even though I ran them the same?
>
> Thanks,
> Morgan S.
>
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
>
>
>
> --
> Xabier Vázquez-Campos, *PhD*
> *Research Associate*
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com  Wed Feb  6 15:33:47 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Thu, 7 Feb 2019 09:33:47 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>

 SNAP is easy to train, works well in fungal genomes and it's explained in
Maker's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that
before BUSCO, when we only had CEGMA, we trained Augustus in a different
way, because CEGMA didn't produce Augustus gene models automatically. I
didn't mean that you should use CEGMA.

This is what I have on my own documentation about how to train Augustus
"the old way"

> AUGUSTUS: the old way
>
> Alternatively, you can train AUGUSTUS in a more "manual" way, like when we
> were using CEGMA. The training starts with the output from the second
> instance of fathom in the SNAP training section.
>
> cd ${MYGENOME_DIR}/maker/snap1
> perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb
>
> zff2augustus_gbk.pl generates a GenBank file from export.dna.
>
> The actual training of AUGUSTUS is done through the *webAUGUSTUS server*.
>
> Before proceeding, it is recommended to rename the fasta headers,
> especially if they are very long or contain special characters. Such
> headers are the main reason jobs submitted to webAUGUSTUS fail. You can
> use the simplifyFastaHeaders.pl
> <http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl>
> script for that:
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map
>
> nameStem is the base name used for each sequence in the multifasta
> files. Choose something appropriate: use *contig* and *rna* for the
> assembly and RNA-seq files, respectively, or something based on them, for
> example "pgcontig" and "pgrna" for contigs and RNA from *Puccinia
> graminis*.
> *DO NOT* give the same nameStem to both fasta files, and don't use any
> special characters.
>
> We need the following files (minimum):
>
>    - ${MYGENOME}_assembly.fasta as *Genome file*
>    - ${MYGENOME}.train1.gb as *Training gene structure file*
>
> If we also have RNA-seq data:
>
>    - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file*
>
> Use ${MYGENOME}_v1 as *Species name*. The retraining step will need a
> different species name; otherwise, when Maker2 is rerun, it will see the
> same name and will not rerun AUGUSTUS, even though the species profile is
> different. So ${MYGENOME}_v1 does the job and tracks the version.
>
> Once the job is finished, the *Species parameter archive* (
> parameters.tar.gz) will contain a folder with the model files for your
> species. Copy it to the species folder of your AUGUSTUS installation.
>
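
For anyone who wants to see what the header-renaming step amounts to, here is a minimal Python sketch of the same idea. It is an independent illustration, not a port of simplifyFastaHeaders.pl: the function name and the `nameStem` + running number naming scheme are assumptions, and the script's real output format may differ.

```python
import re


def simplify_headers(records, name_stem):
    """Rename FASTA records to name_stem + running number; return (renamed, mapping)."""
    # Enforce the "no special characters" advice for the name stem.
    if not re.fullmatch(r"[A-Za-z0-9]+", name_stem):
        raise ValueError("name_stem should contain only letters and digits")
    renamed, mapping = [], []
    for index, (header, seq) in enumerate(records, start=1):
        new_name = f"{name_stem}{index}"
        renamed.append((new_name, seq))
        mapping.append((new_name, header))  # keep the original header for later lookup
    return renamed, mapping


if __name__ == "__main__":
    # Toy records standing in for a real assembly.
    contigs = [("NODE_1_length_4_cov_12.5|weird:name", "ACGT"), ("NODE_2 extra text", "GG")]
    renamed, mapping = simplify_headers(contigs, "pgcontig")
    for new_name, old_header in mapping:
        print(f"{new_name}\t{old_header}")
```

Keeping the mapping lets you translate the simplified names back to the original contig names after training.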
Hope this helps

PS: hit reply all so this is logged in Maker's mail list in case anybody
else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:

> I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued
> in 2015.
> Do you think that will be a problem, or is it still worth using the old
> version?
>
>
> ------------------------------
> *From:* Xabier V?zquez-Campos <xvazquezc at gmail.com>
> *Sent:* Tuesday, February 5, 2019 4:42 PM
> *To:* morgan sobol; Maker Mailing List
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Don't you use SNAP? It usually produces quite decent results. And easier
> to train than any of the other predictors
>
> In any case, the Augustus gene model is way off in both cases
> GM doesn't seem bad if your fungus has a rather usual genome... in the
> first. For the second, it looks bad
>
> I'm not too familiar with the reannotation but I'd rather create the gene
> models from scratch rather than reuse the ones from the Illumina-only
> genomes.
> Note that assemblies with long-reads, have a higher proportion of
> repetitive elements that need masking and RepeatMasker only may not be
> enough. In theory, this shouldn't affect Augustus model if trained through
> BUSCO as it uses defined conserved markers to create the gene model, but
> I'm not so sure about GM.
>
> If you trained Augustus with BUSCO, and this is the result, I'd discard
> the gene model and train it again by the "traditional way", i.e. as it used
> to be when we only had CEGMA. I had good results just by changing the
> training method.
>
> Hope it helps,
> Xabi
>
>
>
>
> On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:
>
> Thank you, Xabi for the response.
> The number of proteins from each source is greatly lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker respectively.
> The more recent numbers are 25, 857, 4418 respectively.
>
> So do you think maybe this hints that something is wrong from genemark?
>
> Morgan
>
>
> ------------------------------
> *From:* Xabier V?zquez-Campos <xvazquezc at gmail.com>
> *Sent:* Sunday, February 3, 2019 4:43 PM
> *To:* morgan sobol
> *Cc:* maker-devel at yandell-lab.org
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Hi Morgan,
>
> We had a similar issue with AUGUSTUS underpredicting when using a
> BUSCO-derived gene model
> https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ
>
> Also, check the number of proteins by each individual predictor. If the
> numbers from one of them are off, you may find a possible source of issues.
> We didn't have a very good experience with GM, as it used to overpredict
> an absurd number of proteins.
>
> Xabi
>
> On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:
>
> Hello,
>
> I previously used Maker to annotate two different fungal genomes that were
> created using Illumina sequences only. For these genomes, I had over 11,000
> genes predicted.
> I recently obtained PacBio sequences for the same genomes, so I created
> two hybrid assemblies. Both assemblies were very familiar in length and
> completed number of orthologs to the Illumina only assembly, but had much
> fewer, but longer contigs.
>
> I re-ran Maker using the settings below. For one of my genomes, I got
> around 11,000 genes predicted again, as expected. However, for the other
> genome, I am continuously getting ~4,400 predicted genes.
>
> I am asking for help as to how I can determine why I keep getting fewer
> predicted genes for only one of my genomes, even though I ran them the same?
>
> Thanks,
> Morgan S.
>
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked
> #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff
> #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at
> least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta
> #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for
> RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for
> RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change
> this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg
> and dust filtering)
>
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
> --
> Xabier Vázquez-Campos, *PhD*
> *Research Associate*
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From liorglic at mail.tau.ac.il  Mon Feb 11 07:04:16 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:04:16 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAOzMDPxUf8a9orgsmbJ8QDdq4=OoKL_AkjVbsbPcGGm8z6ufXg@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software is initiated. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as the value of the maker_exe.ctl parameter
'RepeatMasker'.
To generalize my working environment, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> which is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it points to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions of how to
overcome this issue?

Thanks and best regards,
Lior

From liorglic at mail.tau.ac.il  Mon Feb 11 07:12:25 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:12:25 +0200
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
Message-ID: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>

Dear MAKER users,

After completing a MAKER run, I looked at the protein FASTA files that
MAKER outputs and noticed that a small fraction of the sequences include X
characters, indicating unknown amino acids. How do such sequences arise?
That is, why would a prediction contain unknown amino acids? Is this an
indication of low-quality predictions?
Is there any documentation on the procedure that generates the
protein sequences?

Thanks a lot,
Lior

From kapeelc at gmail.com  Thu Feb  7 12:43:47 2019
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Thu, 7 Feb 2019 14:43:47 -0500
Subject: [maker-devel] MAKER v3 Fgenesh ERROR
Message-ID: <CA+DOtefuUEc5_fFh7j2ykb4yBKmtEp1vgt0Pea-RF+7GCqr9ig@mail.gmail.com>

Hi, Carson

I have been getting this error with the fgenesh tool within MAKER. It runs OK
on most of the assembly contigs but seems to fail on one contig, or part of
that contig, with the error below:

Widget::fgenesh:
/mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap
/mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh
/sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta
-exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh
>
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh
#-------------------------------#
 ...processing 9 of 24
 ...processing 8 of 28
 ...processing 10 of 24
 ...processing 9 of 28
 ...processing 11 of 24
 ...processing 10 of 28
 ...processing 12 of 24
 ...processing 11 of 28
deleted:0 genes
ERROR: FgenesH failed
--> rank=14, hostname=bnbcompute50
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:Super-Scaffold_14.2_contig2

I updated the Perl module fgenesh.pm as suggested in previous threads.
Attached are the maker_opts.ctl and STDERR log files.

Thanks

Kapeel


-- 


*Kapeel Chougule*
*Computational Scientist Developer II*

*One Bungtown Road, Cold Spring Harbor, NY 11724*
http://www.warelab.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5421 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stderr.log
Type: application/octet-stream
Size: 10012918 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment-0005.obj>

From fatih.sarigoel at durham.ac.uk  Wed Feb 13 05:20:40 2019
From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH)
Date: Wed, 13 Feb 2019 12:20:40 +0000
Subject: [maker-devel] Does Conda Maker actually work?
Message-ID: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>

Greetings,
I notice that you never mention conda installation on your website, so I am curious whether the conda version is actually supposed to work; for me it did not.
I created a new conda environment and installed MAKER (I tried this with both installation options).
When I run the example files, I get this error:

"make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2"

My conda environment is here:
/fast_new/work/users/fsarigo_m/miniconda3
I don't understand why the program is trying to look here:
/home/conda
which does not exist.

The output also begins with a "Possible precedence issue" warning.

Thanks for your help in advance!
Fatih

+++++

Here is the full log until the end of the contig:

(MakerX) [fsarigo_m at med0223 MAKER]$ maker
Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:contig-dpp-500-500

examining contents of the fasta file and run log



From carsonhh at gmail.com  Wed Feb 13 07:51:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 07:51:44 -0700
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>

The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list from people attempting to install MAKER via conda. I know there is at least an issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists --> https://github.com/bioperl/bioperl-live/issues/215

Or it may be that there is a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) --> https://github.com/bioperl/bioperl-live/issues/305

Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts).
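
The log in this thread shows Inline::C calling a compiler under /home/conda/feedstock_root/..., a path that does not exist on the reporting machine. A minimal Python sketch for checking this, assuming `perl` is on PATH: it asks Perl which C compiler it was configured with (`perl -V:cc`) and tests whether that compiler can be found. The function names are illustrative only and are not part of MAKER or BioPerl.

```python
import os
import re
import shutil
import subprocess


def parse_perl_config_value(output):
    """Extract the value from `perl -V:name` output such as "cc='gcc';"."""
    match = re.search(r"=\s*'(.*)';", output)
    return match.group(1) if match else None


def configured_compiler(perl="perl"):
    """Return the C compiler Perl was built with, or None if perl is not on PATH."""
    if shutil.which(perl) is None:
        return None
    result = subprocess.run([perl, "-V:cc"], capture_output=True, text=True, check=True)
    return parse_perl_config_value(result.stdout)


def compiler_is_available(cc):
    """True if the compiler (first word of cc) is an existing file or is on PATH."""
    if not cc:
        return False
    program = cc.split()[0]
    return os.path.isfile(program) or shutil.which(program) is not None


if __name__ == "__main__":
    cc = configured_compiler()
    print(f"Perl was configured with cc={cc!r}")
    print("compiler found" if compiler_is_available(cc) else "compiler NOT found on this machine")
```

If the reported compiler is missing, Inline::C builds such as the one in Bio::DB::IndexedBase would be expected to fail in the way shown in the log.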

--Carson


> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk> wrote:
> 
> Greetings,
> I notice that you never mention conda installation on your website, so I am curious if the conda version is actually supposed to be working fine or not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with both installation options)
> When I run the example files, I get this error:
> 
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
> 
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
> 
> Also begins with a "possible precedence issue"
> 
> Thanks for your help in advance!
> Fatih


From carsonhh at gmail.com  Wed Feb 13 10:14:13 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:14:13 -0700
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
In-Reply-To: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
References: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com>

The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and is missing files inside the repeat library as a result.
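
To see which RepeatMasker an environment would pick up, one option is to compare the path written in maker_exe.ctl with the one found on $PATH. The following is a minimal Python sketch under that assumption; it only parses `key=value #comment` lines and makes no claim about how MAKER itself resolves executables. The function names and the example path are hypothetical.

```python
import shutil


def read_ctl(text):
    """Parse MAKER control-file text ("key=value #comment") into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def compare_with_path(options, tool="RepeatMasker"):
    """Return (path from the control file, path found on $PATH) for one tool."""
    return options.get(tool, ""), shutil.which(tool)


if __name__ == "__main__":
    # Hypothetical control-file content; a real check would read maker_exe.ctl.
    example = "RepeatMasker=/opt/RepeatMasker/RepeatMasker #location of RepeatMasker executable\n"
    configured, on_path = compare_with_path(read_ctl(example))
    print("maker_exe.ctl:", configured)
    print("$PATH:        ", on_path)
    if on_path and configured != on_path:
        print("The two differ; the active environment provides a different install.")
```

Running this once inside and once outside the conda env would show whether the two environments point at different RepeatMasker installs.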

--Carson


> On Feb 4, 2019, at 2:00 AM, Lior Glick <liorglck at gmail.com> wrote:
> 
> Dear MAKER users,
> 
> I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software is initiated. Typing 'which RepeatMasker' shows the correct local path.
> I also use this path as value for the maker_exe.ctl parameter 'RepeatMasker'.
> Trying to generalize my working environment, I am trying to use a conda env <https://anaconda.org/bioconda/maker> which is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
> ERROR: Could not determine if RepBase is installed
> Running the same configuration files locally (i.e. outside the conda env) results in a successful run.
> This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions of how to overcome this issue?
> 
> Thanks and best regards,
> Lior

From carsonhh at gmail.com  Wed Feb 13 10:18:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:18:44 -0700
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
In-Reply-To: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
References: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com>

If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have 'N' characters from the assembly as part of your CDS ('N' is the ambiguity code for DNA, which results in an 'X' when translated; 'X' is the ambiguity code for amino acids). Augustus will do internal gymnastics and completely splice out exons containing N's to try to never have this issue, but may not always be able to. It's an indication of genome assembly issues.
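
The N-to-X relationship can be illustrated with a short Python sketch. This is not MAKER's translation code; it is a simplified translator using the standard genetic code that turns any codon containing a non-ACGT base into 'X' (real translators may still resolve some ambiguous codons), plus a hypothetical helper that lists the FASTA records containing 'X'.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a CDS; any codon with a non-ACGT base becomes 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def proteins_with_x(fasta_text):
    """Return IDs of FASTA records whose sequence contains 'X'."""
    flagged, current = [], None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else None
        elif current and "X" in line.upper() and current not in flagged:
            flagged.append(current)
    return flagged


if __name__ == "__main__":
    print(translate("ATGNNNGCTTAA"))
    print(proteins_with_x(">p1 example\nMKXL\n>p2\nMKLL\n"))
```

Listing the flagged proteins and looking at the corresponding CDS coordinates in a browser should show whether they overlap gaps (runs of N) in the assembly.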

--Carson


> On Feb 11, 2019, at 7:12 AM, Lior Glick <liorglic at mail.tau.ac.il> wrote:
> 
> Dear MAKER users,
> 
> After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences are obtained, I mean how come there are unknown amino acids in the prediction? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
> 
> Thanks a lot,
> Lior


From carsonhh at gmail.com  Wed Feb 13 10:24:01 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:24:01 -0700
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <D33A2A92-BFCA-4493-A66E-99C567954AD2@gmail.com>

One thing you can also do is use the old models as protein= input and run the protein2genome option just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and split contigs were joined, resulting in multiple gene fragments becoming a single unified model. Make sure to always review contigs in a browser to see how models and evidence correlate.
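
Before opening a browser, it can help to see where the gene counts differ between the two runs. A minimal Python sketch, assuming the MAKER GFF3 marks final gene models with source `maker` and type `gene` in columns 2 and 3; the function name and the example records are hypothetical.

```python
from collections import Counter


def genes_per_contig(gff3_text, source="maker"):
    """Count 'gene' features per contig in GFF3 text, stopping at ##FASTA."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 3 and cols[1] == source and cols[2] == "gene":
            counts[cols[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records; a real comparison would read the two .all.gff files.
    example = (
        "##gff-version 3\n"
        "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1\n"
        "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1\n"
        "ctg1\tmaker\tgene\t1500\t2400\t.\t-\t.\tID=g2\n"
    )
    for contig, n in sorted(genes_per_contig(example).items()):
        print(contig, n)
```

Contigs with no or very few genes in the new run are reasonable first candidates to inspect alongside the protein2genome alignments.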

--Carson


> On Feb 3, 2019, at 12:13 PM, morgan sobol <morgan_starr_s at live.com> wrote:
> 
> Hello, 
> 
> I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. 
> I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and number of complete orthologs to the Illumina-only assembly, but had far fewer, longer contigs.
> 
> I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes. 
> 
> I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same?
> 
> Thanks,
> Morgan S. 
> 
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
> 
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
> 
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
> 
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
> 
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
> 
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> 
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
> 
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
> 
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
> 
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
> 
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
> 
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
> 

From liorglck at gmail.com  Sun Feb 17 11:50:10 2019
From: liorglck at gmail.com (Lior Glick)
Date: Sun, 17 Feb 2019 20:50:10 +0200
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
Message-ID: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>

That's good to know. Any plans on creating a stable conda package in the
future? It'd be a very nice feature, especially since MAKER is not always
straightforward to install.

On Wed, Feb 13, 2019 at 5:22 PM Carson Holt <carsonhh at gmail.com> wrote:

> The conda recipe was produced by another group. I do not currently
> recommend using it because I have seen a number of issues pop up on the
> list based on people attempting to install MAKER via conda.  I know there
> is at least an issue with the conda RepeatMasker install, and there may be
> others. The specific failure you show is from Bio::DB::IndexedBase trying
> to compile an Inline::C function. It may be that conda is installing an
> older BioPerl where this issue still exists:
> https://github.com/bioperl/bioperl-live/issues/215
>
> Or it may be that there is a new related issue (I've seen a handful of
> other examples that seem to relate back to Bio::DB::IndexedBase):
> https://github.com/bioperl/bioperl-live/issues/305
>
> Try installing MAKER without conda (make sure to remove any components
> that are in conda first to avoid conflicts).
>
> --Carson
>
>
> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk>
> wrote:
>
> Greetings,
> I notice that you never mention conda installation on your website, so I
> am curious if the conda version is actually supposed to be working fine or
> not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with
> both installation options)
> When I run the example files, I get this error:
>
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
>
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
>
> Also begins with a "possible precedence issue"
>
> Thanks for your help in advance!
> Fatih
>
> +++++
>
> Here is the full log until the end of the contig:
>
> (MakerX) [fsarigo_m at med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
>
> To access files for individual sequences use the datastore index:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
>
>
>
> --Next Contig--
>
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
>
>
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs
> blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"
> -typemap
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"
>  IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc
> -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin"
> -D_REENTRANT -D_GNU_SOURCE
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2
>  -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"
>  IndexedBase_14e0.c
> /bin/sh:
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc:
> No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
>
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
>
> The build directory was:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
>
> To debug the problem, cd to the build directory, and inspect the output
> files.
>
> Environment PATH =
> '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 275.
> --> rank=NA, hostname=med0223
> ...propagated at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm
> line 869.
> --> rank=NA, hostname=med0223
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 38.
> Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 306
> Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for
> IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef,
> ARRAY(0x564b40673ad0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 426
> Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm
> line 95
> FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 478
> Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run",
> HASH(0x564b40673c80), 0, 0) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 341
> Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called
> at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 357
> Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0)
> called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm
> line 287
> Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0)
> called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500
>
> examining contents of the fasta file and run log
>
>
>
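
The log above shows `make` calling a compiler under `/home/conda/feedstock_root/...`, a path that does not exist on the user's machine. One way to check this on a given installation is to ask Perl which C compiler it was configured with (`perl -V:cc`) and see whether that compiler can be found. The sketch below is only a diagnostic aid under that assumption; the function names are hypothetical, and it does not handle a `cc` value that carries extra arguments.

```python
import os
import shutil
import subprocess


def parse_perl_config_value(output):
    """Extract the value from `perl -V:name` output such as "cc='gcc';"."""
    _, _, rest = output.strip().partition("=")
    return rest.rstrip(";").strip("'\"")


def perl_compiler_status(perl="perl"):
    """Return (compiler, found) for the C compiler recorded in Perl's configuration.

    `found` is True when an absolute compiler path exists, or when a bare
    command name can be located on PATH.
    """
    out = subprocess.run([perl, "-V:cc"], capture_output=True,
                         text=True, check=True).stdout
    cc = parse_perl_config_value(out)
    found = os.path.exists(cc) if os.path.isabs(cc) else shutil.which(cc) is not None
    return cc, found


# Usage inside the affected environment (not run here):
#   cc, found = perl_compiler_status()
#   print(f"Perl was configured with cc={cc!r}; available here: {found}")
```

If `found` is false, Inline::C cannot compile anything with that Perl, regardless of the BioPerl version.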

From morgan_starr_s at live.com  Mon Feb 18 02:08:56 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Mon, 18 Feb 2019 09:08:56 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>,
	<CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
Message-ID: <DM5PR14MB1292E82A4864CCC40B80122EAE630@DM5PR14MB1292.namprd14.prod.outlook.com>

Thank you, Xabi and Carson.
With your help, I was able to improve the annotation with a more appropriate number of predictions.

Best,
Morgan

________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Wednesday, February 6, 2019 11:33 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

SNAP is easy to train, works well in fungal genomes and it's explained in Maker's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus in a different way, as CEGMA didn't produce Augustus gene models automatically. I'm not suggesting that you use CEGMA.

This is what I have in my own documentation about how to train Augustus "the old way":

AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more "manual" way, as we did when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.

cd ${MYGENOME_DIR}/maker/snap1
perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna.

The actual training of AUGUSTUS will be through the webAUGUSTUS server.

Before proceeding, it is recommended to rename the FASTA headers, especially if they are very long or contain special characters. This is the main reason jobs submitted to webAUGUSTUS fail. You can use the simplifyFastaHeaders.pl script (http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl) for that:

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used for each of the sequences in the multi-FASTA files. Choose something appropriate, such as contig and rna for the assembly and RNA-seq files, respectively, or something based on those: for example, "pgcontig" and "pgrna" for contigs and RNA from Puccinia graminis.
DO NOT give the same nameStem to both FASTA files, and don't use any special characters.
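
To make the renaming step concrete, here is a minimal Python sketch of the same idea: every header is replaced by the name stem plus a running number, and the original header is kept in a mapping. This is not the simplifyFastaHeaders.pl script itself; the exact naming scheme and map layout that script produces may differ, and the function name is hypothetical.

```python
def simplify_fasta_headers(lines, name_stem):
    """Rename FASTA records to name_stem1, name_stem2, ...

    Returns (new_lines, mapping), where mapping is a list of
    (new_name, original_header) pairs in file order.
    """
    new_lines = []
    mapping = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new_name = f"{name_stem}{len(mapping) + 1}"
            mapping.append((new_name, line[1:]))  # keep original header text
            new_lines.append(f">{new_name}")
        else:
            new_lines.append(line)  # sequence lines pass through unchanged
    return new_lines, mapping


# Small in-memory example with made-up headers.
records = [">NODE_1_length_500_cov_12.3 some|desc\n", "ACGT\n", ">NODE_2\n", "GG\n"]
renamed, name_map = simplify_fasta_headers(records, "pgcontig")
for new_name, old_header in name_map:
    print(f"{new_name}\t{old_header}")
```

Keeping the mapping is what lets you translate results from the renamed files back to the original contig names later.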

We need the following files (minimum):

  *   ${MYGENOME}_assembly.fasta as Genome file
  *   ${MYGENOME}.train1.gb as Training gene structure file

If we also have RNA-seq data:

  *   ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step; otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and tracks the version.

Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.
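
The copy step can be sketched in Python as follows, assuming the archive has already been unpacked and that the AUGUSTUS installation is located through the `AUGUSTUS_CONFIG_PATH` environment variable (whose `species` subdirectory holds one folder per species). The function name and the example folder name are hypothetical, and the layout inside `parameters.tar.gz` is not confirmed here.

```python
import os
import shutil


def install_species(species_dir, augustus_config_path=None):
    """Copy an unpacked species folder into <AUGUSTUS config>/species/.

    species_dir: folder from the unpacked parameter archive, e.g. ".../myfungus_v1".
    augustus_config_path: defaults to the AUGUSTUS_CONFIG_PATH environment variable.
    Refuses to overwrite an existing species of the same name.
    """
    config = augustus_config_path or os.environ["AUGUSTUS_CONFIG_PATH"]
    name = os.path.basename(os.path.normpath(species_dir))
    target = os.path.join(config, "species", name)
    if os.path.exists(target):
        raise FileExistsError(f"species already installed: {target}")
    shutil.copytree(species_dir, target)
    return target
```

Refusing to overwrite fits the versioned naming described above: a retrained model should arrive under a new name (for example `_v2`) rather than replace `_v1` silently.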

Hope this helps

PS: hit reply all so this is logged in Maker's mail list in case anybody else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:
I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued in 2015.
Do you think that will be a problem, or is it still worth using the old version?


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Tuesday, February 5, 2019 4:42 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Don't you use SNAP? It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
GM doesn't seem bad in the first case, if your fungus has a fairly typical genome. For the second, it looks bad.

I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes.
Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:
Thank you, Xabi for the response.
The number of proteins from each source is much lower than before.
The previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker, respectively.
The more recent numbers are 25, 857, and 4,418, respectively.
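
For a quick sense of scale, the figures above can be turned into the fraction of predictions retained per source. This is plain arithmetic on the numbers quoted in this message; the variable names are only illustrative.

```python
# Counts quoted in the message: previous (Illumina-only) vs. recent (hybrid) run.
previous = {"augustus": 325, "genemark": 10899, "maker": 11243}
recent = {"augustus": 25, "genemark": 857, "maker": 4418}

# Fraction of the earlier count that is still predicted in the new run.
retained = {source: recent[source] / previous[source] for source in previous}

for source, fraction in retained.items():
    print(f"{source}: {recent[source]} of {previous[source]} retained ({fraction:.1%})")
```

Both ab initio predictors keep less than a tenth of their earlier counts, while the MAKER total keeps roughly two fifths, which suggests the drop starts with the predictors rather than with MAKER's final selection.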

So do you think maybe this hints that something is wrong from genemark?

Morgan


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Sunday, February 3, 2019 4:43 PM
To: morgan sobol
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor. If the numbers from one of them are off, you may find a possible source of issues.
We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.
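
A per-predictor count like the one suggested above can be pulled from a MAKER GFF3 file with a few lines of Python. The sketch below tallies top-level features by the source column (column 2) and type (column 3). It assumes that final gene models appear as `gene` features and that ab initio predictions appear as `match` features; the source labels in the example (`augustus_masked`, `genemark`) are illustrative and should be checked against your own file.

```python
from collections import Counter


def count_features_by_source(gff_lines, feature_types=("gene", "match")):
    """Count GFF3 features per (source, type), skipping sub-features.

    Stops at the ##FASTA directive, after which only sequence follows.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        source, ftype = cols[1], cols[2]
        if ftype in feature_types:
            counts[(source, ftype)] += 1
    return counts


# Usage with a real file (path is a placeholder):
#   with open("genome.all.gff") as fh:
#       for (source, ftype), n in sorted(count_features_by_source(fh).items()):
#           print(source, ftype, n, sep="\t")
```

Running this on the GFF3 from each assembly and comparing the tables side by side shows quickly which predictor's numbers are off.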

Xabi

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA



From anthony.bretaudeau at inria.fr  Mon Feb 18 02:53:39 2019
From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau)
Date: Mon, 18 Feb 2019 10:53:39 +0100
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
	<CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr>

An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190218/d42974d5/attachment-0002.html>

From liorglic at mail.tau.ac.il  Sun Feb 24 05:50:49 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 24 Feb 2019 14:50:49 +0200
Subject: [maker-devel] Profiling MAKER runs
Message-ID: <CAOzMDPyHL9tM-DWTBJb=SSMT1KH6FwhArdgqgN-8aVoBthY69g@mail.gmail.com>

Dear MAKER users,
I was wondering if any of you has an idea of a way by which I can profile
my runs. What I mean is I'd like to know how much time was spent on each
step of the analysis - am I spending most of the time masking repeats,
blasting transcripts/proteins, running ab-initio predictors etc. Based on
this information, I might want to adjust my configuration, e.g. maybe I'm
spending a lot of time blasting transcripts, and reducing the number of
input transcripts would reduce run time significantly without having a
major effect on results quality.
As far as I can see, the main run log does not provide such information,
and I'm not sure where else to look. Any ideas or directions could be of
help.

Thanks!
Lior

From morgan_starr_s at live.com  Sun Feb  3 12:13:47 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Sun, 3 Feb 2019 19:13:47 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
Message-ID: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>

Hello,

I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and number of complete orthologs to the Illumina-only assemblies, but had far fewer, longer contigs.

I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am consistently getting ~4,400 predicted genes.

How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran both with the same settings?

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files


From xvazquezc at gmail.com  Sun Feb  3 15:43:42 2019
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Mon, 4 Feb 2019 09:43:42 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a
BUSCO-derived gene model:
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins predicted by each individual predictor.
If the numbers from one of them are off, you may have found a possible
source of the problem. We didn't have a very good experience with GeneMark
(GM), as it used to overpredict an absurd number of proteins.
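
One way to get those per-predictor numbers is to tally features by the
source column of the MAKER GFF3 output. The following is only a minimal
sketch: it relies on the standard GFF3 layout (column 2 is the source,
column 3 the feature type), and it assumes that ab-initio predictions are
stored as "match" features and final models as "gene" features. The
function name and the default feature types are illustrative choices, so
check them against your own file.

```python
from collections import Counter


def count_by_source(gff_lines, feature_types=("match", "gene")):
    """Count GFF3 features per (source, type), skipping comments and FASTA."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, directive, or comment
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed nine-column feature line
        source, ftype = cols[1], cols[2]
        if ftype in feature_types:
            counts[(source, ftype)] += 1
    return counts
```

Passing an open file handle works, since the function only iterates over
lines. Comparing the tallies for the two genomes side by side should show
which predictor's counts dropped.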

Xabi

On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:

> Hello,
>
> I previously used Maker to annotate two different fungal genomes that were
> created using Illumina sequences only. For these genomes, I had over 11,000
> genes predicted.
> I recently obtained PacBio sequences for the same genomes, so I created
> two hybrid assemblies. Both assemblies were very similar in length and
> number of complete orthologs to the Illumina-only assemblies, but had far
> fewer, longer contigs.
>
> I re-ran Maker using the settings below. For one of my genomes, I got
> around 11,000 genes predicted again, as expected. However, for the other
> genome, I am consistently getting ~4,400 predicted genes.
>
> How can I determine why I keep getting fewer predicted genes for only one
> of my genomes, even though I ran both with the same settings?
>
> Thanks,
> Morgan S.


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From keith.decker at bayer.com  Mon Feb  4 11:09:35 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:09:35 +0000
Subject: [maker-devel] MAKER on AWS
Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use.

So my questions are

1.  Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
2.  Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end?  If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?

Thanks and sorry for the long question
Keith
This system contains confidential and copyrighted information.  Access to the system is limited to users only and only for approved business purposes.
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.
Anyone using or viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company's sole discretion there is a business reason to do so.
If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.

From carsonhh at gmail.com  Mon Feb  4 11:31:29 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 11:31:29 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>

You can try to stand up a cluster inside AWS, or, like you said, just start independent instances, each with its own piece of the total dataset. There is a tool called fasta_tool inside of maker that makes it easy to split up the dataset into equal-sized chunks.
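
To illustrate the chunking idea only (this is not how fasta_tool itself
works, and its options are not shown here), the sketch below splits FASTA
records into a fixed number of chunks with roughly equal total sequence
length. It uses a greedy heuristic: longest record first, always into the
currently lightest chunk. The function names are hypothetical.

```python
import heapq


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_balanced(records, n_chunks):
    """Greedy split: place each record, longest first, in the lightest chunk."""
    chunks = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (total bases, chunk index)
    heapq.heapify(heap)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)
        chunks[i].append((header, seq))
        heapq.heappush(heap, (total + len(seq), i))
    return chunks
```

Each chunk could then be written to its own FASTA file and handed to a
separate instance. With one chromosome per record, the largest chromosome
sets a lower bound on how balanced the chunks can be.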

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF) ->
http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup -> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

--Carson


> On Feb 4, 2019, at 11:09 AM, DECKER, KEITH F [AG/1005] <keith.decker at bayer.com> wrote:
> 
> I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
> I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
> but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use. 
>  
> So my questions are
>  
> 1.  Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
> 2.  Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end?  If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?
>  
> Thanks and sorry for the long question
> Keith

From liorglck at gmail.com  Mon Feb  4 02:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software is initiated. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as value for the maker_exe.ctl parameter
'RepeatMasker'.
Trying to generalize my working environment, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> which is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it points to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions of how to
overcome this issue?
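
As a quick diagnostic, the sketch below compares the executable named in
a control file with the one found on $PATH in the current environment. It
assumes the control file uses the same "key=value #comment" layout as
maker_opts, and the function names are hypothetical. It only shows whether
the two locations differ; it does not show which one MAKER actually calls.

```python
import os
import shutil


def read_ctl_value(ctl_lines, key):
    """Return the value for `key` from 'key=value #comment' control-file lines."""
    for line in ctl_lines:
        body = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in body:
            continue
        name, value = body.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


def compare_with_path(ctl_lines, key="RepeatMasker"):
    """Report the configured executable and the one found on PATH."""
    configured = read_ctl_value(ctl_lines, key)
    on_path = shutil.which(key)
    same = (
        bool(configured)
        and on_path is not None
        and os.path.realpath(configured) == os.path.realpath(on_path)
    )
    return {"configured": configured, "on_path": on_path, "same": same}
```

Running this once inside the conda env and once outside it, with the same
maker_exe.ctl, would show whether the configured path and the environment
disagree.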

Thanks and best regards,
Lior

From keith.decker at bayer.com  Mon Feb  4 11:39:48 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:39:48 +0000
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>

Thanks,
Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine?  For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores?

-Keith



From carsonhh at gmail.com  Mon Feb  4 12:00:00 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 12:00:00 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
	<0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
	<1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
Message-ID: <EF78A658-7C9E-4F10-AA30-73E97DB30297@gmail.com>

I don't have cloud performance stats, but I do have cluster performance stats you may be able to somewhat correlate (attached). On a cluster we see nearly linear performance gains until ~100 CPU cores, and the plateau doesn't fully level out until well after 600 cores (we are hitting IO and networking limits for inter-node communication). So if you are only using a single instance, you can essentially consider it the equivalent of a single real machine which would fall well under 100 CPU cores, and performance growth would be expected to be linear on that instance.
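
To check scaling on your own instance, you can time the same small
dataset at two core counts and compute speedup and parallel efficiency.
This is a minimal sketch using the standard definitions (speedup = serial
time / parallel time, efficiency = speedup / cores); the function name is
hypothetical and any timings you feed it are your own measurements.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_cores):
    """Return (speedup, efficiency) for wall-clock times in the same units."""
    if t_serial <= 0 or t_parallel <= 0 or n_cores < 1:
        raise ValueError("times must be positive and n_cores at least 1")
    speedup = t_serial / t_parallel
    return speedup, speedup / n_cores
```

An efficiency close to 1.0 at 16 cores corresponds to the nearly linear
regime described above; a clearly lower value suggests another limit,
such as disk IO, on that instance.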

--Carson


> On Feb 4, 2019, at 11:39 AM, DECKER, KEITH F [AG/1005] <keith.decker at bayer.com> wrote:
> 
> Thanks,
> Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine?  For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores?
>  
> -Keith
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 41425 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190204/43c5cc9f/attachment-0003.pdf>

From xvazquezc at gmail.com  Tue Feb  5 15:42:40 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Wed, 6 Feb 2019 09:42:40 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>

Aren't you using SNAP? It usually produces quite decent results, and it is
easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
The GeneMark (GM) numbers don't look bad for the first run, assuming your
fungus has a fairly typical genome. For the second run, they look bad.
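
For reference, per-predictor counts like the ones discussed in this thread
can be tallied directly from a MAKER GFF3 file. The following is a minimal
Python sketch, not an official MAKER utility. It assumes (this is not
confirmed in the thread) that ab initio predictions are stored as "match"
features whose source column names the predictor (for example
augustus_masked or genemark) and that final annotations are "gene" features
with source "maker". The function name is made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_models_by_source(gff_lines: Iterable[str]) -> Counter:
    """Tally top-level features per GFF3 source column (column 2).

    Assumption: predictor output is stored as 'match' features and final
    annotations as 'gene' features; adjust the set below if your file differs.
    """
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        source, feature_type = cols[1], cols[2]
        if feature_type in {"gene", "match"}:
            counts[source] += 1
    return counts
```

Passing an open file handle for the *.all.gff file returns a Counter keyed
by source. Other sources that also use "match" features (repeat masking, for
instance) will show up in the tally too, so only compare the rows for the
predictors you care about.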

I'm not too familiar with re-annotation, but I'd create the gene models
from scratch rather than reuse the ones from the Illumina-only genomes.
Note that long-read assemblies contain a higher proportion of repetitive
elements that need masking, and RepeatMasker alone may not be enough. In
theory, this shouldn't affect the Augustus model if it was trained through
BUSCO, as BUSCO uses defined conserved markers to create the gene model, but
I'm not so sure about GM.

If you trained Augustus with BUSCO, and this is the result, I'd discard the
gene model and train it again the "traditional way", i.e. as it used to be
done when we only had CEGMA. I had good results just by changing the
training method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:

> Thank you, Xabi for the response.
> The number of proteins from each source is greatly lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker respectively.
> The more recent numbers are 25, 857, 4418 respectively.
>
> So do you think maybe this hints that something is wrong from genemark?
>
> Morgan
>
>
> ------------------------------
> *From:* Xabier Vázquez-Campos <xvazquezc at gmail.com>
> *Sent:* Sunday, February 3, 2019 4:43 PM
> *To:* morgan sobol
> *Cc:* maker-devel at yandell-lab.org
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Hi Morgan,
>
> We had a similar issue with AUGUSTUS underpredicting when using a
> BUSCO-derived gene model
> https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ
>
> Also, check the number of proteins by each individual predictor. If the
> numbers from one of them are off, you may find a possible source of issues.
> We didn't have a very good experience with GM, as it used to overpredict
> an absurd number of proteins.
>
> Xabi


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com  Wed Feb  6 15:33:47 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 7 Feb 2019 09:33:47 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>
Message-ID: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>

SNAP is easy to train, works well on fungal genomes, and is explained in
the MAKER wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors
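
As a rough outline of what that tutorial section walks through, here is a
small Python sketch that only builds the command lines and runs nothing. The
command sequence is written from memory of the MAKER tutorial linked above,
so treat it as an assumption and check it against that page. The function
name, the GFF3 file name and the species label are placeholders.

```python
import shlex
from typing import List


def snap_training_commands(maker_gff: str, species: str) -> List[List[str]]:
    """Return the usual SNAP training steps as argument lists (nothing is executed)."""
    return [
        # Convert MAKER annotations to ZFF; writes genome.ann and genome.dna.
        ["maker2zff", maker_gff],
        # First fathom call: sort gene models into categories (uni.*, alt.*, ...).
        ["fathom", "-categorize", "1000", "genome.ann", "genome.dna"],
        # Second fathom call: export the unique genes; writes export.ann and export.dna.
        ["fathom", "-export", "1000", "-plus", "uni.ann", "uni.dna"],
        # Estimate model parameters from the exported genes.
        ["forge", "export.ann", "export.dna"],
        # Assemble the HMM; its output goes to stdout and is usually redirected to <species>.hmm.
        ["hmm-assembler.pl", species, "."],
    ]


def as_shell(commands: List[List[str]]) -> List[str]:
    """Render the argument lists as copy-and-paste shell lines."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing as_shell(snap_training_commands("mygenome.all.gff", "mygenome"))
gives the lines to run by hand inside a fresh working directory. The
export.ann and export.dna files written by the second fathom call are the
starting point for the Augustus training described further down.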

Oh, sorry, I didn't explain myself well. What I was trying to say is that
before BUSCO, when we only had CEGMA, we would train Augustus in a different
way, as CEGMA wouldn't produce Augustus gene models automatically. I don't
mean that you should use CEGMA.

This is what I have in my own documentation about how to train Augustus
"the old way":
> AUGUSTUS: the old way
>
> Alternatively, you can train AUGUSTUS in a more "manual" way, like when we
> were using CEGMA. The training starts with the output from the second
> instance of fathom in the SNAP training section.
>
> cd ${MYGENOME_DIR}/maker/snap1
> perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb
>
> zff2augustus_gbk.pl generates a GenBank file from export.dna.
>
> The actual training of AUGUSTUS will be through the *webAUGUSTUS server*.
>
> Before proceeding, it is recommended to rename the fasta headers, especially
> if they contain special characters and/or are very long. This is the
> main cause of failure for jobs submitted to webAUGUSTUS. You can use
> the simplifyFastaHeaders.pl
> <http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl>
> script for that:
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map
>
> nameStem is the base name used for naming each of the sequences in the
> multifasta files. Pick something appropriate: use *contig* and *rna* for
> the assembly and RNA-seq files, respectively, or something based on that.
> For example, "pgcontig" and "pgrna" for contigs and RNA from *Puccinia
> graminis*.
> *DO NOT* give the same nameStem to both fasta files, and don't use any
> special characters.
>
> We need the following files (minimum):
>
>    - ${MYGENOME}_assembly.fasta as *Genome file*
>    - ${MYGENOME}.train1.gb as *Training gene structure file*
>
> If we also have RNA-seq data:
>
>    - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file*
>
> Use ${MYGENOME}_v1 as *Species name*. We will need a different
> species name in the retraining step. Otherwise, when Maker2 is rerun, it
> will see the same name and will not rerun AUGUSTUS, even though the species
> profile is different. So ${MYGENOME}_v1 does the job and tracks the
> version.
>
> Once the job is finished, the *Species parameter archive* (
> parameters.tar.gz) will contain a folder with the model files for your
> species. Copy it to the species folder of your AUGUSTUS installation.
>
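
To illustrate what the header renaming step amounts to, here is a minimal
Python sketch of the same idea: every record is renamed to the name stem plus
a running number, and the old names are kept in a mapping. It is not a port
of simplifyFastaHeaders.pl, and its output format may differ from what that
script writes. The function name is made up for illustration.

```python
from typing import Iterable, List, Tuple


def simplify_fasta_headers(
    lines: Iterable[str], name_stem: str
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Rename FASTA records to <name_stem><n>.

    Returns the rewritten lines and a list of (new_id, old_header) pairs.
    """
    if not name_stem.isalnum():
        # Mirrors the advice above: no special characters in the name stem.
        raise ValueError("name_stem should contain only letters and digits")
    out: List[str] = []
    mapping: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new_id = f"{name_stem}{len(mapping) + 1}"
            mapping.append((new_id, line[1:]))  # remember the original header
            out.append(">" + new_id)
        elif line:
            out.append(line)  # sequence lines are copied unchanged
    return out, mapping
```

Calling it once with "pgcontig" on the assembly and once with "pgrna" on the
transcripts keeps the two name spaces separate, as recommended above.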
Hope this helps

PS: please hit "reply all" so this is logged on the MAKER mailing list, in
case anybody else experiences similar issues.

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:

> I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued
> in 2015.
> Do you think that will be a problem, or is it still worth using the old
> version?


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From liorglic at mail.tau.ac.il  Mon Feb 11 07:04:16 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:04:16 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
Message-ID: <CAOzMDPxUf8a9orgsmbJ8QDdq4=OoKL_AkjVbsbPcGGm8z6ufXg@mail.gmail.com>

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally.
By that I mean that I can type 'RepeatMasker' in my terminal and the
software is initiated. Typing 'which RepeatMasker' shows the correct local
path.
I also use this path as the value of the 'RepeatMasker' parameter in
maker_exe.ctl.
To make my working environment more portable, I am trying to use a conda env
<https://anaconda.org/bioconda/maker> which is capable of running MAKER.
This env comes with RepeatMasker as well. Once I activate this env, I can
still run RepeatMasker, but it points to a different path. When I run MAKER
within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env)
results in a successful run.
This leads me to think that MAKER is not actually using the path indicated
in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or
something similar. Is that the expected behavior? Any suggestions of how to
overcome this issue?
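
One way to narrow this down is to compare the path written in maker_exe.ctl
with whatever the active environment resolves on $PATH. The Python sketch
below does only that comparison. It says nothing about which of the two
MAKER actually uses, and the function names are made up for illustration.

```python
import shutil
from typing import Dict, Iterable, Optional


def parse_ctl(lines: Iterable[str]) -> Dict[str, str]:
    """Parse key=value lines of a MAKER control file, dropping '#' comments."""
    settings: Dict[str, str] = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # remove trailing comment
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def compare_repeatmasker(ctl_lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Report the configured RepeatMasker path next to the one found on PATH."""
    configured = parse_ctl(ctl_lines).get("RepeatMasker", "")
    on_path = shutil.which("RepeatMasker")  # None if nothing is found
    return {"configured": configured, "on_path": on_path}
```

Running compare_repeatmasker(open("maker_exe.ctl")) once outside and once
inside the conda env shows whether the two values diverge only after
activation.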

Thanks and best regards,
Lior

From liorglic at mail.tau.ac.il  Mon Feb 11 07:12:25 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:12:25 +0200
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
Message-ID: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>

Dear MAKER users,

After completing a MAKER run, I looked at the protein fasta files that
MAKER outputs and noticed that a small fraction of the sequences include X
characters, indicating unknown amino acids. How do such sequences arise,
i.e. why would a prediction contain unknown amino acids? Is this an
indication of low-quality predictions?
Is there any documentation regarding the procedure that generates the
protein sequences?
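
To put a number on "a small fraction", the affected records can be listed
with a few lines of Python. This is a minimal sketch that only measures the
X content per protein; it does not explain where the X residues come from.
The function name is made up for illustration.

```python
from typing import Dict, Iterable, List, Optional


def x_content(fasta_lines: Iterable[str]) -> Dict[str, float]:
    """Return {record_id: fraction of X residues} for records that contain X."""
    seqs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token of the header.
            current = (line[1:].split() or [""])[0]
            seqs[current] = []
        elif line and current is not None:
            seqs[current].append(line.upper())
    result: Dict[str, float] = {}
    for rec_id, parts in seqs.items():
        seq = "".join(parts)
        if "X" in seq:
            result[rec_id] = seq.count("X") / len(seq)
    return result
```

Feeding it the *.all.maker.proteins.fasta file (file name assumed here)
returns only the records that contain at least one X, together with the
fraction of the sequence that is unknown.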

Thanks a lot,
Lior

From kapeelc at gmail.com  Thu Feb  7 12:43:47 2019
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Thu, 7 Feb 2019 14:43:47 -0500
Subject: [maker-devel] MAKER v3 Fgenesh ERROR
Message-ID: <CA+DOtefuUEc5_fFh7j2ykb4yBKmtEp1vgt0Pea-RF+7GCqr9ig@mail.gmail.com>

Hi, Carson

I have been getting this error with the fgenesh tool within MAKER. It runs
OK on most of the assembly contigs but seems to fail on one contig, or part
of a contig, with the error below:

Widget::fgenesh:
/mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap
/mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh
/sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta
-exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh
>
/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh
#-------------------------------#
 ...processing 9 of 24
 ...processing 8 of 28
 ...processing 10 of 24
 ...processing 9 of 28
 ...processing 11 of 24
 ...processing 10 of 28
 ...processing 12 of 24
 ...processing 11 of 28
deleted:0 genes
ERROR: FgenesH failed
--> rank=14, hostname=bnbcompute50
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:Super-Scaffold_14.2_contig2

I updated the perl module fgenesh.pm as suggested in the previous threads.
Attached are the  maker_opts.ctl and STDERR log file.

Thanks

Kapeel


-- 


*Kapeel Chougule*
*Computational Scientist Developer II*

*One Bungtown Road, Cold Spring Harbor, NY 11724*
<http://www.warelab.org/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5421 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stderr.log
Type: application/octet-stream
Size: 10012918 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190207/b825acee/attachment-0007.obj>

From fatih.sarigoel at durham.ac.uk  Wed Feb 13 05:20:40 2019
From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH)
Date: Wed, 13 Feb 2019 12:20:40 +0000
Subject: [maker-devel] Does Conda Maker actually work?
Message-ID: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>

Greetings,
I notice that your website never mentions conda installation, so I am curious whether the conda version is actually supposed to work; for me it didn't.
I created a new conda environment and installed Maker (I tried this with both installation options).
When I run the example files, I get this error:

"make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2"

My conda environment is here
/fast_new/work/users/fsarigo_m/miniconda3
I don't understand why the program is trying to look here:
/home/conda
which does not exist
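
A quick check that may help here: in the log further down, the failing make
step calls a gcc under /home/conda/feedstock_root/..., and such a path would
typically come from the C compiler recorded in Perl's own build configuration
(Config{cc}) rather than from anything in the MAKER setup. That reading is an
assumption, not something confirmed for this install. The Python sketch below
prints that recorded compiler and tests whether it exists on the current
machine. The function names are made up for illustration.

```python
import shutil
import subprocess
from typing import Optional


def perl_cc(perl: str = "perl") -> Optional[str]:
    """Return the C compiler recorded in Perl's Config, or None if perl cannot be run."""
    try:
        done = subprocess.run(
            [perl, "-MConfig", "-e", "print $Config{cc}"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return done.stdout.strip() or None


def compiler_available(cc: str) -> bool:
    """True if cc is an existing executable path, or a bare name found on PATH."""
    return shutil.which(cc) is not None
```

Running perl_cc() with the perl from the conda env and passing the result to
compiler_available() shows whether the recorded compiler path is present on
this machine.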

The run also begins with a "possible precedence issue" warning.

Thanks for your help in advance!
Fatih

+++++

Here is the full log until the end of the contig:

(MakerX) [fsarigo_m at med0223 MAKER]$ maker
Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------


Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
  "make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:contig-dpp-500-500

examining contents of the fasta file and run log



From carsonhh at gmail.com  Wed Feb 13 07:51:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 07:51:44 -0700
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>

The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list based on people attempting to install MAKER via conda. I know there is at least one issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists --> https://github.com/bioperl/bioperl-live/issues/215

Or it may be that there is a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) --> https://github.com/bioperl/bioperl-live/issues/305

Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts).
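
As a quick diagnostic for the failure in the log: the compile step calls a gcc under /home/conda/feedstock_root/..., which does not exist on the user's machine. That looks like the compiler path recorded when the conda perl was built. The sketch below is a minimal check, not part of MAKER: it asks perl for its configured compiler with `perl -V:cc` and tests whether that program can be found. The function names are illustrative assumptions.

```python
import os
import re
import shutil
import subprocess


def parse_perl_config(text, key):
    """Extract a value from `perl -V:key` output, which looks like: cc='gcc';"""
    match = re.search(rf"^{re.escape(key)}='(.*?)';", text, flags=re.MULTILINE)
    return match.group(1) if match else None


def compiler_is_usable(cc):
    """True if the compiler is an executable absolute path or is found on PATH."""
    parts = cc.split() if cc else []
    if not parts:
        return False
    program = parts[0]  # the configured value may carry flags after the program
    if os.path.isabs(program):
        return os.path.isfile(program) and os.access(program, os.X_OK)
    return shutil.which(program) is not None


def check_perl_compiler(perl="perl"):
    """Ask the given perl which C compiler it was built with, then test it."""
    out = subprocess.run(
        [perl, "-V:cc"], capture_output=True, text=True, check=True
    ).stdout
    cc = parse_perl_config(out, "cc")
    return cc, compiler_is_usable(cc)
```

Calling `check_perl_compiler()` inside the activated conda environment returns the configured compiler and a boolean. If the boolean is False, Inline::C has no compiler to call, which would be consistent with the "No such file or directory" line in the log.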

--Carson


> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk> wrote:
> 
> Greetings,
> I notice that you never mention conda installation on your website, so I am curious if the conda version is actually supposed to be working fine or not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with both installation options)
> When I run the example files, I get this error:
> 
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
> 
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
> 
> Also begins with a "possible precedence issue"
> 
> Thanks for your help in advance!
> Fatih
> 
> +++++
> 
> Here is the full log until the end of the contig:
> 
> (MakerX) [fsarigo_m at med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
> 
> To access files for individual sequences use the datastore index:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
> 
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
> 
> 
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
> /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> 
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
> 
> The build directory was:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
> 
> To debug the problem, cd to the build directory, and inspect the output files.
> 
> Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
> --> rank=NA, hostname=med0223
> ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
> --> rank=NA, hostname=med0223
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
> Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
> Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
> Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
> FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
> Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
> Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
> Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
> Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"  -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"   IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2   -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"   IndexedBase_14e0.c
> /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> 
> A problem was encountered while attempting to compile and install your Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
> 
> The build directory was:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
> 
> To debug the problem, cd to the build directory, and inspect the output files.
> 
> Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
> --> rank=NA, hostname=med0223
> ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500
> 
> examining contents of the fasta file and run log
> 
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Wed Feb 13 10:14:13 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:14:13 -0700
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in
 maker_exe.ctl
In-Reply-To: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
References: <CAFOVipNgzGd-wLNqz1WGx+mM_8R3KZOtqatq6D+nuNCHboRPXQ@mail.gmail.com>
Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com>

The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and is missing files inside the repeat library as a result.
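
To see which RepeatMasker each setting points at, one option is to compare the path in maker_exe.ctl with whatever the shell resolves inside the active environment. This is a minimal sketch that assumes the usual `key=value #comment` layout of MAKER control files. The helper names are hypothetical, and the check says nothing about whether the repeat library itself is complete.

```python
import shutil


def ctl_value(ctl_text, key):
    """Return the value of `key=` in control-file text, dropping any #comment."""
    for line in ctl_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith(key + "="):
            return line.split("=", 1)[1].strip()
    return None


def compare_with_path(ctl_text, key="RepeatMasker"):
    """Report the configured executable next to the one found on PATH."""
    configured = ctl_value(ctl_text, key)
    on_path = shutil.which(key)
    return {
        "configured": configured,
        "on_path": on_path,
        "same": configured is not None and configured == on_path,
    }
```

Reading maker_exe.ctl into a string and passing it to `compare_with_path` shows whether the two paths differ once the conda environment is activated.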

--Carson


> On Feb 4, 2019, at 2:00 AM, Lior Glick <liorglck at gmail.com> wrote:
> 
> Dear MAKER users,
> 
> I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software is initiated. Typing 'which RepeatMasker' shows the correct local path.
> I also use this path as value for the maker_exe.ctl parameter 'RepeatMasker'.
> Trying to generalize my working environment, I am trying to use a conda env <https://anaconda.org/bioconda/maker> which is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
> ERROR: Could not determine if RepBase is installed
> Running the same configuration files locally (i.e. outside the conda env) results in a successful run.
> This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions of how to overcome this issue?
> 
> Thanks and best regards,
> Lior


From carsonhh at gmail.com  Wed Feb 13 10:18:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:18:44 -0700
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
In-Reply-To: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
References: <CAOzMDPwAC-KnF_h__kOUM_s5nziOHmrGq8ika9Hfb40wny3_xQ@mail.gmail.com>
Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com>

If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have 'N' characters from the assembly as part of your CDS ('N' is the ambiguity code for DNA, and it becomes an 'X', the ambiguity code for amino acids, when translated). Augustus will do internal gymnastics and completely splice out exons containing N's to try to never have this issue, but may not always be able to. It's an indication of genome assembly issues.
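
To illustrate the N-to-X relationship, here is a small sketch of translation with the standard genetic code, in which any codon containing a non-ACGT base is reported as 'X'. It is a simplification written for this explanation, not the code MAKER uses. A fuller translator could still resolve some ambiguous codons (for example, GCN always encodes alanine).

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds):
    """Translate a CDS; codons with ambiguous bases (e.g. N) become 'X'."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


def count_ambiguous_codons(cds):
    """Count the codons that this simple scheme translates to 'X'."""
    return translate(cds).count("X")
```

For example, `translate("ATGNNNGCC")` gives `"MXA"`: the middle codon falls in a run of N's from the assembly, so the residue is unknown.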

--Carson


> On Feb 11, 2019, at 7:12 AM, Lior Glick <liorglic at mail.tau.ac.il> wrote:
> 
> Dear MAKER users,
> 
> After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences are obtained, I mean how come there are unknown amino acids in the prediction? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
> 
> Thanks a lot,
> Lior


From carsonhh at gmail.com  Wed Feb 13 10:24:01 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:24:01 -0700
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: <D33A2A92-BFCA-4493-A66E-99C567954AD2@gmail.com>

One thing you can also do is use the old models as protein= input and run the protein2genome option just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and that split contigs were joined, resulting in multiple gene fragments becoming a single unified model. Make sure to always review contigs in a browser to see how models and evidence correlate.
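
Before opening a browser, it can help to see how the gene models are distributed across contigs in each run. The sketch below counts `gene` features per contig in a GFF3 file. It assumes MAKER-style output where final models carry `maker` in the source column and a `##FASTA` section may follow the features. The function name is illustrative.

```python
from collections import Counter


def count_genes(gff_lines, source="maker"):
    """Count 'gene' features per sequence ID for one source in GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[1] == source and cols[2] == "gene":
            counts[cols[0]] += 1
    return counts
```

Running it on both runs (for instance `count_genes(open(path))` on each merged GFF3) gives totals via `sum(counts.values())`. Contig names differ between assemblies, so only the totals and the overall distribution are directly comparable.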

--Carson


> On Feb 3, 2019, at 12:13 PM, morgan sobol <morgan_starr_s at live.com> wrote:
> 
> Hello, 
> 
> I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. 
> I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and number of complete orthologs to the Illumina-only assembly, but had far fewer, longer contigs.
> 
> I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes. 
> 
> I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same?
> 
> Thanks,
> Morgan S. 
> 
> maker_opts.log
> #-----Genome (these are always required)
> genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
> 
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
> 
> #-----EST Evidence (for best results provide a file for at least one)
> est= #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
> 
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
> 
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org= #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
> 
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
> augustus_species=1368D_uni #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> 
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
> 
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
> 
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
> 
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
> 
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
> 
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
> 


From liorglck at gmail.com  Sun Feb 17 11:50:10 2019
From: liorglck at gmail.com (Lior Glick)
Date: Sun, 17 Feb 2019 20:50:10 +0200
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
Message-ID: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>

That's good to know. Any plans on creating a stable conda package in the
future? It'd be a very nice feature, especially since MAKER is not always
straightforward to install.

On Wed, Feb 13, 2019 at 5:22 PM Carson Holt <carsonhh at gmail.com> wrote:

> The conda recipe was produced by another group. I do not currently
> recommend using it because I have seen a number of issues pop up on the
> list based on people attempting to install MAKER via conda.  I know there
> is at least an issue with the conda RepeatMasker install, and there may be
> others. The specific failure you show is from Bio::DB::IndexedBase trying
> to compile an Inline::C function. It may be that conda is installing an
> older BioPerl where this issue still exists -->
> https://github.com/bioperl/bioperl-live/issues/215
>
> Or it may be that there is a new related issue (I've seen a handful of
> other examples that seem to relate back to Bio::DB::IndexedBase) -->
> https://github.com/bioperl/bioperl-live/issues/305
>
> Try installing MAKER without conda (make sure to remove any components
> that are in conda first to avoid conflicts).
>
> --Carson
>
>
> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH <fatih.sarigoel at durham.ac.uk>
> wrote:
>
> Greetings,
> I notice that you never mention conda installation on your website, so I
> am curious if the conda version is actually supposed to be working fine or
> not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with
> both installation options)
> When I run the example files, I get this error:
>
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2"
>
> My conda environment is here
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist
>
> Also begins with a "possible precedence issue"
>
> Thanks for your help in advance!
> Fatih
>
> +++++
>
> Here is the full log until the end of the contig:
>
> (MakerX) [fsarigo_m at med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
>
> To access files for individual sequences use the datastore index:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
>
>
>
> --Next Contig--
>
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
>
>
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs
> blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"
> -typemap
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"
>  IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc
> -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin"
> -D_REENTRANT -D_GNU_SOURCE
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2
>  -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"
>  IndexedBase_14e0.c
> /bin/sh:
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc:
> No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
>
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
>
> The build directory was:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
>
> To debug the problem, cd to the build directory, and inspect the output
> files.
>
> Environment PATH =
> '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 275.
> --> rank=NA, hostname=med0223
> ...propagated at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm
> line 869.
> --> rank=NA, hostname=med0223
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 38.
> Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 306
> Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for
> IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef,
> ARRAY(0x564b40673ad0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm
> line 426
> Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm
> line 95
> FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 478
> Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run",
> HASH(0x564b40673c80), 0, 0) called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 341
> Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called
> at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm
> line 357
> Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0)
> called at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm
> line 287
> Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0)
> called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
> Running Mkbootstrap for IndexedBase_14e0 ()
> chmod 644 "IndexedBase_14e0.bs"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs
> blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl"
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp"
> -typemap
> "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap"
>  IndexedBase_14e0.xs > IndexedBase_14e0.xsc
> mv IndexedBase_14e0.xsc IndexedBase_14e0.c
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc
> -c  -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin"
> -D_REENTRANT -D_GNU_SOURCE
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2
>  -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC
> --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot
> "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE"
>  IndexedBase_14e0.c
> /bin/sh:
> /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc:
> No such file or directory
> make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
>
> A problem was encountered while attempting to compile and install your
> Inline
> C code. The command that failed was:
>   "make > out.make 2>&1" with error code 2
>
> The build directory was:
>
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0
>
> To debug the problem, cd to the build directory, and inspect the output
> files.
>
> Environment PATH =
> '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
>  at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm
> line 275.
> --> rank=NA, hostname=med0223
> ...propagated at
> /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm
> line 869.
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> --> rank=NA, hostname=med0223
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500
>
> examining contents of the fasta file and run log
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

From morgan_starr_s at live.com  Mon Feb 18 02:08:56 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Mon, 18 Feb 2019 09:08:56 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
	<CAL0hg4HevFbPhVLfuLq3WF7iJUFpHKwm0X9q+X_yX5sJsCqKDA@mail.gmail.com>
	<DM5PR14MB129277D10A397B2CBE0DDA08AE6E0@DM5PR14MB1292.namprd14.prod.outlook.com>
	<CAL0hg4EH=79A7ucKe=ORznXh=7Suu9Q8AEWj7C8Xio82=G4fvw@mail.gmail.com>
	<DM5PR14MB1292FEA9F662D408FEBB3D21AE6F0@DM5PR14MB1292.namprd14.prod.outlook.com>,
	<CAL0hg4HG0n1+kw4PpFL_LG66nE+Sdd1fzX2Atn5+o+KryVCtug@mail.gmail.com>
Message-ID: <DM5PR14MB1292E82A4864CCC40B80122EAE630@DM5PR14MB1292.namprd14.prod.outlook.com>

Thank you, Xabi and Carson.
With your help, I was able to improve the annotation with a more appropriate number of predictions.

Best,
Morgan

________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Wednesday, February 6, 2019 11:33 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

SNAP is easy to train, works well in fungal genomes and it's explained in Maker's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus differently, because CEGMA didn't produce Augustus gene models automatically. I don't mean that you should use CEGMA.

This is what I have in my own documentation about how to train Augustus "the old way":
AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more "manual" way, as we did when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.

cd ${MYGENOME_DIR}/maker/snap1
perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna.

The actual training of AUGUSTUS is done through the webAUGUSTUS server.

Before proceeding, it is recommended to rename the fasta headers, especially if they contain special characters or are very long. This is the main reason jobs submitted to webAUGUSTUS fail. You can use the simplifyFastaHeaders.pl script (http://bioinf.uni-greifswald.de/bioinf/downloads/simplifyFastaHeaders.pl) for that:

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used to name each sequence in the multifasta files. Choose something appropriate: contig and rna for the assembly and RNA-seq files, respectively, or something based on those. For example, "pgcontig" and "pgrna" for contigs and RNA from Puccinia graminis.
DO NOT give the same nameStem to both fasta files, and don't use any special characters.
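
A quick way to see whether renaming is needed at all is to list the headers that look risky. This is only a sketch: the function name flag_fasta_headers, the 30-character limit and the set of "safe" characters are my own assumptions, not documented webAUGUSTUS rules.

```shell
# Print FASTA header names that are longer than max_len characters or that
# contain anything other than letters, digits, underscore, dot or hyphen.
# Both criteria are illustrative assumptions; adjust them as needed.
flag_fasta_headers() {
    fasta="$1"
    max_len="${2:-30}"
    awk -v max="$max_len" '
        /^>/ {
            name = substr($0, 2)    # header text without the leading ">"
            if (length(name) > max || name ~ /[^A-Za-z0-9_.-]/) print name
        }
    ' "$fasta"
}
```

Running flag_fasta_headers ${MYGENOME}_assembly.fasta prints nothing if every header passes both checks; any name it prints is a candidate for renaming.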

We need the following files (minimum):

  *   ${MYGENOME}_assembly.fasta as Genome file
  *   ${MYGENOME}.train1.gb as Training gene structure file

If we also have RNA-seq data:

  *   ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step. Otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and tracks the version.

Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.
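
The copy step can be sketched roughly as below. The function name install_species is made up for illustration, and the layout inside parameters.tar.gz is an assumption: the sketch simply looks for the folder that holds a *_parameters.cfg file, which is how AUGUSTUS species folders are normally laid out. The usual destination is $AUGUSTUS_CONFIG_PATH/species.

```shell
# install_species ARCHIVE SPECIES_DIR
# Unpack the archive in a scratch directory, locate the folder containing a
# *_parameters.cfg file (assumed to be the species model folder) and copy
# that folder into SPECIES_DIR.
install_species() {
    archive="$1"
    species_dir="$2"
    workdir=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$workdir" || return 1
    cfg=$(find "$workdir" -name '*_parameters.cfg' | head -n 1)
    if [ -z "$cfg" ]; then
        echo "no *_parameters.cfg found in $archive" >&2
        return 1
    fi
    model_dir=$(dirname "$cfg")
    cp -r "$model_dir" "$species_dir/" || return 1
    rm -rf "$workdir"
}
```

For example: install_species parameters.tar.gz "$AUGUSTUS_CONFIG_PATH/species". The name of the copied folder is the value to put in augustus_species in maker_opts.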

Hope this helps

PS: hit reply all so this is logged in Maker's mail list in case anybody else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol <morgan_starr_s at live.com> wrote:
I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued in 2015.
Do you think that will be a problem, or is it still worth using the old version?


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Tuesday, February 5, 2019 4:42 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Don't you use SNAP? It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
GM doesn't seem bad in the first case, if your fungus has a rather typical genome. In the second, it looks bad.

I'm not too familiar with re-annotation, but I'd create the gene models from scratch rather than reuse the ones from the Illumina-only genomes.
Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional" way, i.e. as we used to when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi


On Wed, 6 Feb 2019 at 02:19, morgan sobol <morgan_starr_s at live.com> wrote:
Thank you, Xabi, for the response.
The number of proteins from each source is much lower than before.
Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker respectively.
The more recent numbers are 25, 857, and 4,418 respectively.

So do you think this hints that something is wrong with genemark?

Morgan


________________________________
From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
Sent: Sunday, February 3, 2019 4:43 PM
To: morgan sobol
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model:
https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins predicted by each individual predictor. If the numbers from one of them are off, you may have found a possible source of the issue.
We didn't have a very good experience with GM, as it used to predict an absurd number of proteins.
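
One way to get those per-predictor numbers is to count features by source (column 2) in the merged MAKER GFF3. This is a minimal sketch, not an official MAKER tool: the function name count_by_source is made up, and which feature type each predictor writes (I assume "match" for the ab initio predictors and "gene" for maker) should be checked against your own file.

```shell
# count_by_source GFF3 [FEATURE_TYPE]
# Count features of one type (GFF3 column 3) per source (GFF3 column 2).
# The default type "match" is an assumption about how ab initio predictions
# appear in the MAKER GFF3; use "gene" for the final maker models.
count_by_source() {
    gff="$1"
    ftype="${2:-match}"
    awk -F '\t' -v ftype="$ftype" '
        /^##FASTA/ { exit }     # stop before the embedded sequence section
        /^#/ { next }           # skip comments and directives
        $3 == ftype { n[$2]++ }
        END { for (s in n) printf "%s\t%d\n", s, n[s] }
    ' "$gff" | sort
}
```

For example, count_by_source genome.all.gff match lists the predictor sources with their counts, and count_by_source genome.all.gff gene gives the number of final maker genes. Comparing the output for the old and the new assembly shows which predictor dropped.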

Xabi

On Mon, 4 Feb 2019 at 06:15, morgan sobol <morgan_starr_s at live.com> wrote:
Hello,

I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar to the Illumina-only assembly in length and in the number of complete orthologs, but had far fewer, longer contigs.

I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes.

How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran both with the same settings?

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta  #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA



From anthony.bretaudeau at inria.fr  Mon Feb 18 02:53:39 2019
From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau)
Date: Mon, 18 Feb 2019 10:53:39 +0100
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
References: <VI1PR06MB5613478CC864D85EB234EDF2B5660@VI1PR06MB5613.eurprd06.prod.outlook.com>
	<0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
	<CAFOVipPHWZ++FwVdBMDuMx_PTRT2Ep-MZc=iD13ezT1bgrMZwg@mail.gmail.com>
Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr>

An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20190218/d42974d5/attachment-0003.html>

From liorglic at mail.tau.ac.il  Sun Feb 24 05:50:49 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 24 Feb 2019 14:50:49 +0200
Subject: [maker-devel] Profiling MAKER runs
Message-ID: <CAOzMDPyHL9tM-DWTBJb=SSMT1KH6FwhArdgqgN-8aVoBthY69g@mail.gmail.com>

Dear MAKER users,
I was wondering if any of you has an idea of a way by which I can profile
my runs. What I mean is I'd like to know how much time was spent on each
step of the analysis - am I spending most of the time masking repeats,
blasting transcripts/proteins, running ab-initio predictors etc. Based on
this information, I might want to adjust my configuration, e.g. maybe I'm
spending a lot of time blasting transcripts, and reducing the number of
input transcripts would reduce run time significantly without having a
major effect on results quality.
As far as I can see, the main run log does not provide such information,
and I'm not sure where else to look. Any ideas or directions could be of
help.

Thanks!
Lior